In the course of my programming work I have had to do some system administration tasks, and for a while now I have also been maintaining some servers. This is a log of the problems I run into, and hopefully their solutions.

Thursday, September 29, 2005

Simple, automated server link check

I have been looking for a way to check for dead links on my sites for ages. It just never had a high priority, until now.

After some googling, installing, playing around, removing, googling, installing, and so on, I found the perfect script for me: CheckBot. The documentation is pretty straightforward, but for reference, this is the setup I use:

checkbot --verbose --cookies --file index.html --mailto my@addre.ss --dontwarn "(301|302|903|904|400|403)" --ignore "(feeds\.archive|jigsaw|https)" --sleep 0.2 http://www.site1.com http://www.site2.com

--verbose: this should be turned off for automated (cron) runs

--file index.html: that's because the report's in its own directory

--mailto my@addre.ss: so that I can see what was going on

--dontwarn "(301|302|903|904|400|403)": response codes matching this are left out of the report; I pretty much only need the serious problems

--ignore "(feeds\.archive|jigsaw|https)": URLs containing these strings are not checked. The first two are validators, which every page links to with a different URL, and I don't need https checking either (I know the regexps could be better, but they were fine for me)

--sleep 0.2: Without this I saw the load on the web server suddenly jump to 1.2, which is not nice. A 0.2 second delay between requests is OK for me: local files are still fast, and for remote files the network delay will be much longer anyway
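
To get a feel for what the --ignore pattern actually skips, here is a rough sketch that tests a few URLs against it with grep -E. This is only an approximation: CheckBot is a Perl script and does its own matching, and I am assuming it matches the pattern anywhere in the URL. The sample URLs and the would_ignore helper are made up for illustration.

```shell
#!/bin/sh
# Pattern copied from the --ignore option above.
ignore_re='(feeds\.archive|jigsaw|https)'

# would_ignore URL: hypothetical helper, succeeds when the pattern
# matches anywhere in the URL (assumed to mirror CheckBot's behaviour).
would_ignore() {
    printf '%s\n' "$1" | grep -Eq "$ignore_re"
}

# Illustrative URLs only; replace them with links from your own pages.
for url in \
    'http://www.site1.com/about.html' \
    'https://www.site1.com/login' \
    'http://jigsaw.w3.org/css-validator/' \
    'http://www.site2.com/https-howto.html'
do
    if would_ignore "$url"; then
        echo "skip  $url"
    else
        echo "check $url"
    fi
done
```

Only the first URL comes out as "check". The last one is skipped too, because "https" appears in its path, which is the kind of thing I meant by the regexps not being perfect.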

And then the URLs to check go at the end. The whole thing can be put in a crontab to run every week, and that's it.
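
For the weekly run, something like the following wrapper could work. It is a sketch, not what runs on my servers: the report directory, the script path in the crontab comment, the schedule, and the CHECKBOT and REPORT_DIR variables are all assumptions. It uses the same options as above, minus --verbose.

```shell
#!/bin/sh
# Hypothetical wrapper for the weekly link check.
# REPORT_DIR is an assumed location for the report; adjust it.
REPORT_DIR="${REPORT_DIR:-/var/www/linkcheck}"
# CHECKBOT can be overridden, e.g. CHECKBOT=echo for a dry run.
CHECKBOT="${CHECKBOT:-checkbot}"

run_linkcheck() {
    # Run from the report directory so index.html ends up there.
    cd "$REPORT_DIR" || return 1
    "$CHECKBOT" --cookies --file index.html --mailto my@addre.ss \
        --dontwarn "(301|302|903|904|400|403)" \
        --ignore "(feeds\.archive|jigsaw|https)" \
        --sleep 0.2 \
        http://www.site1.com http://www.site2.com
}

# Only start the check if the program is actually available.
if command -v "$CHECKBOT" >/dev/null 2>&1; then
    run_linkcheck
else
    echo "$CHECKBOT not found, skipping link check" >&2
fi

# Example crontab entry (assumed path; every Monday at 03:00):
# 0 3 * * 1 /usr/local/bin/linkcheck.sh
```

Keeping the command in a script means the crontab line stays short, and the quoting of the regexps only has to be right in one place.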
