Simple, automated server link check
I've been looking for ages for a way to check for dead links on my sites. It just never had a high priority, until now.
After some googling, installing, playing around, removing, googling, installing, etc... I've found the perfect script for me: CheckBot. The documentation is pretty straightforward, but for reference, this is the setup I use:
checkbot --verbose --cookies --file index.html --mailto my@addre.ss --dontwarn "(301|302|903|904|400|403)" --ignore "(feeds\.archive|jigsaw|https)" --sleep 0.2 http://www.site1.com http://www.site2.com
--verbose: this should be turned off for automatic running
--file index.html: that's because the report's in its own directory
--mailto my@addre.ss: so that I can see what was going on
--dontwarn "(301|302|903|904|400|403)": I pretty much only need the serious problems
--ignore "(feeds\.archive|jigsaw|https)": URLs containing these strings I don't want checked: the validators to begin with, which are linked with another URL from every page, and I don't need https checking either (I know the regexps could be better, but that was fine for me)
--sleep 0.2: Without this the load on the webserver suddenly jumped to 1.2, which is not nice. 0.2 seconds of delay between requests is OK for me: local files are still fast, and for remote files the network delay will be much longer anyway
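Since the --ignore value is a regular expression matched against URLs, it can be tried on a few sample URLs before a full run. Below is a rough sketch that uses grep -E as a stand-in for the script's own Perl regex matching, which should behave the same for a pattern this simple. The classify helper and the sample URLs are made up for illustration.

```shell
#!/bin/sh
# The same pattern that is passed to --ignore in the command above.
IGNORE='(feeds\.archive|jigsaw|https)'

# Hypothetical helper: prints "ignored" or "checked" for one URL.
# grep -E only approximates Perl regex matching, which is enough here.
classify() {
    if printf '%s\n' "$1" | grep -Eq "$IGNORE"; then
        echo "ignored"
    else
        echo "checked"
    fi
}

# Made-up sample URLs, not taken from a real report.
for url in \
    "http://www.site1.com/about.html" \
    "http://jigsaw.w3.org/css-validator/check/referer" \
    "https://www.site2.com/login" \
    "http://www.site1.com/why-https-matters.html"
do
    printf '%s %s\n' "$(classify "$url")" "$url"
done
```

The last sample URL is plain http, yet it is skipped as well because "https" appears in its path. That is the kind of looseness the "regexps could be better" remark refers to; anchoring the pattern (for example ^https:) would tighten it.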
The URLs to check go at the end of the command. The whole thing can be put in a crontab to run every week, and that's it.
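As a rough sketch, a weekly crontab entry could look like the one below. The schedule (Mondays at 03:00) and the report directory /var/www/linkcheck are assumptions for illustration only; --verbose is left out because the job runs unattended.

```shell
# m h dom mon dow  command
# Assumed schedule and report directory; adjust both to your own setup.
0 3 * * 1  cd /var/www/linkcheck && checkbot --cookies --file index.html --mailto my@addre.ss --dontwarn "(301|302|903|904|400|403)" --ignore "(feeds\.archive|jigsaw|https)" --sleep 0.2 http://www.site1.com http://www.site2.com
```

Changing into the report directory first keeps index.html and any other report files in one place, which matches the reason given for --file above.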
