I recently needed a link checker to create a CSV-formatted list of all links (especially hosted PDFs) on a client site.
There is a tool called webcheck by Arthur de Jong which does a great job of checking all of the links on a website and creating a pretty HTML report.
This got me most of the way there: the output included a page dedicated to listing every URL encountered during the crawl, which looked like what I wanted but was formatted as HTML.
I wrote a small script which uses webcheck's own code to read in its stored .dat file and write all of the links to a CSV file with the format:

path|extension|internal|errors

where path is the URL, extension is the URL ending (for example .pdf, .html, ...), internal is a boolean True or False indicating whether the link is an internal link, and errors is the error for that link (for example 404), if any.
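The CSV-writing half of that script can be sketched as follows. This is a minimal illustration, not the actual script: it assumes the link records have already been pulled out of webcheck's .dat file into plain (url, internal, errors) tuples, since webcheck's internal Link objects and attribute names are not shown here.

```python
import csv
import os
from urllib.parse import urlparse

def write_links_csv(links, out_path):
    """Write link records to a pipe-delimited CSV.

    `links` is assumed to be an iterable of (url, internal, errors)
    tuples; in the real script these would come from webcheck's own
    stored data rather than being built by hand.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        for url, internal, errors in links:
            # The extension is the suffix of the URL's path component,
            # e.g. ".pdf" or ".html" (empty for bare directory URLs).
            extension = os.path.splitext(urlparse(url).path)[1]
            writer.writerow([url, extension, internal, errors])

write_links_csv(
    [("http://example.com/report.pdf", True, ""),
     ("http://example.com/missing.html", True, "404 Not Found")],
    "links.csv",
)
```

Using csv.writer rather than joining strings by hand means any field that happens to contain a pipe character gets quoted correctly.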