We all know it. You search for an issue or a topic you’re interested in, click a few links and boom. Dead end. The page no longer lives there, the domain is gone, or the server ended up at the bottom of a river. Even my website is no exception.
While hypertext documents shouldn’t change, we all know they can, and often do. Which is why we have such interest in tools like archive.org and the Wayback Machine. These tools regularly scrape material, or have users submit interesting material for archiving. They’re frequently used to ensure a particular version of a page or site is preserved.
I started thinking about this because I read an article about strategies for linking to obsolete websites (thanks Beko Pharm). One was to use a periodic link checker to find stale or broken links on your site, optionally swapping outdated references for fresh ones or for links into the Wayback Machine. While this is all well and good, I think it might be more useful to self-archive sites: use something like wget to pull down the document and its associated resources and host it yourself (statically), or at least provide an archive for people to download and inspect.
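To make the idea a bit more concrete, here is a rough sketch in Python of what the two pieces could look like: collecting the outbound links from a page, checking whether each one still answers, and building a wget invocation that pulls a page and its resources into a local folder. The function names, the `archive` directory and the choice of wget flags are my own assumptions for illustration, not something taken from the article, and a real setup would need to think about rate limiting, robots.txt and copyright.

```python
import subprocess
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        url = urljoin(self.base_url, href)
        if urlparse(url).scheme in ("http", "https") and url not in self.links:
            self.links.append(url)


def extract_links(html, base_url):
    """Return the unique http(s) links in a page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def is_alive(url, timeout=10):
    """Return True if a HEAD request succeeds with a non-error status.

    Assumption: HEAD is good enough. Some servers reject it, so a real
    checker might retry with GET before declaring a link dead.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError, HTTPError and timeouts are OSError subclasses;
        # ValueError covers strings that are not usable URLs.
        return False


def wget_command(url, dest_dir):
    """Build a wget call that saves one page plus the assets it needs."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS and scripts
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # add .html/.css suffixes where missing
        "--no-parent",         # never climb above the starting directory
        "--directory-prefix=" + dest_dir,
        url,
    ]


def archive(url, dest_dir="archive"):
    """Run wget (assumed to be installed); True if it exited cleanly."""
    return subprocess.run(wget_command(url, dest_dir)).returncode == 0
```

A periodic job could then feed each of your own pages through `extract_links`, call `archive` on every link while it is still alive, and flag the ones where `is_alive` starts returning False so they can be pointed at the local copy instead.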
Has anyone given this any further thought? It doesn’t sound like a technically complicated project, but I’m sure someone has already trodden this path and come to some sort of outcome, or found a reason it’s not worth it.