I’ve redone the permalink structure on this site. It’s not something I really want to do often — they’re called permalinks for a reason — but it was time to clean it up. Redirects are in place to keep old links working.
I use the Broken Link Checker plugin on this blog and on Speed Force to find broken or moved links. In addition to helping you manage them in the admin interface, it can also assign formatting (as a CSS class) to mark them in your posts.
Cool! Readers can see that the link is broken before clicking on it!
But what’s the best way to label the links?
The plugin uses strike-through by default. You are marking something that’s gone, but strike-through usually means the text is being crossed out. That’s fine for a link in a list, but something like “Catering was provided by MyNiftyFoodCo” implies that the name of the company is wrong, not that the website is gone.
Just making something italic or changing the color doesn’t work either, because it’s arbitrary. Nothing about an italic link (which could be a title), or a random other color, suggests that something might be missing.
What I’ve come up with is to reduce the contrast on broken links. It combines two familiar schemes:
So here, I’ve got bright blue for unvisited links, darker blue for visited links, and black (well, very dark gray) for broken links, the same color as the surrounding text. I’m keeping the underline so there’s still some indication that it’s a link, but the cue is weaker than on a link that still works.
It’s still not ideal, since color is the only difference, but it should cause less confusion than the strike-through.
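To put a rough number on “reduce the contrast,” here is a minimal Python sketch using the WCAG 2.x contrast-ratio formula to compare each link color against the body text. The hex values are assumptions for illustration only; the post just says bright blue, darker blue, and very dark gray.

```python
def _channel(value: int) -> float:
    """Linearize one 8-bit sRGB channel, per the WCAG 2.x definition."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical palette: none of these values come from the post.
BODY_TEXT = "#222222"
PALETTE = {
    "unvisited": "#0066FF",  # bright blue
    "visited": "#003399",    # darker blue
    "broken": "#222222",     # very dark gray, same as the body text
}

for name, color in PALETTE.items():
    # Contrast against the surrounding text: lower means the link blends in more.
    print(f"{name:10} vs. text: {contrast_ratio(color, BODY_TEXT):.2f}")
```

With these assumed colors, the broken link’s ratio against the text is exactly 1.00, which is the point: only the underline still marks it as a link.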
All the things I’d planned or wanted to do tonight, and what do I end up staying up past midnight on? Cleaning out dead links.
While checking some dead links in the Internet Archive, I decided to see what they had of the website for the Literary Guild at UCI. This was a creative writing club we were both involved in back in college. There’s an abbreviated history of the club still online.
I looked at the earliest archived copy I could find, and noticed down in the corner a badge for a long-forgotten website contest. Every quarter, the UCI Bookstore holds a literary contest, sometimes poetry, sometimes short stories. In spring 1996, they decided to make it a website contest. I had just built a website for the club, and submitted it. Our site was one of the three winners [archive.org].*
Just for kicks, I decided to see which of the sites were still around.
1 out of 3. And even that one’s at a different location.
And so the link rot continues…
* I was hoping to link to an independent announcement, but the UCI Bookstore website only lists the most recent winners (Spring 2007), and while the Anteater Weekly regularly announced the winners, their archives only go back to 1997. I did find the announcement in the May 30, 1996 Zotmail Archive, but it doesn’t return linkable results, so you’ll have to search for it.
Every Friday, a script verifies all the links on this website. I usually check the results that evening, or sometimes during the day at work, and see which dead links I can fix.
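The post doesn’t say what that script is. As a rough illustration only, a minimal standard-library Python checker of that kind might look like this. The function names and the `pages` input (a dict of page URL to HTML) are hypothetical; a real checker would also crawl the site and run on a schedule (from cron, for example).

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url.startswith(("http://", "https://")):
                self.links.append(url)


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def check_link(url: str, timeout: float = 10.0) -> str:
    """Return the final HTTP status code as a string, or a short error description."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so a moved page reports its new page's status.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        return str(err.code)  # 404, 500, and other error statuses
    except OSError as err:
        return f"error: {err}"  # DNS failure, refused connection, timeout, ...


def report(pages: Dict[str, str]) -> None:
    """`pages` maps each page URL to its HTML; print every link that isn't a 2xx."""
    for page_url, html in pages.items():
        for link in extract_links(html, page_url):
            status = check_link(link)
            if not status.startswith("2"):
                print(f"{page_url}: {link} -> {status}")
```

Some servers answer HEAD with 405 or 501, so retrying those with GET would cut down on false alarms.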
Strangely enough, this week 3 links on “What the heck is a Hyperborea?” dropped off the face of the net. I checked the rest of the page’s links manually, and 2 more turned out to lead to broken sites showing internal errors!
The first was easy. It’s an excerpt from Barry Lopez’s book Arctic Dreams: Imagination And Desire In A Northern Landscape. I just pulled up the Archive.org copy, picked a sentence to search for… and found the same excerpt at another URL. (A classic college website issue: moving faculty pages from a specific server to a more general site.)
The other two that actually reported errors are both role-playing games. The MUD Darkwind has moved to its own domain. Epiphany: The Legends of Hyperborea is a little trickier. It’s missing from its publisher’s website, but there are references to it online. I figured I could link to the sourcebook at Amazon, or maybe to a review, but the most informative page I could find was on archive.org.
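Looking up archived copies can be scripted as well. This sketch assumes the Wayback Machine availability endpoint (`https://archive.org/wayback/available`), which takes a `url` and an optional `timestamp` (YYYYMMDDhhmmss, which may be truncated) and returns JSON describing the closest snapshot. The function names are hypothetical, and the response shape in the test is an assumption based on that endpoint’s documented format.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; `timestamp` is YYYYMMDDhhmmss, optionally truncated."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # ask for the snapshot closest to this date
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(response_json: str) -> Optional[str]:
    """Pull the archived copy's URL out of an API response, or None if there isn't one."""
    data = json.loads(response_json)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Network call: ask the API for the snapshot of `url` closest to `timestamp`."""
    with urllib.request.urlopen(availability_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Without a timestamp the API returns the most recent capture; passing an early one such as `"1996"` asks for the capture nearest that date, which is handy when the recent copies are already broken.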
Now to the sites that lied and reported “200 OK” instead of an error code. One was a page describing Clark Ashton Smith’s book, Hyperborea. The site had a search box on the home page, making it easy to find the new location. (It would have been nice if they’d actually removed the old script instead of letting it break. A 404 or even a 500 would have helped me catch this earlier.)
That leaves a Conan reference site, which has shut down, its domain name listed for sale. I went looking and found a site with maps of the world in which Conan takes place, showing Hyperborea near Cimmeria.
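Pages like the last two, which answer “200 OK” over an error message or a parked domain, are often called soft 404s, and a status-code check alone won’t catch them. One rough heuristic, sketched here in Python, is to flag successful responses whose body contains typical error wording. The phrase list is a hypothetical starting point, not something from the post, and it will produce both false positives and misses.

```python
import re

# Hypothetical phrases; tune them to the errors your linked sites actually show.
ERROR_PATTERNS = [
    r"\bnot found\b",
    r"\binternal (server )?error\b",
    r"\bno longer available\b",
    r"\bdomain (name )?(is )?for sale\b",
    r"\bfatal error\b",
]
_ERROR_RE = re.compile("|".join(ERROR_PATTERNS), re.IGNORECASE)


def looks_broken(status: int, body: str) -> bool:
    """True for real errors (4xx/5xx) and for 200 pages that read like errors."""
    if status >= 400:
        return True
    if status == 200:
        return _ERROR_RE.search(body) is not None
    return False  # redirects and other codes are left for a separate check
```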
It’s just odd that three links would vanish from the same page at more or less the same time.
While looking for more ideas related to my earlier post on fighting link rot, I came across some interesting articles:
Web Sites That Heal considers some of the causes of linkrot, including changing content management systems (which I’ve dealt with here twice), poor structure (starting small and simple, then finding that the old design doesn’t work anymore as the site grows), lack of testing, and plain apathy. More interesting are some of the reasons it becomes a problem, in particular the difficulty of setting up redirects and informing other sites that you’ve moved. That’s something else I can relate to: My site hasn’t been on the UCI Arts server in four years, yet despite a massive attempt to get people to update their links, AltaVista still shows 82 pages linking to my site’s old location. Something I think the article leaves out is the number of sites that just aren’t maintained anymore, particularly the free Geocities pages people set up back in the dot-com era. The pages are there, but they’re six years out of date – and so are the links.
The article then suggests an automated server-to-server system that would detect incoming links to a moved page, contact the referring site, report the new location, and instruct it to update the link with no human intervention whatsoever. It’s a great idea, though it would require people like me to drop the edit-locally-and-upload model of development.
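The fully automated exchange would need cooperating software on both ends, but its first step, noticing who still links to a moved page, can be approximated from an ordinary access log. This Python sketch assumes logs in the common Apache “combined” format and a hypothetical `MOVED` mapping of old paths to new URLs. It only lists referring pages and their new targets, leaving the actual contact to a human.

```python
import re
from collections import defaultdict

# Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)"'
)

# Hypothetical example: old path on this site -> its new location.
MOVED = {
    "/~someuser/index.html": "https://www.example.com/",
}


def stale_referrers(log_lines, moved=MOVED):
    """Map each referring page to the (old path, new URL) pairs it still links to."""
    found = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        path, referer = match["path"], match["referer"]
        if path in moved and referer not in ("", "-"):
            found[referer].add((path, moved[path]))
    return dict(found)
```

The output is effectively a to-do list of sites to email, which is the manual version of what the article wants servers to do for themselves.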
“Web Sites That Heal” referred to a Jakob Nielsen column on Linkrot. Nielsen’s advice is frequently useful, though not always applicable [archive.org]. Sadly, his recent columns have tended toward rehashing old ones or applying to ever more specialized niches, but sometimes his advice is spot-on. In this case, the article from six years ago still applies to today’s web: run a link validator on your site from time to time, and keep old URLs on your own site active (whether with actual content or with a redirect). The comments on this article are worth reading as well.
Lastly, I found a post, Consequences of Linkrot, about how linkrot affects weblogs. Most of the post is actually an excerpt from Idle Words [archive.org], where the original author notes that the classic blog post – a single line linking to something of interest, or a series of the same – is particularly susceptible to linkrot. Without the original material, there’s nothing (or next to nothing) left. And it happens fast: The Web isn’t that old, and blogging is even younger, yet information is disappearing rapidly enough that you really have to wonder how much of what exists today will still be around – in any form – ten years from now. One of the key lessons DeLong takes from this article: it’s “critically important not just to link but to quote–and to quote extensively.”
The lesson is clear: The site you link to today may not be there tomorrow, and you may not have the time (or inclination) to go chasing it down. Quote it, summarize it, add context, write lots of commentary, whatever. Make sure what you post can stand on its own… just in case it has to.
On an ideal Web, pages would stay put and links would never change. Of course, anyone who has been on the Internet long enough knows just how far away this ideal is. Commercial sites go out of business, personal sites move from school to school to ISP to ISP, news articles get moved into archives or deleted, and so on.
There are two sides to fighting link rot. The first is to design your own site with URLs that make sense and that you won’t find yourself changing a few months or years down the road. If you have to move something, use an HTTP redirect (such as 301 Moved Permanently) so that people and spiders automatically reach the new location.
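The mechanics of a redirect are small: a 301 status plus a `Location` header pointing at the new address. Most sites would set this up in the web server’s own configuration; purely to show what goes over the wire, here is a minimal Python standard-library sketch. The `REDIRECTS` table and its paths are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional

# Hypothetical table of old paths -> new paths (or full URLs).
REDIRECTS = {
    "/old-page.html": "/new-page/",
}


def redirect_target(path: str) -> Optional[str]:
    """Return the new location for an old path, ignoring any query string."""
    return REDIRECTS.get(path.split("?", 1)[0])


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.path)
        if target:
            # 301 tells browsers and search-engine spiders the move is permanent.
            self.send_response(301)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404, "Not Found")

    do_HEAD = do_GET  # link checkers often use HEAD requests


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```

A 302 or 307 would also forward visitors, but those signal a temporary move, so spiders may keep the old URL indexed.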
The other side to the fight is periodically checking all the links on your site to make sure they still go where you expect.
So how do you handle online journals? Obviously they’re websites, so from that standpoint you should at least try to keep the links current. But on the blogging side, there are problems with this, in particular the school of thought that you should never revise a blog entry (also discussed in Weblog Ethics).