Another problem I’ve noticed in my Twitter archive: Lots of URL shorteners and image hosts have shut down or purged their archives.

Sure, bit.ly and is.gd and tinyurl and ow.ly are still around. But in the days before t.co, I used a lot of different Twitter apps that used different shorteners or image hosts.

I have photos posted not just at Twitter and Twitpic, but at phodroid, mypict.me, and twitgoo. In some cases the description and date can point me to the right picture on my hard drive or on this blog (I used to import a daily digest of tweets, and I still sometimes use Twitter as a rough draft for content here). In some cases I can narrow it down to a group of photos — the 2012 partial solar eclipse, for instance.

In some cases, I have NO IDEA what the photo was:

Not sure if the misspelling will be legible in this upload phodroid.com/hvcyxw

— Kelson Vibber (@KelsonV) January 28, 2009

Similarly, I linked to a lot of articles that might still exist, but the short URLs don’t point to them anymore: services like tr.im, short.to, and awe.sm, plus StumbleUpon’s su.pr. In some cases a publisher set up its own shortener and has since dropped it. Again, sometimes I can find the article from this blog. Sometimes the description includes a quote or title that I can search for.

Oddly enough, I found most of my lost awe.sm links by looking at Del.icio.us, which apparently unwrapped the links when they imported from Twitter way back when. It’s still around and searchable. For now. (I should look into what you get from their archive.)
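If you want to know how many tweets in your own archive are affected, a rough first pass is to scan the tweet text for links on hosts you know are gone. This is only a sketch: the host list contains just the domains named in this post, the function name `find_dead_links` is made up for illustration, and the pattern is a loose match for links rather than a full URL parser. Old tweets often left off the `http://`, so the scheme is optional.

```python
import re

# Hosts named in this post. This is an illustrative list, not a complete one;
# extend it for whatever shorteners and image hosts your own archive used.
DEAD_HOSTS = {"phodroid.com", "mypict.me", "tr.im", "short.to", "awe.sm", "su.pr"}

# Optional scheme, then a hostname, then a path. Deliberately loose.
LINK_RE = re.compile(
    r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})(/\S*)",
    re.IGNORECASE,
)

def find_dead_links(text, hosts=DEAD_HOSTS):
    """Return links in `text` whose host is in `hosts`, without the scheme."""
    found = []
    for match in LINK_RE.finditer(text):
        host = match.group(1).lower()
        if host.startswith("www."):
            host = host[4:]
        if host in hosts:
            # Drop sentence punctuation that got glued to the end of the link.
            found.append((host + match.group(2)).rstrip(".,;:!?)"))
    return found
```

Running it over the text of each tweet gives a list to work through by hand, using the date and description as clues.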

It’s true that these problems are biggest if you were on Twitter before they implemented their own link shortener and image hosting. But a lot of tools (Buffer, for instance) still use their own shorteners for tracking purposes. So you’re not just depending on the tool being around long enough to post your tweet; you’re depending on it to stay around for the rare person who stumbles on an old thread and wants to see what you were talking about.

And even if you didn’t start using Twitter until they hosted photos themselves, Twitter doesn’t include your photos in your archive! If you want to save your own copies in case they go the way of GeoCities or even Photobucket, you’ve got to hold onto the originals or download them yourself.

In cleaning up dead links, I stumbled on an old post about linkrot in which I wondered “how much of what exists today will still be around – in any form – ten years from now.”

Well, it’s been ten years. That post had seven external links. Four of those are no longer active, though I was able to find three of them on archive.org. (The fourth was a link to a search result set on AltaVista. Yes, AltaVista.)
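Checking archive.org by hand is fine for seven links, but it gets tedious fast. The Wayback Machine has an availability endpoint that returns the closest snapshot for a URL as JSON. Here is a minimal sketch, assuming the response has the `archived_snapshots` / `closest` shape I remember from the documentation; the function names are my own, and `example.com` is just a placeholder.

```python
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available"

def availability_url(url, timestamp=None):
    """Build the query URL; `timestamp` is an optional YYYYMMDD-style prefix."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)

def closest_snapshot(payload):
    """Return the snapshot URL from a decoded response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None

def lookup(url, timestamp=None):
    """Ask the Wayback Machine for the closest snapshot (makes a network request)."""
    with urllib.request.urlopen(availability_url(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Passing a timestamp near the date of the original post is probably the better choice, since the latest snapshot of a dead page is often just an error page or a domain parking notice.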

That’s right: More than half of the links on an article about linkrot have rotted away. Appropriate, that. And a reminder to always provide some context when linking out to something that you can’t personally ensure will stay online.

Before the quick-status type social networks like Twitter and Facebook took off, it seemed like everyone was starting a blog. And every company seemed to want to get in on it: it wasn’t enough to have a forum, you had to have your own community, including — you guessed it — a blog.

Things change, of course. People move on to new interests. Businesses fold and are replaced with others. Online social activity has largely gravitated toward a small number of hubs. Hubs like Facebook, Twitter, Tumblr, Pinterest, Instagram. Old blogs are left unmaintained, and die. And those island communities like My Opera, or the Newsarama forums, or Comic Bloc, have also dried up, activity moving to the hot spots. Why go to the trouble of building your own social network when you can create a page on Facebook and be part of that one for free? That’s where your users/customers/fans are anyway!

So those special-purpose sites are going away too.

In addition to K-Squared Ramblings, I had a blog on LiveJournal (still there, but I haven’t updated it in years), and a blog on WordPress.com (also still there, but I changed its focus). I also had blogs at Spread Firefox, My Opera, and ComicSpace. I wrote for Opera Watch. I could swear I had something hosted by Flock even though I hardly ever used it.

I’ve been slowly migrating a lot of that material from those blogs to this one.

  • I had two convention reports on LiveJournal, and a zillion of them here. I copied over the two posts and cross-linked them.
  • After Spread Firefox and Opera Watch shut down, I pulled what I could from archive.org and posted the more useful/interesting bits here.
  • When I finally figured out I wanted to make Parallel Lines a photo blog, I went through the earliest posts and brought over one or two posts that were worth keeping.

The latest is My Opera shutting down. It was announced back in October, which gives you an idea of how often I go there these days. Fortunately, they announced the closure early and provided tools to download your blog posts (with comments) and files.

Looking through 27 posts, I found that a lot more of them than I thought were cross-posts or otherwise duplicate content. Just seven had unique content that might be worth importing, either for current or historical interest (one of those was only unique because my corresponding Spread Firefox post was already gone!), and three duplicates had their own comment threads that might be worth merging. I particularly wanted to save On Broken HTML, and was amused to find this rant against combined stop and reload buttons, a fight that’s been completely lost.

Some content has gone the other way, though: After I launched Speed Force back in 2008, I started putting most of my comics-related thoughts there, or cross-posting them. And just last year, I started my Re-reading Les Misérables project in the pages of this blog, before breaking it off as its own subsite. The difference is that those are both self-hosted sites under my control. As long as I have access to web hosting and domain registration, and as long as I have backups, I’m set.

I set up 404 Notifier when I moved my Les Mis commentary to its own blog, to catch anything I might have missed while getting content moved and the new site set up. I then added the RSS feed to Feedly.

After a few weeks, I started noticing some odd hits showing up for /r/bienvenu, but I couldn’t find anything that linked to that URL. Then I looked closer and realized it was Feedly itself that was hitting the link!

Basically:

  1. Broken URL gets hit.
  2. 404 Notifier adds the hit to the feed.
  3. Feedly retrieves the feed.
  4. Feedly follows the URL!
  5. Return to step 1.

The timing is inconsistent, but I think Feedly might be hitting the URL whenever I look at the list of “articles,” maybe checking for an image to use for the card in magazine view. And based on the first instance in the DB, I think it may have been a URL I used to test the plugin when I first installed it, then forgot.

For now, I’ve just removed the feed from Feedly. I’m considering altering the plugin to skip hits from Feedly, but I can probably just turn it off now that the blog has been up for a month. It’s served its purpose. If anything, it might make more sense to put it on this site to see if I missed any redirects (though I haven’t actually removed the old copies of the posts yet).
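The "skip hits from Feedly" idea comes down to checking the user agent before a 404 goes into the log. The plugin itself is WordPress PHP, so this Python is only a sketch of the logic, not its actual code. The function names are hypothetical, and it assumes Feedly's fetcher identifies itself with "Feedly" somewhere in its User-Agent header, which I haven't verified against the logs.

```python
def is_feed_reader(user_agent, markers=("feedly",)):
    """True if the User-Agent contains any known feed-reader marker.

    The default marker is an assumption about how Feedly identifies itself.
    """
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in markers)

def record_404(log, path, user_agent):
    """Append `path` to `log` unless the hit came from a feed reader.

    Returns True if the hit was recorded, False if it was skipped.
    """
    if is_feed_reader(user_agent):
        return False
    log.append(path)
    return True
```

Skipping the hit at step 2 breaks the loop: Feedly can still follow the URL, but its visit never makes it back into the feed.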

I turned on the broken link checker plugin at lunchtime, and let it run through the site over the next few hours* before checking back this evening.

Holy crap, there are a lot of outdated links on this site! Over 300, in fact, linking to things like…

  • News organizations that discard their archives, or hide them behind paywalls.
  • Businesses that have, well, gone out of business.
  • Blogs that have shut down or moved.
  • Personal sites that have been abandoned.
  • Sites that have reorganized without setting up redirect rules for their old link structure. (Even the Star Wars official site did this with the movie pages!)

One of the dead links is, appropriately enough, to an article on top 10 web design mistakes. (I guess they missed one!) Another is actually on one of my articles on link rot from way back when.

And then there are the 700+ links that are being redirected, some of which should probably be updated, but some of which are certainly gateway pages — and some of which are probably pointing to a new site that took over the name, but not the content.
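Sorting through that many results by hand is a chore, so it helps to bucket them first. The sketch below is not how the plugin works. It is a guess at a useful triage, assuming you have each link's original URL, the final HTTP status, and the URL you ended up at after redirects. The category names are my own. A redirect that lands on a site's front page is flagged separately, since that is usually a sign the content is gone, and a redirect to a different host is where a name takeover would show up.

```python
from urllib.parse import urlsplit

def classify(original_url, status, final_url=None):
    """Bucket one link-check result. `status` is None if the request failed."""
    if status is None or status >= 400:
        return "broken"
    if final_url and final_url != original_url:
        before, after = urlsplit(original_url), urlsplit(final_url)
        # A deep link that lands on the front page has probably lost its content.
        if after.path in ("", "/") and before.path not in ("", "/"):
            return "redirect-to-homepage"
        # A different host needs a manual look: moved site, or new owner?
        if after.hostname != before.hostname:
            return "redirect-other-host"
        return "redirect-same-host"
    return "ok"
```

The same-host redirects are the ones that can most likely just be updated in place. The other two buckets need a human to look at what is actually on the other end.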

It’s often stated that once something goes up on the Internet, it’s there forever. But that’s not entirely true. What it is, is beyond your control. If someone else makes a copy, you can’t take it down (like the fable about releasing a bag of feathers from a mountain top, and then trying to collect all the feathers). But any individual copy — even the original — exists at the whim of whoever owns or maintains that site.

One question remains: Do these dead links matter?

I think they do, for three reasons:

  1. Links are source trails. A valid link may support what you’re saying, indicating that you know what you’re talking about. (Think of all those [citation needed] notes on Wikipedia.)
  2. Related to that, links provide context. Even today, with the masses chattering in short form on Facebook and Twitter, you’ll find people writing articles and responding to them with other articles. As long as the links remain intact, these aren’t monologues — they’re a conversation.
  3. When a whole site goes offline, you never know who’s going to pick it up. It could be someone with an opposite political agenda. It could be a spammer or malware peddler. A commenter from 5 years ago might lose their site and have it taken over by someone selling knockoffs of little blue pills — and now guess what you’re linking to?

*Something about the plugin really taxes the VPS that DreamHost offers, which is why I don’t have it running all the time anymore, but it only seems to affect the blog it’s running in, and of course it doesn’t impact static pages.