Back in July(?) 2006, when Microsoft issued an update to the Windows Genuine Advantage tool, I figured I might as well install it on my one Windows box (I'd be forced to eventually). So I installed it, rebooted, and the login screen proclaimed loudly that Windows was not genuine. (Well, not literally loudly; it didn't shout over the speakers or anything, which would be an interesting deterrent, now that I think about it.)

This came as something of a surprise, given that:

  • This was a Dell, not some no-name computer.
  • It still had the original OS install, and no hardware had been changed.
  • The previous version of WGA had reported no problems.

I logged in, did some searching on Microsoft’s knowledge base, and found a link that said something like “Validate here.” I clicked on it.

To my surprise, it told me my copy was perfectly valid.

I eventually concluded that Norton Internet Security had blocked the initial validation attempt. Because the check ran at the login screen, before any desktop shell was loaded, Norton had no opportunity to pop up a notice and ask whether I wanted to let the data through.

After that experience, I can't say I'm surprised that Microsoft found many of their false positives to be the result of security software. Admittedly, they were looking at registry changes, crypto problems, and McAfee, rather than a transient network block from Norton.

(Reposted from this comment at Slashdot, mainly so I can find it again easily without searching.)

I recently discovered exactly how the Wayback Machine deals with changes to robots.txt.

First, some background: I have a weblog I've been running since 2002, switching from B2 to WordPress and changing the permalink structure twice (with appropriate HTTP redirects each time) as nicer structures became available. Unfortunately, some spiders kept hitting the old URLs over and over, even though each one returned a 301 permanent redirect to its new location. So, foolishly, I added the old links to robots.txt to get the spiders to stop.
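For illustration, the additions looked something like this. The paths are hypothetical stand-ins for the retired permalink structures, not my actual entries:

    # Hypothetical robots.txt additions blocking the retired permalink paths
    User-agent: *
    Disallow: /old-b2-permalinks/
    Disallow: /old-wordpress-permalinks/

Since a Disallow line is a simple prefix match, one entry per retired structure covers every post under it.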

Flash forward to earlier this week. I’ve made a post on Slashdot, which reminds me of a review I did of Might and Magic IX nearly four years ago. I head to my blog, pull up the post… and to my horror, discover that it’s missing half a sentence at the beginning of a paragraph and I don’t remember the sense of what I originally wrote!

My backups are too recent (ironic, that), so I hit the Wayback Machine. They only have the post going back to 2004, which is still missing the chunk of text. Then I remember that the link structure was different, so I try hitting the oldest archived copies of the main page, and I’m able to pull up the summary with a link to the original location. I click on it… and I see:

Excluded by robots.txt (or words to that effect).

Now, this is a page that was not blocked at the time ia_archiver spidered it, but that was blocked later. The Wayback Machine retroactively blocked access to the page based on the current robots.txt content. I searched through the documentation and couldn't determine whether the data had actually been removed or merely blocked, so I decided to alter my site's robots.txt file, fire off a request for clarification, and see what happened.

As it turns out, several days later, they unblocked the file, and I was able to restore the missing text.

In summary, the Wayback Machine will block end-users from accessing anything that your site's current robots.txt file disallows. If you remove the restriction from your robots.txt, it will re-enable access, but only if it had archived the page in the first place.
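To make the behavior concrete, here's a minimal sketch in Python of the kind of check the archive appears to apply, using the standard urllib.robotparser module. The rules and path are hypothetical, and this is my reconstruction of the logic, not the Wayback Machine's actual code:

    # Sketch: apply a site's *current* robots.txt rules when deciding whether
    # to serve an archived page, as the Wayback Machine appeared to do.
    # The rules and path below are made-up examples.
    from urllib.robotparser import RobotFileParser

    def can_serve(robots_txt: str, path: str, agent: str = "ia_archiver") -> bool:
        """True if the given robots.txt text lets `agent` fetch `path`."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        return parser.can_fetch(agent, path)

    blocking = "User-agent: *\nDisallow: /old-b2-permalinks/\n"
    relaxed = "User-agent: *\nDisallow:\n"  # an empty Disallow allows everything

    print(can_serve(blocking, "/old-b2-permalinks/mm9-review"))  # False: hidden
    print(can_serve(relaxed, "/old-b2-permalinks/mm9-review"))   # True: visible again

Removing the Disallow line flips the answer from False to True, which matches what happened once my robots.txt change propagated: the archived copy became visible again.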

(Originally posted as a Slashdot comment. I reposted it here several years later, and have since backdated it to the original time.)