Robots in Disguise

Wondering just how many Netscape 4 visitors this site gets, I pulled up some server stats and noticed two very strange patterns.

The first appears to be a spider, calling itself Mozilla/4.08. It’s already suspicious, since the real Netscape 4 includes the language and OS, as in Mozilla/4.08 [en] (Windows NT 5.0; U). Then there’s the pattern: lots of hits from the same IP, all to actual pages—not a single image, style sheet, or script—and some interesting mistakes that look like it misparsed the links.
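For the curious, here is a rough sketch of that first check in Python. It assumes a genuine Netscape 4 user agent always follows the shape of the example above (version, bracketed language, parenthesized platform), which is a simplification based only on that one example. The function name `classify_user_agent` and the labels it returns are made up for illustration.

```python
import re

# Assumed shape of a genuine Netscape 4 UA, taken from the example in the post:
#   Mozilla/4.08 [en] (Windows NT 5.0; U)
NETSCAPE4_FULL = re.compile(r"^Mozilla/4\.\d+ \[[A-Za-z-]+\] \(.+\)$")

# A bare version token with no language or platform, like the spider sends.
NETSCAPE4_BARE = re.compile(r"^Mozilla/4\.\d+$")


def classify_user_agent(ua: str) -> str:
    """Label a User-Agent string (hypothetical helper, not a real library call)."""
    ua = ua.strip()
    if NETSCAPE4_FULL.match(ua):
        return "plausible"   # has the language and platform parts
    if NETSCAPE4_BARE.match(ua):
        return "suspicious"  # claims Netscape 4 but omits both parts
    return "other"           # not a Netscape 4 style string at all


print(classify_user_agent("Mozilla/4.08"))
print(classify_user_agent("Mozilla/4.08 [en] (Windows NT 5.0; U)"))
```

A format check like this only catches lazy fakes. A string that copies the full shape will pass, so the request pattern (pages only, no images or style sheets) is still the stronger signal.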

The other pattern showed Netscape 4 requesting favicon.ico. The thing is, Netscape 4 doesn’t know about favicons. These requests are scattered across a few visitors from various IP addresses, and they look like actual visitors: they show up, look at a page or two with images and styles, and so on. Versions range from 4.06 to 4.8, and platforms include Windows XP, Linux, BeOS, and (believe it or not) CP/M. Actually, the last set of hits admits to being Mozilla/4.7 [en] (CP/M; 8-bit; Fake user agent). The only direct reference I can find calls it a robot, but it seems the anonymizing features in Squid use CP/M in their example fake UA.
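The favicon mismatch is easy to look for mechanically. The sketch below assumes the log has already been parsed into `(ip, user_agent, path)` tuples, which is a hypothetical format and not what any particular stats package produces. The `favicon_requesters` helper is likewise invented for illustration.

```python
import re
from collections import defaultdict

# Matches strings that claim to be Netscape 4 ("Mozilla/4.x [lang] ...").
# The " [" requirement keeps out "Mozilla/4.0 (compatible; MSIE ...)" strings.
CLAIMS_NETSCAPE4 = re.compile(r"^Mozilla/4\.\d+ \[")


def favicon_requesters(hits):
    """Map each IP to the Netscape 4 style UAs it used to request favicon.ico.

    `hits` is an iterable of (ip, user_agent, path) tuples, an assumed
    pre-parsed form of the access log.
    """
    flagged = defaultdict(set)
    for ip, user_agent, path in hits:
        # Ignore any query string before checking the requested file name.
        wants_favicon = path.split("?")[0].endswith("/favicon.ico")
        if wants_favicon and CLAIMS_NETSCAPE4.match(user_agent):
            flagged[ip].add(user_agent)
    return dict(flagged)


sample = [
    ("10.0.0.1", "Mozilla/4.7 [en] (CP/M; 8-bit; Fake user agent)", "/favicon.ico"),
    ("10.0.0.1", "Mozilla/4.7 [en] (CP/M; 8-bit; Fake user agent)", "/index.html"),
    ("10.0.0.2", "Mozilla/4.08 [en] (Windows NT 5.0; U)", "/index.html"),
]
print(favicon_requesters(sample))
```

Anything this flags is only a hint that the browser is not what it says it is. It says nothing about why the identity was changed.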

So why do browsers and robots fake their identity? Sometimes it’s for anonymity. You might not want to be tracked, whether for simple privacy reasons or because you’re doing something illicit like harvesting email addresses. Other times it’s out of necessity, when sites send different content to different browsers. Some tools mimic another browser exactly, like the User-Agent Switcher extension for Firefox, and some browsers mimic another just enough to get by, like Opera.

Whatever reasons people (or programmers) use, the results are sometimes strange—like MSIE on Linux!

3 thoughts on “Robots in Disguise”

  1. Alistair Phillips

    Do you think people would find it strange to see a user visiting their site using Internet Explorer for a ZX80? Hahaha

  2. Kelson Post author

    A ZX80! Can those even run anymore? 😀

    Looking at the stats output, I do see a few hits from Amigas, BeOS, and OS/2, but those are about the same level as CP/M, so I don’t know how many of them are serious.

    I do rather like the Anonymizer’s “TuringOS,” though I haven’t seen it in a while. I wonder if they still use that ID.

