What’s in a User-Agent String?

Some people browse collections. I collect browsers. Mostly I just want to see what they’ll do to my web site, but I have a positively ridiculous number of web browsers installed on my Linux and Windows computers at work and at home, and I’ve installed a half-dozen extra browsers on our PowerBook.

One project I’ve worked on since my days at UCI was a script to identify a web browser. In theory this should be simple, since every browser sends its name along when it requests a page. In practice, it’s not, because there’s no standard way to describe that identity.

Actually, that’s not quite true. There is a standard (described in the specs for HTTP 1.0 and 1.1: RFC 1945 and RFC 2068), but for reasons I’ll get into later, it’s not adequate for more than the basics, and even those have been subverted. That standard says a browser (or, in the broader sense, a “user agent,” since search robots, downloaders, news readers, proxies, and other programs might access a site) should identify itself in the following format:

  • Name/version more-details

Additional details often include the operating system or platform the browser is running on, and sometimes the language.
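As a rough illustration of that basic format, here is a minimal Python sketch that splits a string into its leading Name/version token and whatever follows. The function name `parse_basic` and the regular expression are my own assumptions for illustration; they are not taken from the HTTP specs.

```python
import re

# Leading "Name/version" token, then everything else (comments, extra products).
# Hypothetical pattern: it only handles a name with no spaces in it.
PRODUCT_RE = re.compile(r"^(?P<name>[^/\s]+)/(?P<version>\S+)\s*(?P<details>.*)$")


def parse_basic(user_agent):
    """Return (name, version, details) or None if the string doesn't fit."""
    match = PRODUCT_RE.match(user_agent.strip())
    if match is None:
        return None
    return match.group("name"), match.group("version"), match.group("details")


if __name__ == "__main__":
    print(parse_basic("IBrowse/2.2 (AmigaOS V45)"))
    print(parse_basic("Dillo/0.8.1"))
```

A string whose name contains a space, or that has no slash at all, simply comes back as None from this sketch.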

Now here are some examples of what browsers call themselves. (Edit Aug. 19: rearranged list for clarity and added a few more. Edit Sep. 11: added the IE/WinXP SP2 example.)

  • Netscape Variations (non-Gecko)
    • Mozilla/4.7 [en] (WinNT; U)
    • Mozilla/4.72 [en] (X11; U; Linux 2.4.18 i686)
  • Internet Explorer Variations
    • Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    • Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
    • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
    • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461; .NET CLR 1.1.4322)
    • Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)
    • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 1.0.3705)
  • Opera Variations
    • Opera/6.0 (Macintosh; PPC Mac OS X; U) [en]
    • Opera/7.11 (Linux 2.4.20-18.9 i686; U) [en]
    • Opera/7.50 (X11; Linux i686; U) [en]
    • Mozilla/4.0 (Windows NT 5.0;US) Opera 3.60 [en]
    • Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 5.11 [en]
    • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.50 [en]
  • Gecko Browsers
    • Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030524
    • Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
    • Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7
    • Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040612 Firefox/0.8
  • KHTML-Based
    • Mozilla/5.0 (compatible; Konqueror/3.1; Linux)
    • Mozilla/5.0 (compatible; Konqueror/3.2; Linux) (KHTML, like Gecko)
    • Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
  • Others
    • Lynx/2.8.4rel.1 libwww-FM/2.14 SSL-MM/1.4.1 OpenSSL/0.9.6h
    • IBrowse/2.2 (AmigaOS V45)
    • Dillo/0.8.1
    • NCSA Mosaic/3.0.0 (Windows x86)
    • Mozilla/4.5 (compatible; OmniWeb/4.2.1-v435.9; Mac_PowerPC)

The first thing you’ll probably notice is that most of these claim to be Mozilla. This is a holdover from the early days of the Browser Wars. Netscape was frustrated with what could be done with HTML 2, and started building its own extensions. Web sites would check to see if the browser was Netscape (which used Mozilla as its code name) in order to decide whether to send the enhanced page or the plain one. Microsoft, hoping to get in on the Web action, wanted all of these sites to send the enhanced pages to its browser, so it identified IE as Mozilla with a “compatible” note, and put the real name in the comments.

So now you see things like Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC) — which is not Netscape 4.0, as the basic version states, but is actually Internet Explorer 5.22.

Moving along, we find browsers like Opera, which used the same reasoning as IE, but a different format: Mozilla/4.0 (Windows NT 5.0;US) Opera 3.60 [en]. So for Internet Explorer, you have to look inside the parentheses, right after the word “compatible.” But for Opera, you have to look after the parentheses.

At the height of the Browser Wars, Netscape decided to release the source code for version 5 under an open-source license, allowing anyone to look at the code, modify their own copy, and suggest improvements or bug fixes. They called this code Mozilla, to distinguish it from the finished Netscape browser. This was problematic, though, because putting Mozilla at the beginning would simply look like another version of Netscape. If you had Mozilla version 0.1alpha, you didn’t want it to look like it was really Netscape 0.1, because then the server would assume you couldn’t handle things like JavaScript, frames, tables, plugins, etc. Netscape and Mozilla.org went through a number of different plans, finally settling on using Mozilla/5.0 to start, putting the “real” Mozilla version at the end of the parentheses, then putting detailed build information and any “official” browser names (like Netscape, Beonex, Camino, Firefox, etc.) after the parentheses. So you end up with an ID like Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02 for Netscape 7.
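To make the Gecko layout concrete, here is a small Python sketch that pulls out the three pieces described above: the rv: version at the end of the parentheses, the Gecko build date, and any branded name that follows. The function name `parse_gecko`, the dictionary keys, and the regular expressions are assumptions made for this example.

```python
import re

GECKO_RE = re.compile(r"\bGecko/(\d{8})")        # build date, e.g. Gecko/20030208
RV_RE = re.compile(r"rv:([\w.]+)\)")             # "real" Mozilla version, last field in the parentheses
PRODUCT_RE = re.compile(r"([A-Za-z]+)/([\w.]+)")  # branded name/version after the build token


def parse_gecko(user_agent):
    """Return a dict describing a Gecko-style ID, or None if there is no Gecko/ token."""
    gecko = GECKO_RE.search(user_agent)
    if gecko is None:
        return None
    rv = RV_RE.search(user_agent)
    mozilla_version = rv.group(1) if rv else None
    # Anything after the Gecko build token is treated as an "official" browser name.
    branded = PRODUCT_RE.findall(user_agent[gecko.end():])
    name, version = branded[-1] if branded else ("Mozilla", mozilla_version)
    return {
        "name": name,
        "version": version,
        "mozilla_version": mozilla_version,
        "build": gecko.group(1),
    }


if __name__ == "__main__":
    print(parse_gecko(
        "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) "
        "Gecko/20030208 Netscape/7.02"
    ))
```

When no branded name follows the build token, the sketch falls back to reporting plain Mozilla with the rv: version.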

Meanwhile, Netscape was losing ground to Microsoft. Netscape 4 and IE 4 were comparable, and Microsoft was bringing the full weight of marketing to promote their free-as-in-beer alternative. More importantly, Microsoft was relying on a basic human trait to do their work for them: laziness. By tying IE to Windows and making it the default web browser, they virtually guaranteed that the vast majority of people buying a new PC would start using IE. (This is not conjecture: in the U.S. antitrust case, the courts found that Microsoft had illegally protected its Windows monopoly through its conduct toward competing browsers.) And with Netscape 5 delayed by rebuilding their code from scratch, IE 5 easily surpassed Netscape 4 both in technology and market share.

So at the end of the decade, Internet Explorer had become the prevalent web browser. People stopped testing pages in anything but Internet Explorer. Some of them were amateurs who didn’t know better; others were professionals who were frustrated by Netscape 4’s limitations or limited by deadlines or funding, and found it easier to just discount the shrinking Netscape market. Smaller projects, such as Opera (now the #3 browser), realized they needed a way to get into sites that blocked non-IE browsers. As a result, the default identification for Opera 7.5 is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.50 [en], which identifies itself simultaneously as Netscape 4, IE 6, and Opera 7. (Fortunately, Opera does allow you to set it to “do the right thing” and identify itself with the more sensible Opera/7.50 (Windows NT 5.0; U) [en].)

But wait, it gets even better!

Enter Apple. Prudently realizing they might not want to rely on Microsoft for the default browser on the Macintosh, and also noticing that there were several high-quality rendering engines available to use as the basis of a new program, they settled on KHTML, the code used by KDE’s Konqueror browser. KHTML, like Mozilla, had standards compliance as one of its goals, so Apple decided to leverage all the post-Mozilla 1.0 articles which recommended checking for the phrase “Gecko” to determine whether you could rely on the browser being able to handle advanced features that IE ignores. So their beta called itself Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/60 (like Gecko) Safari/60. People were appalled. Mozilla aficionados, already upset that Apple had chosen another program, complained that Apple was diluting the meaning of Gecko by using it on a browser that didn’t behave in exactly the same way. KDE fans complained that Konqueror would be further marginalized because people would start looking for “AppleWebKit” or “Safari.” They eventually compromised by changing the wording to “KHTML, like Gecko” with the KDE people adding “KHTML” to Konqueror’s ID. Strangely, even now Safari doesn’t use its actual version number. Going by what it reports in the example above, you’d think it was Safari 125.8.
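The practical consequence is that a bare search for the word “Gecko” now matches Safari and newer Konqueror builds as well. A hedged sketch of one way to tell the engines apart, assuming you check the KHTML markers first and then require the Gecko/ build token (the function name `rendering_engine` and the marker list are my own choices):

```python
def rendering_engine(user_agent):
    """Guess the rendering engine: "KHTML", "Gecko", or None."""
    # Check KHTML-family markers first, because "like Gecko" would
    # otherwise fool a naive `"Gecko" in user_agent` test.
    khtml_markers = ("KHTML", "Konqueror", "AppleWebKit")
    if any(marker in user_agent for marker in khtml_markers):
        return "KHTML"
    # Real Gecko builds carry a build date after a slash: Gecko/20030524.
    if "Gecko/" in user_agent:
        return "Gecko"
    return None


if __name__ == "__main__":
    safari = ("Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) "
              "AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8")
    print("Gecko" in safari)         # the naive test says yes
    print(rendering_engine(safari))  # the ordered test says KHTML
```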

So now there are at least four major places browsers put their real names:

  • Name/version
  • Mozilla/x (compatible; name/version)
  • Mozilla/x (details; version)
  • Mozilla/x (details) more details name/version
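The four placements above can be tried in order, most specific first. The following Python sketch does exactly that and nothing more; the function name `guess_browser`, the patterns, and the ordering are assumptions for illustration, and it reports whatever token it finds (so IE comes back as “MSIE”).

```python
import re

TOKEN_RE = re.compile(r"([A-Za-z]+)[/ ]([\d.]+)")                    # name/version or name version
COMPATIBLE_RE = re.compile(r"\(compatible; ([A-Za-z]+)[/ ]([\w.\-]+)")
RV_RE = re.compile(r"rv:([\w.]+)\)")
LEADING_RE = re.compile(r"^([^/\s]+)/(\S+)")


def guess_browser(user_agent):
    """Return a (name, version) tuple, or None if no placement matches."""
    # Mozilla/x (details) more details name/version
    if ")" in user_agent:
        tail = user_agent[user_agent.rindex(")") + 1:]
        tokens = [t for t in TOKEN_RE.findall(tail) if t[0] != "Gecko"]
        if tokens:
            return tokens[-1]
    # Mozilla/x (compatible; name/version)
    match = COMPATIBLE_RE.search(user_agent)
    if match:
        return match.groups()
    # Mozilla/x (details; version)
    match = RV_RE.search(user_agent)
    if match:
        return ("Mozilla", match.group(1))
    # Name/version
    match = LEADING_RE.match(user_agent)
    if match:
        return match.groups()
    return None


if __name__ == "__main__":
    print(guess_browser("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.50 [en]"))
    print(guess_browser("Mozilla/5.0 (compatible; Konqueror/3.2; Linux) (KHTML, like Gecko)"))
```

Checking the text after the parentheses first is what lets Opera win over the MSIE token it also sends.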

You’d think these would be enough, right? Wrong. I’ve seen browsers use all of the following:

  • Mozilla/5.0 (X11; U; Linux i686; en-US; SkipStone 0.8.1) Gecko/20020417
  • Mozilla/5.0 (X11; U; Galeon; en-US; 0.11.3)
  • Mozilla/5.0 (X11; U; Linux i686; en-US; Galeon) Gecko/20010701
  • Mozilla/4.61 [en] (X11; U; ) - BrowseX (2.0.0 Linux 2.4.9-31)
  • Links (0.96; CYGWIN_NT-5.0 1.3.22(0.78/3/2) i686)

Does the real name go in the middle of the parentheses? At the end? Before? After? Is the version number put after a slash or a space? Is it put in the parentheses instead? Is it even there? What were these people smoking?

So you can see, just figuring out the real name and version number of a browser is far from easy.

Now, suppose you want to identify what operating system it’s running. (Insert maniacal laughter.) Let’s assume for the moment that you know where to look. Some of them are easy, like Windows NT 4.0 or PPC Mac OS X or Linux i686 — or are they?

Somewhere around version 4, Netscape started identifying all Unix-like OSes as X11 (since the program ran under the X windowing system), with another field to identify SunOS, BSD, Linux, etc. So for those, you need to look in two places. (And then there’s trying to pick out Solaris versions from SunOS…)
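A sketch of that two-step lookup in Python: confirm the X11 field first, then pick the first remaining field that is not a known flag. The function name `unix_flavor` is hypothetical, and the skip list (the security flags U, I, and N, plus locale codes and rv: fields) is my assumption about what else shows up in that comment.

```python
import re

COMMENT_RE = re.compile(r"\(([^()]*)\)")              # first parenthesized comment
LOCALE_RE = re.compile(r"[a-z]{2}(-[A-Za-z]{2})?")    # en, en-US, ...
SKIP_FIELDS = {"X11", "U", "I", "N", ""}              # assumed non-OS fields


def unix_flavor(user_agent):
    """Return the OS field of an X11 User-Agent comment, or None."""
    match = COMMENT_RE.search(user_agent)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(";")]
    # Place one: the comment has to say X11 at all.
    if "X11" not in fields:
        return None
    # Place two: the first field that isn't a flag, a locale, or rv:.
    for field in fields:
        if field in SKIP_FIELDS or field.startswith("rv:"):
            continue
        if LOCALE_RE.fullmatch(field):
            continue
        return field
    return None


if __name__ == "__main__":
    print(unix_flavor("Mozilla/4.72 [en] (X11; U; Linux 2.4.18 i686)"))
    print(unix_flavor("Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.3.1) Gecko/20030524"))
```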

Then there’s Windows. You can spot Windows NT or Windows 98 but some of them are truly bizarre. Windows 2000 claims to be NT 5.0. Windows XP claims to be NT 5.1. And Windows Me first claims to be Windows 98, then adds an extra field identifying itself as Win 9x 4.90 — really!
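One way to cope is an ordered lookup table, with the most specific tokens first so that Windows Me is caught before the Windows 98 token it also sends. This is a minimal sketch covering only the tokens mentioned in this post; the table, the function name `windows_version`, and the Windows Me test string (built from the description above, not captured from a real browser) are assumptions.

```python
# Most specific tokens first: order matters.
WINDOWS_TOKENS = [
    ("Win 9x 4.90", "Windows Me"),      # sent alongside "Windows 98"
    ("Windows NT 5.1", "Windows XP"),
    ("Windows NT 5.0", "Windows 2000"),
    ("Windows NT 4.0", "Windows NT 4.0"),
    ("Windows CE", "Windows CE"),
    ("Windows 98", "Windows 98"),
    ("Windows NT", "Windows NT"),       # no version given
    ("WinNT", "Windows NT"),            # old Netscape spelling
]


def windows_version(user_agent):
    """Return a friendly Windows name, or None if no known token is present."""
    for token, name in WINDOWS_TOKENS:
        if token in user_agent:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical Windows Me string, assembled from the description in the text.
    print(windows_version("Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"))
    print(windows_version("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"))
```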

If you want to figure out whether the browser is running on a Mac, it might be in the form Mac_PowerPC, PPC Mac OS X, or PPC Mac OS X Mach-O (those are all from the same computer, by the way, using IE 5.2, Safari, and Camino).

If all you need to know is whether your audience is using Windows, Macs, or something Unix-like, your best bet is to just look for Win, Mac or X11. Don’t look for PPC or PowerPC by itself, because you’ll catch people running Yellow Dog Linux or MorphOS. (Edit Aug. 19: Or, it seems, Windows CE. See comments for more info.)
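That coarse check is short enough to write out. A minimal Python sketch, assuming a case-sensitive substring test for exactly those three tokens (the function name `platform_family` and the returned labels are my own):

```python
def platform_family(user_agent):
    """Return "Unix-like", "Windows", "Mac", or None for anything else."""
    if "X11" in user_agent:
        return "Unix-like"
    if "Win" in user_agent:
        return "Windows"
    if "Mac" in user_agent:
        return "Mac"
    # Deliberately no test for "PPC" or "PowerPC": those also appear
    # on Linux, MorphOS, and Windows CE.
    return None


if __name__ == "__main__":
    print(platform_family("Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"))
    print(platform_family("Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.0.1) Gecko/20030306 Camino/0.7"))
```

Strings that name Linux without an X11 field, such as the Opera 7.11 example earlier, come back as None from this sketch.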

And trying to guess capabilities based on the browser’s (supposed) name is just asking for trouble. You’re better off testing based on actual capabilities. Want to support browsers that don’t have JavaScript? Use a <noscript> block. Want to send XHTML to browsers that can handle it and HTML to browsers that can’t? Check the Accept header. Worried about what DHTML methods to use? Have your JavaScript check what’s available.
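As one example of testing capabilities instead of names, here is a hedged Python sketch of the Accept-header check for XHTML. The function names and the sample header values are illustrative assumptions, and the parsing is deliberately simple: it only looks for an explicit application/xhtml+xml entry with a nonzero q value.

```python
XHTML_TYPE = "application/xhtml+xml"


def accepts_xhtml(accept_header):
    """True if the Accept header explicitly lists XHTML with q > 0."""
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        if parts[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # q defaults to 1 when omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        return quality > 0
    return False


def content_type_for(accept_header):
    """Pick the Content-Type to send for a page available in both forms."""
    return XHTML_TYPE if accepts_xhtml(accept_header) else "text/html"


if __name__ == "__main__":
    # Illustrative header values, not captured from a particular browser.
    print(content_type_for("text/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"))
    print(content_type_for("image/gif, image/jpeg, */*"))
```

Treating a bare */* as "HTML only" is a conservative design choice in this sketch: a browser that does not name XHTML explicitly gets the plain page.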

Unfortunately, trying to read the User-Agent name is like trying to read a map with no key. You need to know what you’re looking for.

(Whew — finally posted that! I started writing it a year ago, and found it in my Drafts folder today.)
