I recently stumbled across an old copy of the Demoroniser (which my American-trained sense of spelling keeps trying to spell as “demoronizer”), a script designed to correct some of the, well, moronic HTML generated by Microsoft Office. Aside from flat-out coding errors, Office would use non-standard characters for things such as curly quotes or em-dashes, and these would only show up on Windows computers. If you viewed these pages on a Mac, a Linux box, a Palm, etc., they would seem to be missing punctuation everywhere. The script’s solution was to convert these characters to their plain-ASCII equivalents.
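For the curious, here is a minimal Python sketch of that kind of cleanup. It is not the Demoroniser's actual code: it assumes the problem bytes come from the Windows-1252 code page, and the function name, the handful of characters covered, and the ASCII stand-ins are all illustrative choices.

```python
# Hypothetical table: a few "smart" punctuation characters and the
# plain-ASCII stand-ins a Demoroniser-style cleanup might pick.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
})


def demoronise_sketch(raw: bytes) -> str:
    """Decode bytes as Windows-1252 (an assumption), then swap the
    characters listed above for ASCII. Anything else passes through."""
    text = raw.decode("cp1252", errors="replace")
    return text.translate(ASCII_FALLBACKS)


print(demoronise_sketch(b"\x93Hi\x94 \x97 it\x92s"))
```

Only the characters in the table are touched, so accented letters and other symbols survive the decode unchanged.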

Over the last year or so, WordPress and A List Apart have converted me from “stick with the lowest common denominator” to “let’s show real typography.” Since the days of the Demoroniser, Unicode has become a standard part of HTML, so modern browsers* can either display a full range of characters or convert them to something they can display. You probably won’t be able to see Chinese text in Lynx, but a properly encoded curly quote—“ or ”—will show up as a plain old ".

For one thing, real typography looks much nicer. An actual “—” looks more professional than “--” does. Curly quotes are also more readable than straight quotes. Take this series of titles, first with curly quotes and next with straight quotes. With curly quotes, it’s easier to tell which pieces of text are inside the quotes and which are outside:

“Blah blah one,” “Another title,” and “Yada yada.”
"Blah blah one," "Another title," and "Yada yada."

Indispensable resources: Commonly Confused Characters and The Trouble with EM ’n EN are great for figuring out just which dash to use where, and for getting the codes right (if your authoring tool doesn’t take care of it for you). Evolt’s Character Entity Chart is helpful for looking up codes and for checking just how much your browser can (or can’t) handle.
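If you have Python handy, its standard library can also look up the codes. The sketch below uses the `html.entities.name2codepoint` table; the helper name `describe_entity` is made up for illustration.

```python
from html.entities import name2codepoint


def describe_entity(name: str) -> str:
    """Show a named HTML entity next to its numeric reference and code point."""
    cp = name2codepoint[name]  # e.g. "mdash" maps to 8212
    return f"&{name}; = &#{cp}; = U+{cp:04X}"


for name in ("ndash", "mdash", "lsquo", "rsquo", "ldquo", "rdquo"):
    print(describe_entity(name))
```

A name that is not in the table raises `KeyError`, which is a quick way to catch a misspelled entity.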

So, reminded of the existence of the Demoroniser, I looked for a Unicode-aware update. The original script remains ASCII-only, but I did find the Unmoroniser, a modified version that converts the problem characters to the proper HTML entities instead. Accompanying the script is a rather long but nonetheless amusing rant on why this change is a good thing.
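The entity-based approach can be sketched in a few lines of Python. This is an illustration of the idea, not the Unmoroniser's own code: it assumes the input is Windows-1252 and emits decimal numeric references, which may not be the exact form the real script produces. The function name is hypothetical.

```python
def unmoronise_sketch(raw: bytes) -> str:
    """Decode bytes as Windows-1252 (an assumption), then replace every
    non-ASCII character with a numeric character reference."""
    text = raw.decode("cp1252", errors="replace")
    # xmlcharrefreplace turns each unencodable character into &#NNNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(unmoronise_sketch(b"\x93Hi\x94\x97it\x92s"))
```

Unlike the ASCII conversion, nothing is lost here: a browser that understands the references shows the real curly quotes and dashes. The sketch leaves existing markup alone, so it does not escape `&` or `<` in the input.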

*“Modern” in this case means IE 5+, Mozilla/Firefox/Netscape 6+/etc., Opera 6+, Konqueror 3+ and Safari, all of which should have no problems. Netscape 4 manages the basics, but many characters only show up on Windows or don’t work at all.
