When I started the category clean-up project a while back, I decided to start monitoring 404 errors on the blog to see if I missed any incoming links that needed to be redirected. I was surprised to find that the logs showed no 404 errors at all from within the blog structure. Images, sure, but no articles, no tags, no categories. This seemed a bit hard to believe.
I tested it by deliberately hitting a non-existent page, and was dismayed to find that Apache logged the hit as 200 (OK).
Crap! a WordPress update must have broken 404 handling! How long had this been going on? I’d better manually insert a header in the 404 page!
That seemed to work, as far as Chrome’s Developer Tools and curl -I were concerned. I didn’t have time to follow up on the logs right away, so I checked back later…and the logs still showed 200 OK, not 404.
It turned out that, when served through WordPress, Apache was sending a 404 code to the browser but logging a 200.
Probably a plugin, right?
Not so. I installed a fresh copy of WordPress on a test site and discovered something interesting: 404 codes were logged correctly when using the default /?p=123 permalink structure, but if I changed it to anything readable like /yyyy/title or even /title, the problem recurred.
A little more investigation: I skipped WordPress entirely and just hit a PHP page that served up a 404. When I hit it directly, it logged correctly. But when I used WordPress’ mod_rewrite rules to send a hit to that page, it logged a 200.
So clearly, it was something about mod_rewrite. I don’t run my own Apache server these days (my department at work is mainly a Windows shop), but I was pretty sure it didn’t work that way back when I did.
So I did some testing of different configurations at home and on my webhost. Direct hits always logged the correct status, but with a rewrite rule, here’s what I found:
FastCGI & CGI on DreamHost show 200/404.
mod_php on home box shows 404/404.
mod_php on DreamHost shows… 200/404.
At this point I figured there was no point setting up a CGI or FastCGI-based PHP environment on my home box, because it was clearly something about Dreamhost’s Apache configuration.
It does log correctly if you use ErrorDocument directive to point 404 to a PHP script. But IMO that’s abusing the error handler mechanism to do something it wasn’t meant for. (Not that I haven’t done it myself, but only on older IIS servers where ISAPI Rewrite and URL Rewrite weren’t available.)
I’ve added a custom logging snippet to my WordPress 404 page. There are other ways I can capture the data, but that seemed like the least overhead for now.