Last week I installed Ozh’ Spam Magnet Checker. It’s a WordPress plugin that looks through your spam folder, groups the spam comments by post, and shows you a pie chart of the posts that attract the most spam.

Aside from satisfying curiosity, it can give you an idea of what types of posts spambots like on your site. Also, if you find a particular post tends to get lots of spam but hasn’t received legitimate comments in a long time, you can close comments on that post, cutting off the chance that something might slip through the filter.

I ran it for a week here on K-Squared Ramblings and Speed Force. (It looks at the current spam folder, which I usually clean out every time I check for false positives, so I had to let it sit for a while.)

I was sort of hoping for something more obvious, but instead it’s a fairly smooth distribution. The top posts don’t get that much more spam than the next tier, or the next after that. Though I’m kind of surprised to see the Babylon 5 Lost Tales post so high on the list.

At first glance, the chart for Speed Force looked even smoother. The top post only accounted for 2.3% of the week’s spam.

Then I looked down the list. See all those posts starting out with “Quick Thoughts…”? Those are all old Twitter digests, back when I was still archiving them. They’re a mix of old links and old time-specific remarks, and chances are that any useful comments were made more than a year ago — on Twitter, before they were imported. All together, these old Twitter digests were pulling in 16% of the spam targeting Speed Force, on a class of posts that only made up 6% of the archives.
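For anyone who wants to run the same comparison on their own numbers, here is a rough sketch. It assumes you can get the spam folder out as a plain list of post titles, one entry per spam comment. The sample data and the `spam_share` helper are made up for illustration, and this is not how the plugin itself works internally.

```python
from collections import Counter


def spam_share(spam_post_titles, prefix):
    """Fraction of spam comments aimed at posts whose title starts with prefix."""
    counts = Counter(spam_post_titles)  # spam comments per post
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matching = sum(n for title, n in counts.items() if title.startswith(prefix))
    return matching / total


# Hypothetical sample: 100 spam comments, 16 of them on old Twitter digests.
sample = (
    ["Quick Thoughts for one week"] * 10
    + ["Quick Thoughts for another week"] * 6
    + ["Some other post"] * 84
)

share_of_spam = spam_share(sample, "Quick Thoughts")
share_of_archive = 0.06  # the digests made up about 6% of the archives

# Values above 1 mean this class of posts attracts more than its share of spam.
over_representation = share_of_spam / share_of_archive
print(f"{share_of_spam:.0%} of spam, {over_representation:.1f}x over-represented")
```

With the figures from Speed Force, 16% of the spam landing on 6% of the posts works out to roughly 2.7 times what an even spread would give.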

A nice trick, considering I had already closed comments on all of them. It turns out that spammers have been sending trackbacks to these posts. I’d never really noticed the pattern before, but now that I know, I can close pings on them as well.

That was the main thing I discovered by giving the plugin a week’s worth of spam. YMMV.

(Tip of the hat to Weblog Tools Collection for pointing me in the direction of this plugin!)

I found a sneaky type of spambot this morning. It was impersonating regular commenters on Speed Force, using their names and (at first glance) email addresses to blend in.

The names weren’t terribly surprising, but the email addresses were. Where had it gotten them? WordPress shouldn’t reveal them, unless there’s a bug somewhere. Was one of my plugins accidentally leaking email addresses? Had someone figured out a way to correlate Gravatar hashes with another database of emails?

As I looked through the comments, I realized that in most cases, it wasn’t the commenter’s usual email address. Here’s what the spambot was doing:

  1. Extract the author’s name and website from an existing comment.
  2. Construct an email address using the author’s first name and the website’s domain name.
  3. Post a comment using the extracted name, the constructed email, and a link to the spamvertised site.

The actual content (if you can call it that) of the comments was just a random string of numbers, and the site was a variation on “hello world,” leading me to suspect that it might be a trial run. Certainly they could have been a lot sneakier: I’ve seen comment spam that extracts text from other comments, or from outbound links, or even from related sites to make it look like an actual relevant comment.

I’d worry about giving them ideas, but I suspect it’s already the next step in the design.

Update: They came back for a second round, this time here at K2R, and I noticed something else: It only uses the first name for the constructed email address, but does so naively, just breaking the name by spaces. This is particularly amusing with names like “Mr. So-and-so,” where it creates an address like mr@example.com, and pingbacks, where the “name” is really the title of a post.
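Here is a minimal sketch of what the address construction might look like, based purely on the output I saw. The function name, the punctuation stripping, and the removal of a leading "www." are my guesses, not anything recovered from the bot itself.

```python
import re
from urllib.parse import urlparse


def guess_email(author_name, author_url):
    """Rebuild the kind of address the bot appeared to construct."""
    # Naive "first name": whatever comes before the first whitespace.
    words = author_name.split()
    first = words[0].lower() if words else ""
    # Assumption: punctuation is dropped, which would turn "Mr." into "mr".
    first = re.sub(r"[^a-z0-9]", "", first)

    # Domain taken from the commenter's website link.
    host = urlparse(author_url).hostname or ""
    if host.startswith("www."):  # assumption: a leading "www." is removed
        host = host[4:]
    return f"{first}@{host}"


print(guess_email("Mr. So-and-so", "http://example.com/"))
print(guess_email("Quick Thoughts for the Week", "http://example.net/post"))
```

The second call shows the pingback case: the "name" is a post title, so the "first name" ends up being the first word of the title.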

While cleaning out the comment spam folder on Speed Force, I found this gem:

Hi this is a attempt to get noticed on the world wide web and hopefully spread the word about our services. It would be kind of you if you allow me to share my online marketing one the site. The company name is [REDACTED]. Thanks

I suppose you’ve got to give them points for honesty.

I found a comment in the spam folder for Speed Force that, at first glance, looked like an actual, relevant comment…to a different post. It was a coherently written paragraph about how someone had “considered getting a second Captain Cold” action figure to customize it, but it was posted to an article about stalled miniseries. The author’s name and link were obvious spam, though (seriously, “watch full movies” is the best you can do?).

My first thought: They’d copied the text from another comment on the site. I’ve seen that happen before, but usually it’s comments on the same post. A search through existing comments didn’t turn up any matches, though.

So then I did a search on the rest of the web, and found the original comment on a review of an Atom Smasher toy.

Someone had gone looking for a site with a similar topic (comic books about super-heroes, action figures made from super-heroes), copied text from there, and pasted it onto mine…and yet they hadn’t bothered to match up specifics (like pasting it on a post about action figures or Captain Cold). So it’s not quite as sneaky as the one who followed a link in my post and pasted in text from the other page, but it’s pretty close.

Judging by a quartet of comments posted this evening, three of which slipped past Spam Karma, someone’s started outsourcing comment spam to India. (I’m serious, the IP addresses were assigned to Bharti Airtel and BSNL Internet, both ISPs based in New Delhi.)

They were posted quickly, as if they’d been composed in another editor and pasted into the form. More importantly, they were actually posted through the form, not just sending data directly to the handler. And most tellingly, the posters had gone to the effort to fill out the CAPTCHA that Spam Karma provides to allow human commenters to recover from a false positive.

The one I liked best, from a technical perspective, was posted on Tall Ships of San Diego. The spammer had followed my link to the San Diego Maritime Museum, then followed that to a page describing one of the ships, the Californian, and generated a post by stringing together sentences from that page. The whole thing linked to a student loan site.

At first glance, it looked like a garbled, on-topic comment from someone who maybe didn’t speak English as their first language. That happens, and if it’s a legit comment, I leave it. In fact, I considered leaving the comment but deleting the author URL, until I looked up the ship. (It wasn’t one of the ships we toured on our visit, and I didn’t recognize the name.) As I looked at the ship’s profile, I started recognizing text from the comment. At that point it became clear what was going on, and I started looking at the other comments posted over the last few hours.
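What I did by eye could be roughed out in code: check how many of the comment's sentences also turn up in the text of a page the post links to. This is only a sketch under my own assumptions. The `copied_fraction` helper, the four-word minimum, and the sample text are invented for illustration, and fetching the linked page is left out entirely.

```python
import re


def normalize(text):
    """Lowercase and collapse anything that isn't a letter or digit into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def copied_fraction(comment, page_text, min_words=4):
    """Fraction of the comment's sentences that appear word for word in page_text."""
    page = f" {normalize(page_text)} "
    sentences = [normalize(s) for s in re.split(r"[.!?]+", comment)]
    # Very short fragments match too easily, so ignore them.
    sentences = [s for s in sentences if len(s.split()) >= min_words]
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if f" {s} " in page)
    return hits / len(sentences)


# Made-up sample text standing in for a linked page and a suspicious comment.
page = (
    "The schooner was launched with great fanfare. "
    "She carries a crew of twelve sailors. "
    "Visitors can tour the deck on weekends."
)
comment = (
    "She carries a crew of twelve sailors. "
    "The schooner was launched with great fanfare. "
    "Great deals on student loans today!"
)

print(f"{copied_fraction(comment, page):.0%} of sentences copied")
```

A high fraction doesn't prove anything on its own, since a legitimate commenter might quote the page, but it's a good reason to take a closer look at the author link.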