Bad Behavior and Spam Karma do a good job of fighting most of the spam that hits this site, but over the last few weeks I’ve seen a (relatively) new kind that seems to require manual intervention: pingback spam.
It took a long time for spammers to really start abusing pingbacks, because of two things: First, pingbacks require the remote site to link to your site before they can get you to link to theirs. Second, it was just so much easier to abuse trackbacks and ordinary comments. I guess those have gotten locked down enough that it’s worth the effort to target pingbacks now.
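The first hurdle works because the Pingback specification asks the receiving blog to fetch the source page and confirm that it really links to the target post before accepting the ping. Here is a minimal sketch of that check in Python. It assumes the source page's HTML has already been downloaded, and the names `LinkCollector` and `source_links_to_target` are hypothetical, not WordPress's own code:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute href of every <a> tag in a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the source page's own URL,
                    # and ignore trailing slashes when comparing.
                    self.links.add(urljoin(self.base_url, value).rstrip("/"))


def source_links_to_target(source_url, source_html, target_url):
    """Return True if the (already fetched) source page links to target_url."""
    collector = LinkCollector(source_url)
    collector.feed(source_html)
    collector.close()
    return target_url.rstrip("/") in collector.links
```

This is also why the spam described below gets through: the spam blog posts a real link, so a check like this one passes.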
The pingbacks show up within a few minutes to a few hours of posting. When I visit the linking site, it's running actual blogging software; often I can recognize a particular WordPress theme. But it's got more advertisements (usually Google ads) than content, the title and description look very keyword-stuffed, and the about page, if any, is just default text. The 15-minute-old post linking to mine has already been pushed off the front page, or else there's only one post there anyway. Often they don't bother with friendly permalinks, and just have the p=12345 structure… and the number is very high.
And that post that links to mine? Something like, "XYZ wrote an interesting post on [post title]. Here's an excerpt:" followed by the first few lines of my post, and yes, an actual link. The funny thing is they rarely bother to make XYZ match either my name or the site's name. It's often completely random.
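Two of the tell-tale signs above, the canned "wrote an interesting post on… Here's an excerpt:" lead-in and a high `?p=` post ID, are regular enough to score mechanically. The following is only a heuristic sketch under stated assumptions: the regular expression is my guess at the template, the `HIGH_POST_ID` threshold of 10000 is an arbitrary stand-in for "very high", and `spam_signals` is a hypothetical function, not part of Spam Karma or any other plugin:

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed shape of the auto-generated lead-in. The "author" is random,
# so accept any short run of name-like characters before "wrote".
TEMPLATE = re.compile(
    r"^\s*[\w .'-]{1,60}\s+wrote an interesting post on\b.*?here(?:'|\u2019)s an excerpt\s*:",
    re.IGNORECASE | re.DOTALL,
)

# Arbitrary assumption for "the number is very high"; tune to taste.
HIGH_POST_ID = 10000


def spam_signals(source_url, excerpt):
    """Return a list of reasons a pingback looks auto-generated (hypothetical)."""
    signals = []
    if TEMPLATE.search(excerpt):
        signals.append("template excerpt")
    # Default WordPress permalinks look like ?p=12345.
    post_ids = parse_qs(urlparse(source_url).query).get("p", [])
    if post_ids and post_ids[0].isdigit() and int(post_ids[0]) >= HIGH_POST_ID:
        signals.append("high ?p= post ID")
    return signals


print(spam_signals("http://spam.example/?p=12345",
                   "XYZ wrote an interesting post on Prius mileage. Here's an excerpt: ..."))
```

A fixed pattern like this has the same weakness as a phrase blacklist: as soon as the wording changes, it stops matching.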
Clearly they've got some RSS search looking for keywords, like Prius or Toyota or Christmas. They automatically generate an excerpt, post it to a blog that's been set up not for human consumption but for machine reading, and wait for the hits to come in. (At least they're just using excerpts. It used to be more common for this sort of site to copy entire posts, and that really pissed me off.)
There’s a screenshot of such a page at Masters of Media, along with some commentary on the failure of nofollow to deter spammers, and the irony that Google is making money off of them. Meanwhile, the Blog Herald goes into the mechanics and ethics of the issue.
I don’t want to disable pingbacks, because it’s a useful feature—just like comments or email. But it’s clear that my current countermeasures aren’t up to this type of spam. Bad Behavior won’t bat an eye, since they’re using actual blogging software to connect. Spam Karma doesn’t seem to be catching them. Sure, I can add some phrases to the blacklist, but I’ve already seen variants popping up. And I’m reluctant to use Akismet, because I don’t want to submit every comment to some other site for verification.
Update (Nov. 27): A couple of relevant posts popped up on the Dashboard today. It seems Akismet is running into problems with people marking these messages as not spam. Also, Lorelle discussed the related topic of content theft, in which sploggers repost not just an excerpt, but the entire article, in hopes of getting their copy of your content indexed in search engines.