Dealing With Comment Spammer Infestations

(Oct. 14th Update: MT-Blacklist has arrived!)

…our comments are being porn-spammed (Armed Liberal’s too, and I’ll be emailing some other blogs to see whether they’ve been hit). We’ve been hit by a series of spams from a Russian porn site, and we’re cleaning it up as fast as we can. The last one appears to have left several hundred comments, and additional mutations are possible. So far we’ve seen “Lolita,” “Preteen,” and “Underage.” Teresa Nielsen Hayden has more info on the spammers, Scriptygoddess has a slew of admin options for you, and Burningbird has a fairly simple way to make it harder for spammers next time (Hat Tip: David Janes).

JK: It’s an organized effort… was highly ranked at Blogdex.net a couple days ago, but I think they’ve put in filters. We may do the same soon, and meanwhile I’ve disabled all comments. We’ve also got a Swedish neo-nazi group that hangs out here and occasionally posts long rants. If you want to see an example, do a search for “Conspiracy and Truth Week” because I delete it everywhere else.

Re: the comment spams… why does this matter? And what can be done? This matters because if pornospams et al. are left unchecked, they will significantly impair the entire weblogging community – not just by killing comments as a normal blog feature, but by triggering the automated filtering software at some workplaces once it notices all the porno links. What do we need to prevent that? Software, and support.

Software: Yoz Grahame’s Cheerleader has a very intelligent set of suggestions, in “7 Tips for a spam-free blog.” The article addresses tools vendors as well, which I especially appreciate. It also references Mark Pilgrim’s outstanding overview of Club vs. LoJack solutions, which is finally available again after going down yesterday. If you’re looking for serious long-term thinking about how our tools need to evolve and what we need to do, Mark’s piece can’t be beat. Shelley has a good one too, with some worthy cautions about trust networks and some smart feature requests.

Roald and Macdonald have an Open Letter to Google which is very much on point. We all have a mutual interest in stopping this, and working together from both ends just makes sense.

I’ll add another thought. Not only do we need MT-Blacklist, we also need a clean-up utility. One that looks in the comments for the “URL” field, and when it finds a match with our ban list (or even a specific entered value for v1.0), it collects that comment and presents us with a “Power Edit” list that allows us to delete comments in batches of 25-100 at a time. When we’re done, one site rebuild would allow us to have a completely clean blog.
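To make the clean-up idea concrete, here is a rough Python sketch of the matching and batching steps. It is only an illustration under stated assumptions: the comment records, the "url" field name, and the banned domain are all made up for the example, and none of this is Movable Type's or MT-Blacklist's actual code or schema.

```python
from urllib.parse import urlparse

# Hypothetical ban list; a real one would come from your own blacklist file.
BANNED_DOMAINS = {"spam-example.test"}


def is_banned(url, banned=BANNED_DOMAINS):
    """Return True if the URL's host is a banned domain or a subdomain of one."""
    if "://" not in url:
        # Commenters often omit the scheme; "//" makes urlparse see a host.
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in banned)


def find_spam(comments, banned=BANNED_DOMAINS):
    """Collect every comment whose URL field matches the ban list."""
    return [c for c in comments if is_banned(c.get("url", ""), banned)]


def batches(items, size=25):
    """Yield the matches in review-sized batches (25 to 100 in the proposal)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each batch would be shown to the blog owner for confirmation before anything is deleted, with a single site rebuild once the last batch is done.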

Support: In addition, hosting providers have to get smarter. Tens or hundreds of weblogs rebuilding hundreds of entries will have the same effect on their servers as a denial-of-service attack. Comment spam should therefore be treated like one. For starters, hundreds of incoming data posts from the same IP ought to raise a red flag and cause diversion or access denial.
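As a rough sketch of what that red flag could look like, the Python below counts comment posts per IP over a sliding time window. The class name, the threshold of 100 posts, and the ten-minute window are assumptions picked for illustration, not anything a particular host actually runs.

```python
from collections import defaultdict, deque


class PostRateMonitor:
    """Flag an IP that submits too many comment posts in a short window."""

    def __init__(self, max_posts=100, window_seconds=600):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # ip -> timestamps of recent posts

    def record(self, ip, timestamp):
        """Record one post; return True if this IP should now be flagged."""
        recent = self._recent[ip]
        recent.append(timestamp)
        # Forget posts that have fallen out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_posts
```

What happens after a flag (diversion, a temporary block, or just an alert to the admin) is a policy decision for the host; the sketch only covers detection.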

Meanwhile, our provider at Bloghosts.com has already moved to firewall out the following netblocks from their servers: 209.120.176.0/24 and 62.42.228.0/24. This will help for now, but over the long term they may want to consider an add-on service. It would include installation of MT-Blacklist, configured to draw from a central blacklist hosted and updated by Bloghosts.com themselves, plus renamed CGI submission scripts in their MT (Movable Type) installations to make the blogs they host a lower-profile target. The Cadillac option could even include an upgraded host-specific MT package with a full-fledged spamtrap configuration.
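For anyone unsure what a /24 netblock actually covers, this small Python sketch checks an address against the two blocks named above using the standard ipaddress module. A real firewall does this at the network level; the function name and the application-level check are just assumptions for illustration.

```python
import ipaddress

# The two netblocks mentioned in the post.
BLOCKED_NETBLOCKS = [
    ipaddress.ip_network("209.120.176.0/24"),
    ipaddress.ip_network("62.42.228.0/24"),
]


def is_blocked(ip, netblocks=BLOCKED_NETBLOCKS):
    """Return True if the address falls inside any blocked netblock."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in netblocks)
```

A /24 spans all 256 addresses that share the first three numbers, so one rule covers every last number from 0 to 255 in that range and nothing outside it.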

That would be a substantial draw for many bloggers, I think, who would gladly pay additional fees for services that take this problem off their hands.

This much I do know – we’ll need these measures sooner rather than later. Preteen, Lolita and the spawn were just the beginning. There’s no reason these attacks couldn’t be scaled to add hundreds of comments to each weblog, and no reason why they wouldn’t be. Brace yourselves, because you ain’t seen nothing yet.

21 thoughts on “Dealing With Comment Spammer Infestations”

  1. You should consider Bayesian content-filtering. I’ve been using it on both my work and home emails for about a year now, and it’s produced amazing results. The disadvantage is that you need a *lot* of “good” mail and “bad” mail in order to get the databases intelligently built, but the advantage is that they continue to get “smarter” as more and more spam is “caught”.

    Email for more details if you have no idea what I’m talking about.

  2. Warning: 209.210.176.0/24 may not cover enough ground. My 21 Lolita-spams came with last numbers 20, 21, 22 — and 33. I got 4 more on my other site (www.curculio.org), the one with only 15 posts and 2 genuine comments in the last 6 months (maybe I should spend more time there).

  3. Oh, and for what it’s worth, here’s my current banned list. Not all spammers though, but it may be handy for comparison purposes:

    211.10.197.13 2003.06.23
    81.135.77.87 2003.07.26
    134.28.148.47 2003.08.14
    203.62.10.3 2003.08.28
    212.69.231.226 2003.09.16
    66.111.50.170 2003.09.26
    209.210.176.21 2003.10.09
    209.210.176.* 2003.10.10
    219.95.14.69 2003.10.12

  4. I have had the same problem at PoliBlog. Like StarHawk I have been able to catch, delete and then IP ban. I have had the Lolita one several times, and from at least two different IPs. Plus numerous Viagra-related ones and several Direct TV ones as well.

  5. I’ve seen many porn spams on Slashdot. I wonder if people who don’t like the site’s viewpoints use this kind of comment as a way of attacking both the site and the people who run it.

    Anyway, this kind of thing has been happening for some time on Slashdot. Slashdot has remained accessible at every place I’ve worked.

    Slashdot’s moderation mechanism tends to quickly lower this kind of crap to near invisibility. You do have to either catch the garbage early or deliberately look at comments rated -1 to find the stuff.

    Perhaps they might be willing to offer advice.

  6. I got hit with the same lolita / preteen crap a couple of days ago. IP was 209.210.176.33. I also deleted the comment and then banned the IP.

  7. my guess is that you can’t keep up and win on ip-blocking. Take a look at how yahoo/hotmail/paypal stop people from getting oodles of accounts, etc. they use a “Reverse turing test” or a “CAPTCHA”–it’s a graphic that involves reading a word that’s presented in such a way that no current OCR technology can solve the problem. this way no script can activate the comments, only a human can. yes, you’d still have to block human spammers, but that’s really not where the majority of the problems are coming from. for every submission, you present a graphic image as a test, and you ask the submitter to tell you what the word is that they see in the graphic. a variety of fonts and backgrounds are used that prevent ocr from handling this.
    Do tell your MT people about this suggestion. I know how they are created/implemented, and so do some other folks. Good luck.

  8. you know, i was serious about that post. you needn’t have it deleted. a simple answer and i will understand.
    —-

    {NM: Advertisements for Rolex watches were considered spam; as a result the entire post was deleted. Read the “Winds of Change comments policy”:http://www.windsofchange.net/archives/003367.php, please. Future posts containing those links will probably also be deleted without comment.

    — Marshal Nortius “Big Tuna” Maximus}
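For readers wondering what the Bayesian content-filtering suggested in the first comment involves, here is a toy Python sketch of a naive Bayes classifier with add-one smoothing. It is a simplified illustration only: the class, its methods, and the tokenizer are hypothetical, and it is not the code of any actual mail filter or MT plugin.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Toy naive Bayes spam scorer trained on labelled example texts."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Add one example; label must be 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior, then smoothed per-word likelihoods.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700), -700)
        return 1 / (1 + math.exp(diff))
```

As the commenter notes, the scores only become trustworthy after training on plenty of both good and bad examples, and they keep improving as each newly caught spam is fed back in through train().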
