All About Spam: The Case of the Productless Spam
The classic spam is a smoking gun, easy to spot. Viagra. University diplomas. My new favorite, the acai berry. But some messages have a twist; they don't appear to be selling anything at all! I received the following email today:
From: firstname.lastname@example.org
Subject: NYC judge denounces woman's self-styled sting
Militants Attack NATO Terminal In Pakistan
The sender and I are not best buds. That's the whole message; it doesn't even have a link in it. Aren't spammers supposed to be selling me something? So, why did a spammer bother sending me this message?
Elementary, my dear readers! The first reason is simple: they could be probing for valid email addresses. A message that doesn't bounce tells the spammer the address is live and worth targeting later.
The second reason: they're trying to beat the system. In 2002, Paul Graham popularized a plan, known as Bayesian filtering, to filter spam by using all your spam and all your ham (legitimate mail) to generate a giant word list. Each word would be given a score, based on how frequently it appeared in spam vs. ham. The idea had two key points:
- it would learn about new spam words as they were introduced
- "good" words could offset "bad" words
So, how does it all work? Well, let's take that most popular of all spam words, Viagra. Your gossipy friend sends you a message all about herself, and it happens to include "I hear Joe started taking viagra!" A keyword-based spam filter will block any message that contains "viagra", so out it goes. A Bayesian filter would say, all these "I"s outweigh the one "viagra", and let it through.
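For the curious, here is a minimal sketch of that comparison in Python. It is an illustration under stated assumptions, not any real filter's implementation: the tiny training sets, the function names, the add-one smoothing, and the equal weighting of spam and ham are all made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def count_words(messages):
    """Count how often each word appears across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts

def keyword_filter(message, banned=("viagra",)):
    """Block the message if it contains any banned word."""
    return any(word in banned for word in tokenize(message))

def bayes_log_odds(message, spam_counts, ham_counts):
    """Sum per-word log-odds of spam vs. ham; a positive total leans spam."""
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_odds = 0.0
    for word in tokenize(message):
        if word not in vocab:
            continue  # assumption: words never seen in training carry no evidence
        # Add-one smoothing so a word missing from one pile never gives a zero.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        log_odds += math.log(p_spam / p_ham)
    return log_odds

# Hypothetical training mail, invented for this example.
spam_counts = count_words([
    "buy viagra now",
    "cheap viagra online",
    "viagra discount buy now",
])
ham_counts = count_words([
    "I think I will see you at lunch",
    "I hear Joe got a new job",
    "I told you I was busy",
])

gossip = "I hear Joe started taking viagra! I said I would ask him"
blocked_by_keyword = keyword_filter(gossip)
gossip_score = bayes_log_odds(gossip, spam_counts, ham_counts)
pitch_score = bayes_log_odds("buy cheap viagra now", spam_counts, ham_counts)
```

With this toy data the keyword filter blocks the gossipy message, while the Bayesian score for it comes out negative (leaning ham) because the friendly words outweigh the single "viagra". The straight sales pitch still scores positive.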
For a short while, Bayesian filters were all the rage, and very effective, because they were trained per user. Spammers never let a good plan get them down, though, and came up with a simple, ingenious solution: start sending random content. In the early days, it was snippets from great books (read David Copperfield one paragraph at a time!). They've since moved on to simple randomized phrases, and headlines like today's. All these red herrings have certainly degraded the accuracy of Bayesian filters, but like a good detective, spam filters try all the tools in their arsenal, hoping to find the one that closes the case.
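To see why random padding helps the spammer, here is a rough sketch that combines per-word spam probabilities into one score for the whole message. The word probabilities, the 0.4 default for unknown words, and the function name are assumptions chosen for illustration; a real filter would learn its numbers from your mail and may combine them differently.

```python
import math

# Hypothetical per-word spam probabilities; a real filter learns these from mail.
WORD_SPAM_PROB = {
    "viagra": 0.99,
    "discount": 0.95,
    "judge": 0.10,
    "denounces": 0.10,
    "militants": 0.10,
    "terminal": 0.10,
}

def combined_spam_probability(words, unknown=0.4):
    """Combine per-word probabilities, treating each word as independent evidence."""
    log_spam = 0.0
    log_ham = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word.lower(), unknown)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # Same value as prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

pitch = ["viagra", "discount"]
padding = ["judge", "denounces", "militants", "terminal"]

pitch_only = combined_spam_probability(pitch)
padded = combined_spam_probability(pitch + padding)
```

Under these made-up numbers the bare pitch scores well above 0.9, while the same pitch padded with four innocent-looking headline words drops below 0.5. That is the red herring at work: every harmless word added pulls the combined score toward ham.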