All About Spam: The Case of the Productless Spam
This article was originally published as part of the Pobox blog. Pobox was acquired by Fastmail in 2015.
All About Spam is a series of blog posts about common spammer techniques. Have a question about a type of spam that you'd like to see in a future blog post? Leave a comment, or send an email to pobox@pobox.com!
The classic spam is a smoking gun, easy to spot. Viagra. University diplomas. My new favorite, the acai berry. But some messages have a twist; they don't appear to be selling anything at all! I received the following email today:
From: hfkunm@winartproje.com
Subject: NYC judge denounces woman's self-styled sting
Militants Attack NATO Terminal In Pakistan
hfkunm and I are not best buds. That's the whole message; it doesn't even have a link in it. Aren't spammers supposed to be selling me something? So, why did a spammer bother sending me this message?
Elementary, my dear readers! The first reason is simple: they could be probing for valid email addresses. A message that doesn't bounce tells the sender that the address exists.
The second reason: they're trying to beat the system. In 2002, Paul Graham popularized a plan for filtering spam, known as Bayesian filtering: use all your spam and all your ham (legitimate mail) to generate a giant word list. Each word is given a score, based on how frequently it appears in spam vs. ham. The idea had two key points:
- it would learn about new spam words as they were introduced
- "good" words could offset "bad" words
So, how does it all work? Well, let's take that most popular of all spam words, Viagra. Your gossipy friend sends you a message all about herself, and it happens to include "I hear Joe started taking viagra!" A keyword-based spam filter will block any message that contains "viagra", so out it goes. A Bayesian filter would say that all these "I"s outweigh the one "viagra", and let it through.
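For the curious, here is a minimal Python sketch of that comparison. It is not Paul Graham's exact formula; it assumes a plain naive Bayes score with add-one smoothing, and the tiny spam and ham collections, along with the names `tokenize`, `keyword_filter`, and `spam_log_odds`, are made up for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

def keyword_filter(message, banned=("viagra",)):
    # Old-style filter: block if any banned word appears at all.
    return any(word in banned for word in tokenize(message))

def spam_log_odds(message, spam_counts, ham_counts):
    # Sum of per-word log(P(word|spam) / P(word|ham)), add-one smoothed.
    # Positive means "looks like spam", negative means "looks like ham".
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values()) + vocab
    ham_total = sum(ham_counts.values()) + vocab
    score = 0.0
    for word in tokenize(message):
        p_spam = (spam_counts[word] + 1) / spam_total
        p_ham = (ham_counts[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical training mail, invented for this example.
spam_mail = ["Buy viagra now", "Cheap viagra online", "viagra discount pills"]
ham_mail = [
    "I hear you got the job",
    "I think I will see Joe tonight",
    "I started taking piano lessons",
]
spam_counts = Counter(w for m in spam_mail for w in tokenize(m))
ham_counts = Counter(w for m in ham_mail for w in tokenize(m))

gossip = "I hear Joe started taking viagra!"
blocked_by_keyword = keyword_filter(gossip)
gossip_score = spam_log_odds(gossip, spam_counts, ham_counts)
pitch_score = spam_log_odds("Buy viagra now", spam_counts, ham_counts)
```

With this toy data, the keyword filter blocks the gossipy message, while its Bayesian score comes out negative (ham) because the familiar words pull harder than the single "viagra". The outright sales pitch still scores positive.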
For a short while, Bayesian filters were all the rage, and very effective, because they were trained per user. Spammers never let a good plan get them down, though, and came up with a simple, ingenious solution: start sending random content. In the early days, it was snippets from great books (read David Copperfield one paragraph at a time!). They've since moved on to simple randomized phrases, and headlines like today's. All these red herrings have certainly degraded the accuracy of Bayesian filters, but like a good detective, spam filters try all the tools in their arsenal, hoping to find the one that closes the case.
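To see why the red herrings work, here is a small Python sketch under the same kind of assumptions: a simple naive Bayes score with add-one smoothing (not any particular filter's real formula), and invented training mail in which the recipient's ham happens to include news-style words. The names `tokenize` and `score` are illustrative only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())

# Hypothetical training mail, invented for this example.
spam_mail = ["buy cheap pills now", "cheap pills online now"]
ham_mail = [
    "the judge denounces the plan",
    "militants attack the terminal",
    "the judge reads the news",
]
spam_counts = Counter(w for m in spam_mail for w in tokenize(m))
ham_counts = Counter(w for m in ham_mail for w in tokenize(m))

def score(message):
    # Sum of per-word log(P(word|spam) / P(word|ham)), add-one smoothed.
    # Positive leans spam, negative leans ham.
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values()) + vocab
    ham_total = sum(ham_counts.values()) + vocab
    return sum(
        math.log(((spam_counts[w] + 1) / spam_total)
                 / ((ham_counts[w] + 1) / ham_total))
        for w in tokenize(message)
    )

payload = "buy cheap pills"
# The same pitch, padded with innocent-looking headline words.
padded = payload + " the judge denounces militants attack the terminal"

payload_score = score(payload)
padded_score = score(padded)
```

In this toy setup the bare pitch scores as spam, but the padded version drops below zero: every headline word that looks like ham offsets the "bad" words, which is exactly the property the spammers learned to exploit.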