Dec 17: The endless battle against spam
This is the seventeenth post in the FastMail 2015 Advent Calendar. Stay tuned for another post tomorrow.
For the last 20 years, spam has been a continuous issue for any email system administrator and, when it makes it through the spam filters, ultimately a problem for email users as well. During that time, the nature and form of spam have changed considerably, and it's been a constant battle between email system designers trying to block spam and spammers trying to evade those blocks.
One of the things that makes email so great is that it's a distributed system where anyone can communicate with anyone else. Developed during the earlier days of the Internet, when the main users were universities and other government bodies, it was almost entirely a trust-based system, where any machine could send to any other machine. In fact, more than that, any machine would happily relay email for you to any other destination! This openness has also been one of email's biggest problems, allowing unscrupulous people to send large amounts of unsolicited email.
Spam fighting techniques
Techniques to block spam have evolved in many directions over time: content filtering, content hashing, IP block lists, challenge/response, greylisting, standards conformance, statistical methods, domain reputation, and so on.
For quite a few years, we've been using a combination approach: standards conformance and IP block lists at email receipt time (the SMTP stage) to block known spamming servers and bots, then content filtering and statistical methods after a message has been received to classify its spamminess. Overall this has been quite effective, especially at dealing with one of the largest spam sources of the last 10 years, spamming botnets.
Spammers evolve as well
It used to be that end users' computers were the easy targets for building a botnet, using phishing or drive-by download techniques to get malware onto them.
While botnets made up of users' home machines that have had malware installed on them are still popular, over the last few years they've become a lot less useful to spammers. Popular block lists like the Spamhaus Zen RBL automatically list most dialup/ADSL connections (these systems shouldn't be sending unauthenticated email directly to servers) as well as known compromised computers, and have been very effective at stopping email from most botnets.
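As a rough illustration of how a DNS-based block list such as Zen is consulted, here's a minimal sketch in Python. It only builds the name to look up: the octets of the connecting IPv4 address are reversed and the list's zone is appended. The function name `dnsbl_query_name` is made up for this example, and this isn't a description of our production setup.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone.

    Hypothetical helper for illustration only.
    """
    # Validate the input; raises ValueError if it isn't an IPv4 address.
    addr = ipaddress.IPv4Address(ip)
    # DNSBLs index addresses with the octets in reverse order.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.99 is a documentation address, used here purely as an example.
print(dnsbl_query_name("192.0.2.99"))
```

An A record lookup on the resulting name then tells you whether the address is listed: an answer means it is, while "no such name" means it isn't.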
Recently there's been a large increase in two different types of spam sending machines: compromised servers, and legitimate servers at hosting providers that are dodgy, or at least lax in their checks.
Compromising servers used to be a much harder problem, but these days lots of websites use common pieces of software like WordPress, Joomla, Drupal, etc. to run their site. In fact, at the time of writing, almost 25% of websites on the Internet are powered by WordPress.
Lamentably, WordPress and its plugins seem to have a long history of security issues. WordPress isn't the only culprit, though. Bugs in other popular software like Joomla, Magento and many more mean that compromising servers these days is likely easier than compromising end user machines, especially with all the work browser makers have put into hardening their browsers against attacks.
In the simplest exploit cases, these servers are easy to spot because PHP adds an X-PHP-Script or X-PHP-Originating-Script header, which can show exactly which trojan .php file was uploaded to do the email sending. In many more cases though, the attackers have used the initial vulnerability to install a more sophisticated trojan onto the machine. Because the server is in an IP block with other servers that might legitimately send email, or might itself already legitimately send email, the existing good reputation of its IP address is a useful way to get spam delivered to other systems.
In the past, if spammers had a botnet or compromised machine, they would use it to send as much spam as possible as quickly as possible, before the machine's IP became tarnished and was blocked. These days they're much more careful, and the term snowshoe spam was coined to describe this: the spammer carefully spreads and alters the source, content and other characteristics of the spam emails over time to make them harder to detect.
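Spotting the simple case can be sketched in a few lines of Python using the standard `email` module. The function name `php_script_headers` and the sample message (including the header value) are invented for this example. Treat it as an illustration of the idea rather than how our filters are implemented.

```python
from email import message_from_string

# The two headers mentioned above that PHP can add to outgoing mail.
PHP_HEADERS = ("X-PHP-Script", "X-PHP-Originating-Script")


def php_script_headers(raw_message: str) -> dict:
    """Return any PHP script headers found in a raw message (hypothetical helper)."""
    msg = message_from_string(raw_message)
    found = {}
    for name in PHP_HEADERS:
        value = msg.get(name)  # header lookup is case-insensitive
        if value is not None:
            found[name] = value
    return found


# Made-up sample message; the script name is illustrative only.
sample = (
    "From: someone@example.com\n"
    "To: user@example.net\n"
    "Subject: hello\n"
    "X-PHP-Originating-Script: 1000:mailer.php\n"
    "\n"
    "body text\n"
)
print(php_script_headers(sample))
```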
And so we arrive...
Recently we saw a rise in spam that used a number of techniques to effectively work around our existing spam blocking system. Dealing with this required us to spend some time adding a domain reputation system to our anti-spam solution. Domain-based reputation has become more popular now that SPF and DKIM are more common. In the days before SPF and DKIM, if a sending server didn't have a reverse lookup hostname for its IP address, it wasn't really possible to attach a domain to an email to base the reputation on, and IP reputation from a block list for the sending server was the only fallback.
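The fallback chain described above can be sketched as a small function that decides what a reputation score should be attached to. Everything here is an assumption made for illustration: the function name `reputation_key`, its parameters, and in particular the order of preference (DKIM, then SPF, then reverse DNS hostname, then IP) are not a description of our actual system.

```python
from typing import Optional, Tuple


def reputation_key(
    ip: str,
    dkim_pass_domain: Optional[str] = None,
    spf_pass_domain: Optional[str] = None,
    rdns_hostname: Optional[str] = None,
) -> Tuple[str, str]:
    """Pick the identity to base reputation on (hypothetical helper).

    Each *_domain argument is the domain that passed that check, or None.
    """
    if dkim_pass_domain:
        # A valid DKIM signature ties the message to the signing domain.
        return ("domain", dkim_pass_domain.lower())
    if spf_pass_domain:
        # A valid SPF result ties the sending IP to the envelope domain.
        return ("domain", spf_pass_domain.lower())
    if rdns_hostname:
        # Before SPF and DKIM, a reverse lookup hostname was the only domain hint.
        return ("hostname", rdns_hostname.lower().rstrip("."))
    # Nothing to attach a domain to: fall back to IP reputation.
    return ("ip", ip)


print(reputation_key("192.0.2.1", dkim_pass_domain="Example.COM"))
print(reputation_key("192.0.2.1"))
```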
In fact, because of the huge IP range that IPv6 introduces, domain-based reputation will probably become the core reputation method for email, which is why providers that support receiving email over IPv6 require valid forward-confirmed reverse DNS and either valid SPF or DKIM on emails. Based on other discussions, it's likely that this will continue to be important for IPv6 email in the future.
Without going into extensive details, we've found the new system to be extremely effective at blocking the types of spam that were previously getting through, while keeping an extremely low false positive rate. As always, we monitor our systems, as well as reports from customers, to make sure we've got the balance right.
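For readers unfamiliar with forward-confirmed reverse DNS, the check is: look up the hostname for the connecting IP, then look up the addresses for that hostname and confirm the original IP is among them. Here's a minimal sketch using Python's standard `socket` module. The function names are invented for this example, and the two lookups are passed in as parameters so the logic can be exercised without touching the network.

```python
import ipaddress
import socket


def _reverse_lookup(ip):
    # PTR lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname):
    # A/AAAA lookup: hostname -> set of address strings.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def _normalise(address):
    # Drop any IPv6 zone suffix and parse, so equal addresses compare equal.
    return ipaddress.ip_address(address.split("%")[0])


def fcrdns_hostname(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Return the confirmed hostname for ip, or None if the check fails."""
    try:
        hostname = reverse(ip)
        addresses = forward(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror (failed lookups).
        return None
    wanted = _normalise(ip)
    if any(_normalise(a) == wanted for a in addresses):
        return hostname
    return None


# Example with stand-in lookups instead of real DNS.
print(fcrdns_hostname(
    "192.0.2.10",
    reverse=lambda ip: "mail.example.com",
    forward=lambda host: {"192.0.2.10"},
))
```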
One thing we discovered during this whole process is that spammers appear to be actively trying to work around domain-based reputation systems already. An article posted a little while back describes a bit of the detail behind Gmail's anti-spam filtering system. One of the interesting things about that system is how reputation can be transferred between entities, e.g. if an IP has a bad reputation, and it sends email with links to a particular domain, that domain can start picking up the bad reputation.
Conversely, if a particular IP or domain has a good reputation, then it can probably transfer that good reputation to others. Here's an example of something we saw many times in our logs:
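To make the idea of reputation transfer concrete, here's a toy sketch. It is not Gmail's algorithm or ours; the scoring range, the `weight` parameter, the simple blending formula and the name `transfer_reputation` are all assumptions chosen to keep the example small.

```python
def transfer_reputation(scores, links, weight=0.5):
    """Blend each source's score into its linked target (toy model).

    scores: dict mapping an entity name to a score from -1.0 (bad) to 1.0 (good)
    links:  iterable of (source, target) pairs, e.g. an IP that sent mail
            containing links to a domain
    weight: how strongly the source's score pulls on the target
    """
    updated = dict(scores)
    for source, target in links:
        source_score = scores.get(source, 0.0)  # unknown entities are neutral
        target_score = updated.get(target, 0.0)
        updated[target] = (1 - weight) * target_score + weight * source_score
    return updated


# Made-up data: a badly rated IP sends mail linking to a so far unknown domain.
scores = {"ip:203.0.113.5": -0.8}
links = [("ip:203.0.113.5", "domain:linked-site.example")]
result = transfer_reputation(scores, links)
print(result)
```

In this toy model the unknown domain ends up with a negative score after a single message, and the same mechanism pulls a score upwards when the source is well regarded.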
- connection from reputable email provider -> FastMail, email sent with @newspammerdomain.com address, DKIM signed with newspammerdomain.com domain, SPF valid for newspammerdomain.com
- connection a second or two later from another reputable email provider -> FastMail, again everything completely clean and valid
- then a stream of connections from other machines, all with valid Forward-confirmed reverse DNS of the form *.newspammerdomain.com, all sending emails with @newspammerdomain.com addresses, and again DKIM signed and valid SPF
The attempt here is obviously to get some of the good reputation of the reputable email provider to rub off onto the newspammerdomain.com domain, so that when they use their hosted servers to send the bulk of the email, it gets through before users start reporting it as spam and burn the domain.
Fortunately, we're dealing with this sort of thing as well now :)
At this point, we're very happy with how our spam filtering is performing, but you can never sit still for long. Spammers make money, so they're a determined lot, and will keep trying to find ways to get around filters and reputation systems. We'll continue to work on our systems to counteract them, and have significant plans and improvements for next year.
For end users, the best thing you can do is use the "Report spam" button in the web interface to report any spam that gets through to your Inbox. This data helps us in many ways, and in the immediate short term it helps by training your personal statistical email filter.