A user forwarded me a particularly annoying bit of spam the other day, and I realised it is going to be quite hard to combat.

  1. The email was sent from a Hotmail account. Clearly the spammers have
    broken the Hotmail CAPTCHA process (again), and are signing up tens
    of thousands of accounts or more to send their spam. The main issue
    is that there’s no easy “source IP” to test against RBLs for
    blocking or scoring purposes. Hotmail does add an “X-Originating-IP”
    header, but that’s non-standard, and in the cases I’ve seen the IPs
    are not on any known blacklists.

    This actually seems quite an effective process for spammers: use
    newly compromised spambot machines to send only via reputable
    services like Hotmail, Yahoo, etc. I believe most RBLs are built
    using systems that only check the original incoming SMTP connection
    (either at the SMTP stage, or via some feedback process that later
    scans back through the Received headers). They generally don't look
    at custom headers like "X-Originating-IP". So even if spam checking
    software does check that header, not much RBL building software
    will. As long as the spammer only ever uses those IPs for sending
    via other "trusted" services, the IPs will probably stay off RBLs
    for a long time.

    Given the constant battle Hotmail, Yahoo, Gmail, etc have stopping
    mass signups, CAPTCHAs' days seem numbered. Already in some cases,
    Google have started requiring SMS verification for new Gmail
    signups. I expect this trend to spread to other services and
    companies over time, as the CAPTCHA systems employed to try and stop
    abuse appear to be less and less effective every day.

  2. The email contained a bunch of random text. Also not unusual, but it
    makes any content analysis basically impossible.

  3. The email contained a link to a public Google Docs page. Again,
    clearly spammers have broken the Google CAPTCHA process to sign up
    masses of Google Docs accounts and fill them with their spam landing
    pages. This means that URIBLs are ineffective against these types of
    emails, because they can’t go and block Google Docs domains.
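
As a rough illustration of the header check discussed in point 1, here is a minimal Python sketch that pulls the address out of an X-Originating-IP header and builds the name a DNS-based blacklist would be queried with (the IPv4 octets reversed, followed by the list's zone). The function names and the zone dnsbl.example.org are made up for the example, and it assumes the header holds a single IPv4 address, possibly wrapped in square brackets. It is not how any particular spam filter or RBL does it.

```python
import ipaddress
import socket
from email import message_from_string


def originating_ip(raw_message):
    """Return the IPv4 address from X-Originating-IP, or None if unusable."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Originating-IP")
    if value is None:
        return None
    # Assumption: the value may be wrapped in brackets, e.g. "[192.0.2.10]".
    candidate = value.strip().strip("[]")
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return None
    return ip if ip.version == 4 else None


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reversed octets followed by the zone."""
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Treat any successful DNS resolution of the query name as 'listed'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True
```

For a message carrying "X-Originating-IP: [192.0.2.10]", dnsbl_query_name would produce 10.2.0.192.dnsbl.example.org. Of course, the lookup only helps if the list in question has actually seen and recorded that IP, which is exactly the gap described above.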

The net result was that the emails in question contained very little information to block against. Some composite rules could be created (e.g. sent from a Hotmail account and containing a Google Docs link), but they’re clearly far too broad and likely to result in many false positives.
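
To make the false positive problem concrete, here is a small Python sketch of such a composite rule. The function name, the regular expression and the sample messages are all invented for illustration, and it only looks at simple non-multipart messages.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical pattern: any http(s) link on docs.google.com.
DOCS_LINK = re.compile(r"https?://docs\.google\.com/\S+", re.IGNORECASE)


def matches_composite_rule(raw_message):
    """True if the mail is From a hotmail.com address and links to Google Docs."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    from_hotmail = address.lower().endswith("@hotmail.com")
    # Simplification: only plain single-part bodies are inspected.
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""
    has_docs_link = DOCS_LINK.search(body) is not None
    return from_hotmail and has_docs_link


spam = "From: x <bot123@hotmail.com>\nSubject: s\n\nqwe zxc http://docs.google.com/Doc?id=abc\n"
legit = "From: Alice <alice@hotmail.com>\nSubject: notes\n\nMeeting notes: https://docs.google.com/Doc?id=xyz\n"

print(matches_composite_rule(spam))
print(matches_composite_rule(legit))
```

Both sample messages match, which is the problem: the rule cannot tell the spam apart from an ordinary Hotmail user sharing a document.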

At the moment, the main things we can do about this are:

  1. Report the emails as spam to providers like Spamcop and others. This
    should both reflect badly on the services that are being abused and
    encourage the RBL providers to start looking at X-Originating-IP
    headers and the like to help build their IP RBLs.
  2. Report the Google Docs pages as abuse. I’d hope Google have good
    internal systems to handle this, so that if a bunch of pages are
    reported as abuse, they can track down similar pages and disable
    them and the associated signups as well.