More greylisting details

Historical

I thought I'd gather some of the details posted in two forum threads into one blog post.

There were actually two separate policies implemented at the same time: greylisting and "address enumeration detection" (we'll call it AED). Greylisting is designed to stop spam from being accepted from the large number of zombie computers connected to the internet. AED is designed to stop others from probing which email addresses at FastMail are valid.

One of the main concerns with greylisting is that naive implementations will often delay all email. In our implementation we've gone to great lengths to ensure that this doesn't happen.

  1. We only greylist hosts that appear to be dialup/DSL hosts of some
    sort, or hosts that don't have any valid reverse DNS. This ensures
    that the vast majority of email servers are not subject to
    greylisting at all, so their email is not delayed.
  2. If a host has been greylisted and successfully passes greylisting
    twice in a 24 hour period (i.e. it correctly retries delivery of a
    message twice within 24 hours), then that host is whitelisted and
    not subject to greylisting for the next 24 hours. If it continues to
    deliver email, each new delivery extends the whitelist period. This
    means that any real email server connected via dialup/DSL (a real
    email server will always retry) will quickly be whitelisted, and its
    email will not be delayed.
  3. If a host opens an SMTP session with a HELO that is not an IP
    address and does not match the reverse DNS of its connecting IP,
    but the forward DNS of the HELO name does resolve to the connecting
    IP, then that host is not subject to greylisting (as suggested by
    hadaso on the forum). For example: the machine at IP 206.223.169.73
    connects to us. The reverse DNS for 206.223.169.73 is
    206-223-169-73.beanfield.net, which looks like a common dialup/DSL
    IP name, so it would be a candidate for greylisting. However, the
    machine identifies itself with a "HELO mx3.hub.org" line. A forward
    lookup of mx3.hub.org gives the IP 206.223.169.73, which is the same
    as the connecting IP, so we exclude it from greylisting.
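Rules 1 and 3 above amount to a per-connection eligibility check. The Python below is a minimal sketch of that check, not FastMail's implementation: the dialup/DSL name patterns in `DYNAMIC_RDNS_PATTERNS` are assumptions (the post doesn't say how "looks like dialup/DSL" is decided), and all function names are hypothetical. The DNS lookup is passed in as `resolve` so the logic can be exercised without live DNS.

```python
import ipaddress
import re
import socket
from typing import Callable, Optional, Set

# ASSUMPTION: illustrative patterns only; the post does not say how
# "looks like a dialup/DSL host" is actually decided.
DYNAMIC_RDNS_PATTERNS = [
    re.compile(r"\d{1,3}[-.]\d{1,3}[-.]\d{1,3}[-.]\d{1,3}"),  # IP embedded in the name
    re.compile(r"dialup|dsl|dynamic|ppp", re.IGNORECASE),
]


def looks_dynamic(rdns: Optional[str]) -> bool:
    """Rule 1: no valid reverse DNS, or a name that looks like a dialup/DSL host."""
    if not rdns:
        return True
    return any(p.search(rdns) for p in DYNAMIC_RDNS_PATTERNS)


def forward_ips(name: str) -> Set[str]:
    """Forward-resolve a name to its IPv4 addresses (empty set on failure)."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except (OSError, UnicodeError):
        return set()


def helo_exempts(ip: str, rdns: Optional[str], helo: str,
                 resolve: Callable[[str], Set[str]] = forward_ips) -> bool:
    """Rule 3: HELO is a name, differs from the reverse DNS, and its
    forward DNS resolves back to the connecting IP."""
    name = helo.strip().strip("[]").rstrip(".")
    if not name:
        return False  # empty HELO earns no exemption
    try:
        ipaddress.ip_address(name)
        return False  # HELO is an IP address, so no exemption
    except ValueError:
        pass
    if rdns and name.lower() == rdns.rstrip(".").lower():
        return False  # HELO just repeats the reverse DNS name
    return ip in resolve(name)


def subject_to_greylisting(ip: str, rdns: Optional[str], helo: str,
                           resolve: Callable[[str], Set[str]] = forward_ips) -> bool:
    """Greylist only dialup/DSL-looking hosts that don't earn the HELO exemption."""
    return looks_dynamic(rdns) and not helo_exempts(ip, rdns, helo, resolve)
```

With a stub resolver that maps `mx3.hub.org` to `206.223.169.73`, `subject_to_greylisting("206.223.169.73", "206-223-169-73.beanfield.net", "mx3.hub.org", resolve)` returns `False`, matching the example above. Note that `socket.gethostbyname_ex` only returns IPv4 addresses, so an IPv6-aware version would need `socket.getaddrinfo` instead.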

When combined, these features provide an excellent balance between greylisting hosts that should not be sending email and letting hosts that should be sending email get it straight through.
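Rule 2, the per-host whitelist, is essentially a sliding-window counter plus an expiry time. Here is a minimal Python sketch, assuming an in-memory store (a real deployment would likely need state shared across mail servers) and hypothetical method names: `record_pass` is meant to be called when a greylisted host retries successfully, and `record_delivery` on each later delivery from a whitelisted host.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

DAY = 24 * 60 * 60  # seconds


class HostWhitelist:
    """Sketch of rule 2: two greylisting passes within 24 hours whitelist a
    host for 24 hours, and each later delivery extends that window."""

    def __init__(self, window: int = DAY, passes_needed: int = 2):
        self.window = window
        self.passes_needed = passes_needed
        self._passes: Dict[str, Deque[float]] = defaultdict(deque)
        self._expires: Dict[str, float] = {}

    def is_whitelisted(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return self._expires.get(ip, 0.0) > now

    def record_pass(self, ip: str, now: Optional[float] = None) -> None:
        """Call when a greylisted host successfully retries a delivery."""
        now = time.time() if now is None else now
        passes = self._passes[ip]
        passes.append(now)
        # Forget passes that fall outside the 24 hour window.
        while passes and now - passes[0] > self.window:
            passes.popleft()
        if len(passes) >= self.passes_needed:
            self._expires[ip] = now + self.window

    def record_delivery(self, ip: str, now: Optional[float] = None) -> None:
        """Call on each delivery; only extends an existing whitelist entry."""
        now = time.time() if now is None else now
        if self.is_whitelisted(ip, now):
            self._expires[ip] = now + self.window
```

Keeping only recent pass timestamps means a host that passes once today and once tomorrow never accumulates two passes inside the window, while a host that keeps delivering stays whitelisted indefinitely.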

Additionally, to help track down any problems, once a message passes greylisting and is accepted, a new "X-Spam-greylist" header is added. This header tells you how many seconds the email was delayed and whether the sending host has been whitelisted for 24 hours. (Technical note: strictly speaking, the delay figure is how long the last delay was for the ip/sender/recipient combination, so when several emails arrive from the same sender, to the same recipient, via the same machine in a short period, the figure will be a bit messy and hard to interpret.)
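The caveat about the delay figure follows from keeping greylisting state per ip/sender/recipient triplet rather than per message. The Python sketch below assumes an in-memory store, a hypothetical 300 second minimum retry delay, and a hypothetical header layout (the post gives neither the real delay nor the real header format). It also assumes that once a triplet has passed, later messages on it are accepted immediately but report the stored delay from the earlier message, which illustrates why the figure can look odd.

```python
import time
from typing import Dict, Optional, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class TripletGreylist:
    """Sketch of triplet-based greylisting that remembers the last delay."""

    def __init__(self, min_delay: int = 300):  # ASSUMPTION: 300 s minimum retry delay
        self.min_delay = min_delay
        self._first_seen: Dict[Triplet, float] = {}
        self._last_delay: Dict[Triplet, int] = {}

    def check(self, ip: str, sender: str, rcpt: str,
              now: Optional[float] = None) -> Optional[int]:
        """Return None to temp-fail the message (SMTP 4xx), or the delay in
        seconds to report when accepting it."""
        now = time.time() if now is None else now
        key = (ip, sender, rcpt)
        if key in self._last_delay:
            # Triplet already passed: accept at once, but the stored delay
            # belongs to an earlier message (the "messy" case).
            return self._last_delay[key]
        first = self._first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return None  # too soon: ask the sender to retry later
        self._last_delay[key] = int(now - first)
        del self._first_seen[key]
        return self._last_delay[key]


def greylist_header(delay: int, host_whitelisted: bool) -> str:
    # ASSUMPTION: hypothetical value layout; the real header format isn't given.
    status = "host whitelisted for 24h" if host_whitelisted else "host not whitelisted"
    return f"X-Spam-greylist: delayed {delay} seconds; {status}"
```

For example, if the first message on a triplet was held for 400 seconds, a second message on that same triplet an hour later is accepted immediately but still reports 400.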

All up, we now have 4 lines of defense against spam:

  1. RBLs (dsbl/xbl) - all users
  2. Greylisting - all users
  3. SpamAssassin - full/enhanced users
  4. Backscatter detection - full/enhanced users
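The same four layers can be written as data. This small Python sketch shows which layers apply to a given account type; the plan names `full` and `enhanced` come from the list above, `free` in the usage note is a hypothetical stand-in for any other account type, and the order simply follows the list rather than any documented processing order.

```python
from typing import List, Set, Tuple

# (layer name, account types it applies to); "all" means every account.
LAYERS: List[Tuple[str, Set[str]]] = [
    ("RBLs (dsbl/xbl)", {"all"}),
    ("Greylisting", {"all"}),
    ("SpamAssassin", {"full", "enhanced"}),
    ("Backscatter detection", {"full", "enhanced"}),
]


def layers_for(plan: str) -> List[str]:
    """Return the spam defenses that apply to an account of the given type."""
    return [name for name, plans in LAYERS if "all" in plans or plan in plans]
```

Here `layers_for("free")` returns `["RBLs (dsbl/xbl)", "Greylisting"]`, while `layers_for("enhanced")` returns all four layers.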

The combination of these 4 layers provides an extremely strong defense against spam, with absolutely no user interaction required at this point. Later this year we also hope to add per-user Bayes databases, which will let each user train their own statistical database to catch the last spams that make it through all these filters.