This blog post is part of the FastMail 2014 Advent Calendar.
Technical level: medium/high
Today we talk about the internals of our authentication system, which is how we decide that you are who you say you are and, from there, figure out what you're allowed to do.
On every server we have a service running called "saslperld". As with many of our internal services, its name is now only loosely related to its function. saslperld is our authentication service, and exists to answer the question "can this user, with this password, access this service?".
Making an authentication request
Each piece of server software we run has some way of calling out to an external service for authentication information. Each server tends to implement its own protocol for doing this, so saslperld implements a range of different methods to receive authentication questions.
The simplest of these is the original saslauthd interface, used by Cyrus. It has three fields (username, password and service name) and returns a simple yes/no answer. These days it's barely used, and then only by internal services: it can't be extended, so we can't pass in other interesting information about the authentication attempt (which I'll talk about further down).
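To show why this interface is so rigid, here is a rough Python sketch of saslauthd-style framing. As far as we know, Cyrus SASL's saslauthd sends each request field as a "counted string" (a 16-bit length in network byte order followed by the bytes) and answers with a single counted string beginning with "OK" or "NO". The sketch assumes exactly that and models only the three fields described above; it is not a byte-exact reimplementation (real saslauthd requests also carry a realm field, for instance), and the function names are hypothetical.

```python
import struct
from typing import List


def encode_fields(*fields: str) -> bytes:
    """Encode each field as a counted string: 2-byte big-endian length, then UTF-8 bytes."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        if len(data) > 0xFFFF:
            raise ValueError("field too long for a 16-bit length prefix")
        out += struct.pack("!H", len(data)) + data
    return bytes(out)


def decode_fields(buf: bytes) -> List[str]:
    """Split a buffer of consecutive counted strings back into text fields."""
    fields: List[str] = []
    offset = 0
    while offset < len(buf):
        if offset + 2 > len(buf):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", buf, offset)
        offset += 2
        if offset + length > len(buf):
            raise ValueError("truncated field")
        fields.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return fields


def encode_reply(ok: bool, reason: str = "") -> bytes:
    """The answer is one counted string: "OK", or "NO" optionally followed by a reason."""
    text = "OK" if ok else "NO"
    if reason:
        text += " " + reason
    return encode_fields(text)
```

The layout is purely positional, which is the extensibility problem in a nutshell: there is nowhere to put an extra value such as the client's IP address without every client and server agreeing on a new format.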
The real workhorse is the HTTP interface, used by nginx. Briefly, nginx is our frontend server through which all user web and mail connections go. It takes care of authentication before anything ever touches a backend server. Since it's fundamentally a system for web requests, its callouts are all done over HTTP. That's useful, because it means we can demonstrate an authentication handshake with simple command-line tools.
Here's a successful authentication exchange to saslperld, using curl:
$ curl -i -H 'Auth-User: firstname.lastname@example.org' -H 'Auth-Pass: ********' -H 'Auth-Protocol: imap' http://localhost:7777
HTTP/1.0 200 OK
Auth-Pass: ********
Auth-Port: 2143
Auth-Server: 10.202.80.1
Auth-Status: OK
Auth-User: email@example.com
The response here is interesting. Because we can use any HTTP headers we like in the request and the response, we can return other useful information. In this case, `Auth-Status: OK` is the "yes, the user may log in" response. The remaining headers tell nginx where to proxy the user's connection. Auth-Server and Auth-Port are the location of the backend Cyrus server where my mail is currently stored; these can change during a failover. Auth-User and Auth-Pass are a username and password that will work for the login to the backend server. Auth-User will usually, but not always, be the same as the username that was logged in with (it can differ on a couple of domains that allow login with aliases). Armed with this information, nginx can set up the connection and then let the mail flow.
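To make the response fields concrete, here is a small Python sketch of what the caller has to extract from a successful reply. nginx does this internally; `Backend` and `parse_success` are illustrative names, not part of nginx or saslperld.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Backend:
    server: str    # Auth-Server: backend Cyrus host holding the user's mail
    port: int      # Auth-Port: port to connect to on that host
    user: str      # Auth-User: username to present to the backend
    password: str  # Auth-Pass: password that will work on the backend


def parse_success(headers: Mapping[str, str]) -> Backend:
    """Turn an 'Auth-Status: OK' response into a proxy target."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    if h.get("auth-status") != "OK":
        raise ValueError(f"not a success response: {h.get('auth-status')!r}")
    return Backend(
        server=h["auth-server"],
        port=int(h["auth-port"]),
        user=h["auth-user"],
        password=h["auth-pass"],
    )
```

If one of the four routing headers is missing, the lookup raises `KeyError`, which seems the right outcome: a success without a destination can't be proxied anywhere.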
The failure case has a much simpler result:
HTTP/1.0 200 OK
Auth-Status: Incorrect username or password.
Auth-Wait: 3
Any status that isn't "OK" is an error string to return to the user (if possible; not all protocols have a way to report this). Auth-Wait is the number of seconds to pause the connection before returning a result. That's a throttling mechanism to help protect against password brute-forcing attacks.
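On the caller's side, a failure boils down to "pause, then report". This hypothetical Python helper sketches that handling; the `sleep` parameter exists only so the pause can be swapped out in tests.

```python
import time
from typing import Callable, Mapping


def handle_failure(headers: Mapping[str, str],
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Pause for Auth-Wait seconds, then return the message to show the user."""
    h = {name.lower(): value.strip() for name, value in headers.items()}
    status = h.get("auth-status", "")
    # OK is success and WAIT is a retry signal; neither is a failure.
    if status in ("OK", "WAIT"):
        raise ValueError(f"not a failure response: {status!r}")
    # Holding the connection before answering slows down password guessing.
    wait = int(h.get("auth-wait", "0"))
    if wait > 0:
        sleep(wait)
    return status  # any other status is the human-readable error text
```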
If the user's backend is currently down, we return:
HTTP/1.0 200 OK
Auth-Status: WAIT
Auth-Wait: 1
This tells nginx to wait one second ("Auth-Wait: 1") and then retry the authentication. Because saslperld is a blocking daemon, answering immediately with WAIT leaves it free to serve other requests, while the user's connection simply waits for a result. This is what we use when doing a controlled failover between backend servers, so there is no user-visible downtime even though we shut down the first backend and then force replication to complete before allowing access to the other backend.
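The WAIT case turns the caller's side into a retry loop. Here's a minimal Python sketch of that loop, assuming a `send` callable that performs one authentication request and returns the response headers. The `max_attempts` cap is a hypothetical safety limit, not a documented nginx behaviour.

```python
import time
from typing import Callable, Mapping

Headers = Mapping[str, str]


def authenticate_with_retry(send: Callable[[], Headers],
                            sleep: Callable[[float], None] = time.sleep,
                            max_attempts: int = 30) -> Headers:
    """Repeat the auth request for as long as the answer is WAIT."""
    for _ in range(max_attempts):
        headers = send()
        h = {name.lower(): value.strip() for name, value in headers.items()}
        if h.get("auth-status") != "WAIT":
            return headers  # OK or a real failure: hand it back to the caller
        # The user's connection just sits here; saslperld is free to serve others.
        sleep(int(h.get("auth-wait", "1")))
    raise TimeoutError("backend still unavailable after retrying")
```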
This is a simple example. In reality we pass some additional headers in, including the remote IP address, whether the connection is on an SSL port or not, and so on. This information contributes to the authentication result. For example, if we've blocked a particular IP, then we will always return "auth failed", even if the user could have logged in otherwise. There's a lot of flexibility in this. We also do some rate tracking and limiting based on the IP address, to protect against misbehaving clients and other things. This is all handled by another service called "ratetrack" (finally, something named correctly!) which all saslperlds communicate with. We won't talk about that any more today.
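The blocking and rate-limiting ideas can be sketched in a few lines of Python. This is only a local stand-in: the real rate tracking lives in the separate ratetrack service, whose algorithm isn't described here, so the sliding-window limiter, the `BLOCKED_IPS` set and all the names below are assumptions for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Hypothetical block list; 192.0.2.0/24 is a documentation-only range.
BLOCKED_IPS = {"192.0.2.66"}


def ip_is_blocked(client_ip: str) -> bool:
    """A blocked IP fails authentication even if the password was right."""
    return client_ip in BLOCKED_IPS


class SlidingWindowLimiter:
    """Allow at most `limit` accepted attempts per IP in any `window`-second span."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        recent = self.attempts[client_ip]
        # Forget attempts that have slid out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False  # over the limit; this attempt is not recorded
        recent.append(now)
        return True
```

Only accepted attempts are recorded in the window here; counting rejected attempts too would punish a persistently misbehaving client for longer, which is a policy choice.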
There are a couple of other methods available for making an authentication request, but they're quite specialised and not as powerful as the HTTP method. We won't talk about those because they're really not that interesting.
Checking the password
Once saslperld has received an authentication request, it first has to make sure that the correct password has been provided for the given username. That should be simple, but our alternate login system can make this quite involved.
The first test is the simplest: make sure the user exists! If they don't, authentication obviously fails.
Next, we check the provided password against the user's "master" password. There's nothing unusual here; it's just a regular password comparison (we use bcrypt for our passwords). If it succeeds, which it does for most users since they only have a single master password set, then authentication succeeds.
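Both of these first checks, "does the user exist?" and "does the master password match?", can be sketched together. We use bcrypt, which isn't in Python's standard library, so this sketch swaps in PBKDF2 (`hashlib.pbkdf2_hmac`) to stay self-contained; the shape of the check (salted slow hash, constant-time comparison) is the same. The user table, the iteration count and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

ITERATIONS = 100_000  # hypothetical work factor for this stand-in


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_master(users: Dict[str, Tuple[bytes, bytes]], username: str, password: str) -> bool:
    """Fail if the user doesn't exist, otherwise compare against the stored hash."""
    record = users.get(username)
    if record is None:
        return False  # no such user: authentication fails outright
    salt, expected = record
    _, candidate = hash_password(password, salt)
    # Constant-time comparison so timing doesn't leak how much of the hash matched.
    return hmac.compare_digest(candidate, expected)
```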
If it fails, we look at the alternate logins the user has configured. For each one available, we do the appropriate work for its type. For most of these, the provided password is a base password plus some two-factor token. We check the base password, and then perform the appropriate check against the token: for example a Yubikey server check, a comparison against the list of generated one-time passwords, and so on. The SMS 1-hour token is particularly interesting: we add a code to our database, SMS the code to the user, and then fail the authentication. When the user then uses the code, we do a complete one-time-password check.
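Here's a Python sketch of two of the alternate login types: a list of one-time passwords appended to the base password, and the SMS code flow. Several details are assumptions rather than a description of our implementation: the token is appended to the end of the base password, the token lengths (8 characters for the list, 6 digits for SMS), the in-memory storage, and PBKDF2 standing in for bcrypt. A Yubikey check would follow the same pattern, with a call to a validation server in place of the set lookup.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


def base_hash(secret: str, salt: bytes) -> bytes:
    # PBKDF2 stand-in for bcrypt, so the sketch needs only the standard library.
    return hashlib.pbkdf2_hmac("sha256", secret.encode("utf-8"), salt, 10_000)


def base_matches(candidate: str, salt: bytes, expected: bytes) -> bool:
    return hmac.compare_digest(base_hash(candidate, salt), expected)


@dataclass
class OtpListLogin:
    """Base password followed by one unused code from a generated list."""
    salt: bytes
    expected: bytes
    unused_codes: Set[str]
    code_length: int = 8  # hypothetical

    def check(self, provided: str) -> bool:
        if len(provided) <= self.code_length:
            return False
        base, code = provided[:-self.code_length], provided[-self.code_length:]
        if not base_matches(base, self.salt, self.expected) or code not in self.unused_codes:
            return False
        self.unused_codes.discard(code)  # each code works exactly once
        return True


@dataclass
class SmsLogin:
    """Base password alone triggers an SMS; base password plus that code logs in."""
    salt: bytes
    expected: bytes
    send_sms: Callable[[str], None]
    code_ttl: float = 3600.0  # the "1-hour" token
    clock: Callable[[], float] = time.monotonic
    pending: Dict[str, float] = field(default_factory=dict)  # code -> expiry time

    def check(self, provided: str) -> bool:
        if base_matches(provided, self.salt, self.expected):
            code = f"{secrets.randbelow(10**6):06d}"
            self.pending[code] = self.clock() + self.code_ttl
            self.send_sms(code)
            return False  # this attempt fails; the user retries with the code
        if len(provided) <= 6:
            return False
        base, code = provided[:-6], provided[-6:]
        if not base_matches(base, self.salt, self.expected):
            return False
        expiry = self.pending.pop(code, None)  # one-time: removed whether or not it expired
        return expiry is not None and self.clock() < expiry


def check_alternates(logins: Iterable, provided: str) -> bool:
    """Authentication succeeds if any configured alternate login accepts the password."""
    return any(login.check(provided) for login in logins)
```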
At this point if any of the user's authentication methods have succeeded, we can move on to the next step. Otherwise, authentication has failed, and we report this back to the requesting service and it does whatever it does to report that to the user.
Authorising the user
At this point we've verified that the user is who they say they are. Now we need to find out if they're allowed to access the service they asked for.
First, we do some basic sanity checking on the request. For example, if you try to do an IMAP login to something other than mail.messagingengine.com, or try to do a non-SSL login to something that isn't the "insecure" service, then we'll refuse the login with an appropriate error. These don't tend to happen very often now that we have separate service IPs for most things, but the code is still there.
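The two sanity checks can be sketched as a single Python function. The IMAP hostname comes from the text; the "insecure" hostname is a hypothetical placeholder, and the IMAP rule here also accepts the insecure hosts (our assumption) so that the two rules don't contradict each other for plain-text IMAP.

```python
from typing import Optional

IMAP_HOST = "mail.messagingengine.com"
INSECURE_HOSTS = {"insecure.example.invalid"}  # hypothetical placeholder


def sanity_check(protocol: str, hostname: str, is_ssl: bool) -> Optional[str]:
    """Return an error message for a request that makes no sense, else None."""
    # Assumption: the insecure hosts are also valid IMAP targets.
    if protocol == "imap" and hostname != IMAP_HOST and hostname not in INSECURE_HOSTS:
        return f"IMAP logins must use {IMAP_HOST}"
    if not is_ssl and hostname not in INSECURE_HOSTS:
        return "Non-SSL logins are only accepted on the insecure service"
    return None
```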
Next, we check if the user is allowed to log in to the given service. Each user has a set of flags indicating which services they're allowed to log in to. We can set these flags on a case-by-case basis, usually in response to some support issue. If the user is not explicitly blocked in this way, we then check their service level to see if the requested service is allowed at that service level. A great example here is CalDAV, which is not available to Lite accounts: an attempt by a Lite user to do a CalDAV login will fail at this point. Finally, we make sure that the service is allowed according to the login type. This is how "restricted" logins are implemented: there's a list of "allowed" services for restricted accounts, and the requested service has to be in that list.
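The three authorisation layers (per-user flags, service level, login type) can be sketched in Python as successive set-membership tests. Only the rule that Lite accounts don't get CalDAV comes from the text; the other service names, the "full" level and the restricted allow-list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical tables; only "Lite has no CalDAV" is taken from the text.
SERVICES_BY_LEVEL: Dict[str, Set[str]] = {
    "lite": {"imap", "pop", "smtp", "web"},
    "full": {"imap", "pop", "smtp", "web", "caldav"},
}
RESTRICTED_ALLOWED: Set[str] = {"imap", "smtp"}


@dataclass
class Account:
    service_level: str
    blocked_services: Set[str] = field(default_factory=set)  # per-user flags set by support


def authorise(account: Account, service: str, restricted_login: bool) -> Optional[str]:
    """Return a refusal reason, or None if the user may use `service`."""
    if service in account.blocked_services:
        return f"{service} has been disabled for this account"
    if service not in SERVICES_BY_LEVEL.get(account.service_level, set()):
        return f"{service} is not available at this service level"
    if restricted_login and service not in RESTRICTED_ALLOWED:
        return f"{service} is not allowed for restricted logins"
    return None
```

Checking the cheapest, most specific rule first (an explicit per-user block) means a support override always wins, whatever the service level says.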
(It's here that we also send the "you're not allowed to use that service" email).
Once we've confirmed that the user is allowed to access the requested service we do a rate limit check as mentioned above. If that passes, the user is allowed in, and we return success to the requesting service.
To make things fast, we cache user information and results quite aggressively, so we can decide what to do very quickly. The downside is that if you change your password, add an alternate login method or upgrade your account, all the saslperlds need to know about it and refresh their knowledge of you. This is done by the service that made the change (usually a web server or some internal tool), which sends a network broadcast containing the username. saslperld listens for this broadcast and drops its cache when it receives it.
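A minimal Python sketch of the cache and its invalidation broadcast follows. It assumes a UDP broadcast whose payload is just the username, and that only that user's entry is dropped; the port, message format and names are all hypothetical. `handle_invalidation` is kept separate from the socket loop so the cache logic can be exercised without a network.

```python
import socket
from typing import Callable, Dict, Generic, TypeVar

V = TypeVar("V")


class UserCache(Generic[V]):
    """Per-username cache that loads on a miss and can be dropped on demand."""

    def __init__(self, load: Callable[[str], V]) -> None:
        self._load = load
        self._entries: Dict[str, V] = {}

    def get(self, username: str) -> V:
        if username not in self._entries:
            self._entries[username] = self._load(username)
        return self._entries[username]

    def invalidate(self, username: str) -> None:
        self._entries.pop(username, None)


def handle_invalidation(cache: UserCache, datagram: bytes) -> None:
    # Hypothetical message format: the datagram is just the UTF-8 username.
    cache.invalidate(datagram.decode("utf-8").strip())


def broadcast_invalidation(username: str, port: int,
                           address: str = "255.255.255.255") -> None:
    """Sent by whatever made the change (web server, internal tool)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(username.encode("utf-8"), (address, port))


def listen_for_invalidations(cache: UserCache, port: int) -> None:
    """Runs inside each auth daemon: drop entries as broadcasts arrive."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", port))
        while True:
            datagram, _addr = sock.recvfrom(1024)
            handle_invalidation(cache, datagram)
```

A broadcast is fire-and-forget, so a daemon that misses one keeps stale data until the next change; pairing this with an expiry time on entries would bound that, at the cost of more reloads.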
There's not much else to say. It's a single piece of our infrastructure that does a single task very well, with a lot of flexibility and power built in. We have a few other services in our infrastructure that could be described similarly. We'll be writing more about some of those this month.