Dec 5: Security - Integrity

Technical

This blog post is part of the FastMail 2014 Advent Calendar.

The previous post on December 4th was about how we build our mail servers. The next post on December 6th is about how we authenticate users.

Technical level: medium

On Tuesday I started this series of posts on security with an overview of the elements of security: Confidentiality, Integrity and Availability.

Integrity is, in my opinion, the most important part of security when it comes to email, so I'm starting there.

I believe that email is your electronic memory. I spoke about this at Oslo University back in 2011, where I answered the question "is email dead" with the following points:

  • Compatibility
  • Unchangeable
  • Privacy
  • Business / Orders / Receipts

Email is built on standards. It's the world's most interoperable network.

Once you get an email, it's your own immutable copy. Your own private immutable copy. It can't be retracted or edited; all the sender can do is send you a new email asking you to disregard the last one. You never get a situation where you remember seeing something, but it doesn't exist any more (unless the email links out to a website, in which case the content can disappear far too easily). Forget about diamonds, email is forever.

At that talk, I addressed the idea that social networks with their private-garden messaging systems would replace email. I use social networks - I organise to catch up with friends via Facebook. I even use it to find people to cover classes (my other job: teaching gym classes). But I wouldn't use social networks for business receipts, or orders, or something I wanted to remember forever. Email is still the gold standard here (unless you have a fax machine).

Recently, I was trying to find an old conversation that a friend and I had via Facebook messages. We have a few from 2007, and then a gap through until 2010. Nothing from the years in between. That whole conversation is lost, because we didn't copy it anywhere else, and Facebook didn't keep it.

My email memory goes back to a little before I moved everything to FastMail - because I messed up and lost everything with a stupid mistake in 2002. I don't expect to ever "forget" anything I've received since then.

Email is, by the design of both the POP3 and IMAP protocols, immutable at the message level. You are not allowed to change the contents of a message once it's been seen by a client.

In the Cyrus mail server, we take this a step further, by storing the sha1 digest of every raw email message in the index file, and hence being able to detect any accidental corruption or malicious modification of the file on disk.
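To make that check concrete, here is a minimal sketch in Python. It is not the Cyrus code (which is written in C and reads the digest from its own index format); the function names and the idea of passing the recorded digest as a hex string are assumptions for illustration only.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex sha1 digest of the raw message file at `path`."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large messages are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_message(path, recorded_sha1):
    """Compare the file on disk with the digest recorded at delivery time.

    `recorded_sha1` stands in for the value stored in the index record
    (hypothetical representation: a hex string).
    """
    return sha1_of_file(path) == recorded_sha1.lower()
```

Any change to the file, whether a flipped bit from bad hardware or a deliberate edit, makes `verify_message` return False, because the digest was recorded before the damage happened.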

Integrity at FastMail

This is where we really shine. We're fanatical about data integrity. We only blog about the cases where things go wrong, which you can read about in examples like:

Even with the nasty bug in 2011, a single misplaced comma which caused all our disks to fill up super-fast and required a full index reconstruct, we didn't lose anyone's email, because we had enough sanity checks in place. It just took a while to rebuild indexes. We don't corrupt indexes on full disk any more. In the 2014 bug, we lost a handful of emails for 70 users because of a further bug in handling emails which were expunged at one end of a replica pair and not at the other. That bug is now fixed as well.

If you look at the change history on the Cyrus IMAPd server leading up to the 2.4 release, and even earlier as well, you'll see us adding integrity checks at every level of the Cyrus data structures. You'll also see patches to the replication system as we, over months, tracked down every case where it was incomplete or incorrect, and fixed them until our replicas were perfect. Our "checkreplication" script still runs weekly over all users looking for mismatches.
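The kind of comparison a mismatch check performs can be sketched roughly as follows. This is not the real "checkreplication" script, whose internals aren't described here; it assumes, purely for illustration, that each side's mailbox can be summarised as a dict mapping message UID to sha1 digest.

```python
def find_mismatches(master, replica):
    """Compare two {uid: sha1_hex} maps and list every disagreement.

    Both arguments are hypothetical summaries of a mailbox on the
    master and on its replica.
    """
    problems = []
    for uid in sorted(master.keys() | replica.keys()):
        if uid not in replica:
            problems.append((uid, "missing on replica"))
        elif uid not in master:
            problems.append((uid, "missing on master"))
        elif master[uid] != replica[uid]:
            problems.append((uid, "digest differs"))
    return problems
```

An empty result means the two copies agree on every message; anything else is a case to investigate.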

And then for 2.4 we rewrote Cyrus replication completely, to be more efficient so we could have replicas in another country - and to make the replicas also do integrity checks on the data coming over the wire, so you can never replicate a broken mailbox and break the other copy as well.
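The idea of checking data as it arrives can be illustrated with a short sketch. It assumes the sender transmits each message together with the digest from its index record and that the replica refuses anything that doesn't match; the names `apply_replicated_message`, `IntegrityError` and the dict-based `store` are hypothetical, not the Cyrus replication protocol itself.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when received bytes do not match the digest sent with them."""


def apply_replicated_message(store, uid, raw_bytes, claimed_sha1):
    """Store a replicated message only if its content matches its digest.

    `store` is a plain dict standing in for the replica's mailbox.
    """
    actual = hashlib.sha1(raw_bytes).hexdigest()
    if actual != claimed_sha1:
        # Reject before touching the store, so a broken copy on the
        # sending side cannot overwrite a good copy on the replica.
        raise IntegrityError(
            f"uid {uid}: expected {claimed_sha1}, got {actual}"
        )
    store[uid] = (claimed_sha1, raw_bytes)
```

The important design point is the ordering: verification happens before any write, so a failed check leaves the replica exactly as it was.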

This is why we replicate at the application level rather than at the filesystem level using something like DRBD - because at the application level we have enough information to ensure that only consistent mailboxes are replicated.

Our backup system also does separate checks, using the same index record, but a completely different piece of code (written in Perl rather than C).

This wasn't built in response to the idea that some attacker would come in and subtly change your emails (though it does provide some very strong protections against those attacks). It was written in response to risks like the faulty RAM we saw in early 2014, or bugs in the kernel silently corrupting files.

Our backup system is available to all our customers, self serve. Just click on a button and a restore will be run for you in the background. So even if you delete email by mistake, you have at least a week to get it back.

Interestingly, some choices reduce integrity in exchange for other things. For example, storing all email on encrypted filesystems is a risk to data integrity: it's much more likely that we will lose everything on a filesystem in the face of a partial failure. Data recovery is less likely to be possible, so we're relying more heavily on replicas and backups. The tradeoff here is that if we discard a failed disk, or one of our servers is accidentally sold off on eBay (hey, it's happened) with user data still on it, then it won't be readable. It's a confidentiality vs integrity tradeoff that we are comfortable making.

Integrity and Hosting Jurisdictions

There's not a huge risk of a government-sponsored or other well-funded attacker trying to modify your email on our servers. The chance of detection by our regular integrity checking systems is very high (and we can tell the difference between an email with 4096 bytes of garbage where a block was corrupted on disk and one with subtly changed wording), and the benefits are low.

As for the accidental corruption that we do sometimes see - it's going to come down to dirty vs clean power, and environmental conditions. Temperature fluctuations, humidity, vibration - these are all risks to data integrity, and they are more about a specific datacentre than choice of country. We recently moved our servers to new racks in a cold-containment-aisle area inside our NYI datacentre, which will give consistent cooling up the full height of the rack. All servers have dual power supplies, on two separate circuits, and the power is well filtered by the time it reaches us.

Integrity and The Future

There is one more thing that I want to add to Cyrus to improve integrity even further. At the moment it is possible to fully delete a mailbox or a user on a Cyrus server, and have that delete replicate immediately. In future, I will make it so that it is not possible, even with a single Cyrus server compromised, to permanently delete anything from its replicas. Removing a user will have to be done explicitly on each copy.

I also want to extend the backup system to be something "standard", at least within the Cyrus world, and open source for everybody. For now it's quite specific to our systems. A standard interchange format for mailbox archives would make life better for everyone. I have some draft notes from a meeting with David Carter at Cambridge (the original author of the Cyrus replication code), but haven't finished it yet.

And finally, I want to back up everything else about a user, to the point where it has the same integrity guarantees as the email. Often if someone has deleted their entire account by mistake or let it lapse, we can recover the email - but some of the database-backed items are lost forever.

This will also allow mothballing accounts for cases like a poor fellow I answered a support request for recently. His father had Alzheimer's Disease and forgot to renew his FastMail account. By the time the son realised that the account had been closed, all email history had been cleaned off our servers. By keeping full backups for a much longer time in the case of payment lapses rather than deliberate account closure, we could save people from losing email in these cases.

As I said at the start, your email is your memory. We take our job of keeping that memory intact very seriously.

Aside: for those concerned about sha1 collision attacks, not only is there no known sha1 collision at all yet, it's very hard to cause a collision by sending an email, because many headers are added between SMTP delivery and final injection into the mailbox, and they are hard to predict. Not impossible, which is why we're working on a series of patches to include a random string in the Received header added by Cyrus.

Finally, it's possible to change the hash algorithm with a simple index upgrade, checking the old one against sha1 and then calculating a new hash. We've already done this once from md5 to sha1.
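A rough sketch of that verify-then-rehash step, for one message, might look like this. The function name and the choice of sha256 as the replacement are assumptions for illustration; no particular successor algorithm has been named here.

```python
import hashlib


def upgrade_digest(raw_bytes, old_sha1, new_algorithm="sha256"):
    """Check a message against its recorded sha1, then compute a new digest.

    `new_algorithm` defaults to sha256 purely as an example.
    """
    if hashlib.sha1(raw_bytes).hexdigest() != old_sha1:
        # Never re-hash content that already fails the old check:
        # that would silently bless a corrupted message.
        raise ValueError("message does not match its recorded sha1")
    return hashlib.new(new_algorithm, raw_bytes).hexdigest()
```

Verifying against the old digest first is what keeps the chain of trust unbroken across the upgrade: the new hash is only ever calculated over content that is known to be intact.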