Dec 16: ImapClone - Invisible Migration

Technical

This is the sixteenth post in the FastMail 2015 Advent Calendar. Stay tuned for another post tomorrow.


We will definitely write more about the Pobox and Listbox acquisition and how we're going with integrating the two companies more closely.

Our initial plan was to replicate all Pobox Mailstore users to the FastMail backend servers and announce the acquisition at the same time as announcing the availability of the FastMail interface to Mailstore users.

Unfortunately, I let the cat out of the bag early with a mistake in provisioning the first random set of accounts on FastMail hardware, which sent emails to hundreds of Pobox Mailstore users welcoming them to FastMail. So we had to announce what was going on to clear up any confusion.

Migrating data transparently

Many years ago, when I started at FastMail, I was running my own mail server for my family. Eventually I realised this was a dumb idea, and migrated them to FastMail. I wanted to do it without my parents having to download all their email again, since they lived in a remote area with (at the time) an ISDN link to the world, 64kbit/sec.

Since I was a backend Cyrus hacker, of course my thoughts went directly to the Cyrus replication protocol. I wrote a one-shot tool to bring data from the Courier IMAP backend and inject it into Cyrus so that UIDs and UIDVALIDITY would be the same on every mailbox.

Identical mailboxes

To explain some more, IMAP was designed to allow reliable and somewhat efficient synchronisation between multiple clients. Because a mailbox can be deleted and recreated, a client needs a way to know that the mailbox is no longer the same one, and that its cached data needs to be replaced. That is the job of UIDVALIDITY: message UIDs are only meaningful for as long as the mailbox's UIDVALIDITY stays the same. It's all explained in the RFC (RFC 3501).

There is no way via IMAP to request a particular UIDVALIDITY when you create a mailbox, and no way to request a particular UID when you append a message, which is why tools like Offline IMAP and imapsync work, but the client needs to be reconfigured afterwards.
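To make the cost of a changed UIDVALIDITY concrete, here is a minimal sketch of the decision a caching client makes when it opens a mailbox. The `CachedMailbox` class and `reconcile` function are hypothetical names for illustration, not part of any real client:

```python
from dataclasses import dataclass, field


@dataclass
class CachedMailbox:
    """Hypothetical client-side cache for one mailbox."""
    uidvalidity: int
    messages: dict = field(default_factory=dict)  # UID -> cached message data


def reconcile(cache: CachedMailbox, server_uidvalidity: int, server_uids: set) -> list:
    """Return the sorted UIDs the client still has to download."""
    if cache.uidvalidity != server_uidvalidity:
        # The mailbox is not the one we cached: every cached UID is meaningless.
        cache.uidvalidity = server_uidvalidity
        cache.messages.clear()
    # Forget messages that no longer exist on the server.
    for uid in list(cache.messages):
        if uid not in server_uids:
            del cache.messages[uid]
    return sorted(server_uids - set(cache.messages))
```

If the UIDVALIDITY and UIDs survive the move, this returns only genuinely new messages. If they don't, it returns every message in the mailbox, which is exactly the full re-download a transparent migration is trying to avoid.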

This is why I made my tool use the replication protocol. I used it, and mothballed the code. 10 years later, the replication protocol in Cyrus had been completely rewritten, so it was stale - but surprisingly some of it was still useful!

Namespaces and separators

As with any specification developed over time by many people, trying to be all things to everyone, IMAP is very flexible. Mailboxes can be in different namespaces, and even have a different hierarchy separator between the folders. It's madness. Cyrus supports two different namespaces, the original one where everything was a subfolder of INBOX, and "altnamespace", which is the popular one these days, where INBOX sits alongside your other folders.

It also supports two separators, '.' or '/'. Again, '.' (netnews style) was the popular one when FastMail started, and '/' is the popular one now. You can force use of the altnamespace by using port 992 on FastMail right now.
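As a rough illustration of how the two layouts relate, this sketch maps a personal folder name from the original namespace with the '.' separator to its altnamespace equivalent with '/'. The function name is made up, and it assumes personal folders only: shared and other-user folders, and names that contain the target separator, are ignored.

```python
def to_altnamespace(name: str, sep: str = ".", alt_sep: str = "/") -> str:
    """Map e.g. 'INBOX.Work.2015' to 'Work/2015' (hypothetical helper)."""
    parts = name.split(sep)
    # In the original namespace every personal folder hangs off INBOX;
    # in altnamespace it sits alongside INBOX instead.
    if len(parts) > 1 and parts[0].upper() == "INBOX":
        parts = parts[1:]
    return alt_sep.join(parts)


print(to_altnamespace("INBOX.Work.2015"))
print(to_altnamespace("INBOX"))
```

When both ends use the same namespace and separator, none of this translation is needed and folder names can be copied across unchanged.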

Thankfully, Pobox is using the same settings as FastMail, because they were founded at a similar time. One fewer thing to worry about. The only confounding issue was POP3, which has its own method for calculating uniqueness called UIDL. Cyrus already supports two methods, but Pobox was using the Dovecot default, so I added support for Dovecot style UIDL to Cyrus.
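A POP3 client remembers the UIDL of every message it has already fetched, so if the string changes after a migration, the client downloads everything again. The sketch below shows how the same message could get three different UIDLs. The exact formats are assumptions: a bare UID, UIDVALIDITY.UID, and a Dovecot-like form of UID then UIDVALIDITY as eight lowercase hex digits each. Check them against the actual server configuration before relying on them.

```python
def uidl(uid: int, uidvalidity: int, style: str = "cyrus") -> str:
    """Build a POP3 UIDL string in one of three assumed styles."""
    if style == "uidonly":
        return str(uid)
    if style == "cyrus":
        return f"{uidvalidity}.{uid}"
    if style == "dovecot":
        # Assumed layout: UID then UIDVALIDITY, each as 8 hex digits.
        return f"{uid:08x}{uidvalidity:08x}"
    raise ValueError(f"unknown UIDL style: {style}")


for style in ("uidonly", "cyrus", "dovecot"):
    print(style, uidl(255, 16, style))
```

The point is only that the destination has to produce byte-for-byte the same string as the source did, which is why matching the source server's style matters.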

Talking the new Cyrus sync protocol

Unfortunately the protocol is somewhat under-documented. There's a little about DList and that's about it. Even the "replication_examples" file in that directory is for the old protocol. Thankfully I wrote the protocol, so I know my way around it.

The code is all in Perl and you can have a look at it in my github repo.

Syncing Pobox users

This part is still ongoing. It has taken longer than planned because we got sidetracked by the DDoS attacks, and there were some other problems found along the way.

After initial success with a handful of Pobox staff, we now have a few hundred randomly chosen Pobox users backed on the FastMail IMAP servers rather than the Pobox servers. We're using an encrypted link between our two datacentres - they are only 5ms apart on the network, so it's not noticeable latency. As far as the Pobox infrastructure is concerned, it's just another "shard" in their parlance - the entire FastMail system with its internal failover and multiple stores presents a single endpoint of "unlimited" capacity to Pobox.

Pobox users are stored by ID on the backend, FastMail users are stored by username - there are pros and cons to both approaches - we're using username because it looks nicer when you have cross-user sharing. The downside is that some Pobox primary aliases were invalid usernames in Cyrus, so we're dealing with those as we come across them.

Valid letters in folder names

We've had to add '[]', '()', and '?' to the list of valid characters in folder names at FastMail to allow some users to sync. We're still blocked on folders with '^' and '!', because Cyrus uses these internally in folder name storage, and we may be forced to ask users to rename folders. I'm hoping I can find a nicer solution.
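A pre-flight check for the still-blocked characters could look something like this sketch. The names are hypothetical, and the only facts it encodes are the two characters mentioned above:

```python
# Characters that are still blocked because Cyrus uses them internally.
CYRUS_INTERNAL_CHARS = {"^", "!"}


def blocking_chars(folder_name: str) -> set:
    """Return the blocked characters that appear in a folder name."""
    return {ch for ch in folder_name if ch in CYRUS_INTERNAL_CHARS}


def folders_needing_rename(names) -> list:
    """Return the folder names that cannot be synced as they stand."""
    return [name for name in names if blocking_chars(name)]


print(folders_needing_rename(["Work [2015]", "Urgent!", "a^b", "Maybe?"]))
```

Running a check like this before syncing an account would at least show how many users are affected before anyone is asked to rename anything.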

Since IMAP was written, everyone has learned - not only to use UTF-8 everywhere, but to refer to items by a unique identifier which doesn't change, and to store the display name as a non-primary key. Our new JMAP protocol doesn't have these problems - but for IMAP, there's a shift into modified UTF-7, plus some complexities around which servers support which special characters in mailbox names.
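For anyone who hasn't met modified UTF-7, here is a small encoder sketch following the rules in RFC 3501: printable ASCII passes through, '&' becomes '&-', and everything else is written as UTF-16BE in base64 with ',' in place of '/' and no padding, wrapped in '&' and '-'. It is an illustration written for this post, not the code Cyrus uses.

```python
import base64


def encode_mutf7(name: str) -> str:
    """Encode a mailbox name as IMAP modified UTF-7 (sketch)."""
    out = []
    pending = []  # current run of characters that need base64 encoding

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal ampersand is escaped
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable ASCII represents itself
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_mutf7("Entwürfe"))
```

The name on the wire is this encoded form, so two servers can disagree about a folder even when the user sees the same display name on both.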

Valid characters in messages

This one hit us just yesterday. Cyrus rejects messages with NUL bytes in them, even if they're 8 bit. It can rewrite binary on the way in, but not store parts as binary. Dovecot could, and so we have some messages that were synced in as binary - and the replication protocol allowed it.
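Finding the affected messages ahead of time is straightforward. This sketch (a hypothetical helper, not part of ImapClone) scans a raw message for NUL bytes and reports the first few offsets:

```python
def nul_offsets(raw_message: bytes, limit: int = 5) -> list:
    """Return up to `limit` byte offsets of NUL bytes in a raw message."""
    offsets = []
    pos = raw_message.find(b"\x00")
    while pos != -1 and len(offsets) < limit:
        offsets.append(pos)
        pos = raw_message.find(b"\x00", pos + 1)
    return offsets


sample = b"Subject: x\r\n\r\nab\x00cd\x00"
print(nul_offsets(sample))
```

An empty list means the message contains no NUL bytes and should be safe from this particular rejection.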

I see making Cyrus 8-bit clean in my near future. It's a worthy cause anyway, and the internals are getting to the state that it's not impossible.

The biggest problem is that one user had a bunch of messages with attachments with Content-Type: message/rfc822, yet the actual content was a JPEG image. This looks like it's entirely down to a client bug (great work K9), and it confused both Dovecot and Cyrus, but in different ways. It confused Cyrus so hard that I had to change how message file size was calculated and stored. The diff looks small, but it took me a day to do this, and I still rolled out a broken version overnight, where it didn't handle binary append correctly. The version you see here is the prettied up rebased version.

Stage 2

Once all Pobox users are mastered on the FastMail servers, we can build conversations databases, search indexes and all those good things. We can migrate CalDAV and CardDAV over as well, and take advantage of those integrations. It will be great for Pobox Mailstore users to have access to all the features that FastMail has built.

There are still lots of unsolved issues - but at least we could do this initial migration without users needing to make any changes to their workflow or configuration.

Improving ImapClone

The Cyrus::ImapClone module as written and used for this initial migration is deliberately only one-way, and will bail out if anything is changed at the destination. This is by design, so no mistake at the FastMail end could damage accounts at Pobox.
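The bail-out idea can be sketched in a few lines. This is not the Cyrus::ImapClone code, just an illustration under simple assumptions: each mailbox state is a dict of UID to message digest, and `last_synced` is what the destination looked like after the previous run. All names here are hypothetical.

```python
class DestinationChanged(Exception):
    """Raised when the destination no longer matches the last sync."""


def plan_one_way_sync(source: dict, dest: dict, last_synced: dict) -> dict:
    """Plan a one-way sync, refusing to run if the destination was modified."""
    if dest != last_synced:
        # Someone changed the destination behind our back: stop rather than guess.
        raise DestinationChanged("destination differs from last synced state")
    return {
        "copy": sorted(uid for uid in source if uid not in dest),
        "remove": sorted(uid for uid in dest if uid not in source),
    }


print(plan_one_way_sync({1: "a", 2: "b"}, {1: "a"}, {1: "a"}))
```

Refusing to continue is the conservative choice: the worst case is a sync that stops and needs a human to look at it, rather than a sync that quietly picks a winner.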

Stage 2 of ImapClone will be making it able to work both ways. I've already done a bunch of the work as part of the JMAP Proxy work this year. With a good ImapClone, you could do Cyrus-style replication with Cyrus at one end only, keeping everything in sync with a remote IMAP server.