Moving Pobox to New York

This is the sixteenth post in the 2016 FastMail Advent Calendar. Stay tuned for another post tomorrow.


Back in November 2015 we announced that FastMail had acquired the Pobox and Listbox services. Whilst these services continue to stand on their own, it made sense to consolidate the infrastructure they run on with FastMail's own.

When I joined FastMail in March, work had begun on moving these services from their home in Quonix, Philadelphia to FastMail's primary datacentre at New York Internet, New York. As a gentle introduction to the company it became solely my job to finish this migration - lucky me!

Lift and shift?

Perhaps the simplest way to move everything would be to turn off the computers, pile them into a truck, drive up to New York City and plug them all back in. However, this would leave customers less than happy about the many hours of downtime and would add a huge risk of something going astray with all that data and hardware en route. (Of course, a station wagon of data has impressive bandwidth.)

Service continuity

The right way to do this is to install some equipment in the new datacentre and migrate to it in a way that keeps services running without any downtime or outages.

We commissioned a new rack next door to FastMail's others and added some servers from FastMail's own New York inventory alongside some that were sent ahead from Philadelphia. These first arrivals were shipped after their duties were taken over by servers remaining at Philadelphia.

This required careful shuffling of systems to ensure we kept enough equipment running at both locations to maintain redundancy and fault tolerance during the move. This cautious incremental approach guided the sequence of the migration project throughout.

Multiple datacentres

We also needed to tackle some of the usual multi-datacentre challenges, such as efficient database replication (we used HAProxy to steer MySQL reads to their local datacentre's replica and minimised cross-datacentre writes) and running a private network between the two sites (we used OpenVPN to connect our firewalls).
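The read-steering idea can be illustrated with a small sketch. This is not HAProxy configuration and not the setup we actually ran; it is a minimal Python model of the routing decision, assuming a single writable primary and one replica per site. The `Backend` class, the `pick_backend` function and the `phl`/`nyc` labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Backend:
    name: str          # hypothetical server name
    datacentre: str    # site label, e.g. "phl" or "nyc" (assumed labels)
    is_primary: bool   # True for the single writable primary
    healthy: bool = True


def pick_backend(backends: Sequence[Backend], local_dc: str, is_write: bool) -> Backend:
    """Choose a database server for one query issued from local_dc."""
    healthy = [b for b in backends if b.healthy]

    if is_write:
        # Writes must go to the primary, wherever it lives.
        for b in healthy:
            if b.is_primary:
                return b
        raise RuntimeError("no healthy primary available")

    # Reads prefer a server in the caller's own datacentre...
    local = [b for b in healthy if b.datacentre == local_dc]
    if local:
        return local[0]

    # ...and only cross the inter-site link when nothing local is healthy.
    if healthy:
        return healthy[0]
    raise RuntimeError("no healthy database server available")


servers = [
    Backend("db-phl", "phl", is_primary=True),
    Backend("db-nyc", "nyc", is_primary=False),
]
read_target = pick_backend(servers, local_dc="nyc", is_write=False)
write_target = pick_backend(servers, local_dc="nyc", is_write=True)
```

In this model a read issued in New York stays in New York, while a write still travels to wherever the primary is, which is why keeping cross-datacentre writes to a minimum matters.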

New technology

The new stack in New York wasn't an exact replica of the original Philadelphia one. We used the opportunity to move from a collection of standalone SmartOS hypervisors to one running Triton Datacenter, Joyent's cluster platform that builds on top of SmartOS to provide a fully managed fleet of servers.

From the point of view of the individual virtual machines that comprise the Pobox and Listbox services this change was invisible (they remain zones managed by Chef). However, moving them into Triton affords us better management and provisioning capabilities.

Broken technology

Unfortunately one shipment of servers from Philadelphia to New York was badly damaged in transit. So bad were the injuries that they would not even turn on! We began a long and involved insurance dance which was eventually successful. In the meantime we took the plunge and ordered replacement servers immediately so this disaster did not stall the migration.

Outages

We had one scheduled outage - a brief pause in the web services customers use to administer their accounts whilst we switched the primary database between datacentres - and one unscheduled one:

An upstream routing problem at one datacentre made it unreachable. Ordinarily this is fine since the services at the second location can continue to operate with a known level of degradation. However, the timing was unfortunate since it occurred during the move of our DNS servers and exposed a problem with our nameserver glue on some domains.

According to the incident report I wrote, the result was 36 minutes of intermittent availability of the services. Given the project's duration and scope this isn't too bad ("four nines").
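For the curious, the "four nines" arithmetic can be checked with a few lines of Python. The measurement window is an assumption here: 270 days is used as a rough stand-in for March to December, and the function names are purely illustrative.

```python
import math


def downtime_fraction(downtime_minutes: float, window_days: float) -> float:
    """Share of the window during which the service was degraded."""
    return downtime_minutes / (window_days * 24 * 60)


def availability(downtime_minutes: float, window_days: float) -> float:
    """Availability as a fraction between 0 and 1."""
    return 1.0 - downtime_fraction(downtime_minutes, window_days)


def nines(downtime_minutes: float, window_days: float) -> int:
    """Number of leading nines in the availability figure."""
    if downtime_minutes <= 0:
        raise ValueError("downtime must be positive to count nines")
    return math.floor(-math.log10(downtime_fraction(downtime_minutes, window_days)))


# Assumed window: roughly nine months of project time.
project_nines = nines(36, 270)
project_availability = availability(36, 270)
```

The result depends on the window. Four nines allows a downtime fraction of 0.0001, so 36 minutes fits only when the window is longer than 36 / 0.0001 = 360,000 minutes, which is 250 days. Measured over a single month, the same outage would count as three nines.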

Conclusion

All the frontend services were wholly moved to New York by September. Some of the more special snowflake services used by the back-office systems took longer (and now they are no longer snowflakes).

The very last service in Philadelphia, one of the distributed logging servers, was shut down last week, in mid-December. The remaining hardware will soon be turned off and shipped to New York as spares.

Altogether we moved around 150 virtual machines that now live on 13 physical servers. The first number reflects the distributed architecture of both Pobox and Listbox.

On a personal note, moving an unknown suite of products, services and systems from one datacentre to another is a fantastic way to get really acquainted with them!

My new colleagues were patient with my questions, supportive of the additional work I needed them to do along the way, and helped me out of some blind spots. I'd like to thank Bryan, IC Group sysadmin alumnus, in particular for laying the groundwork of the migration before I started. My first nine months at FastMail have been great (or, in the colloquial tongue of my new adopted country, "not bad").