One step forward, two steps back
It's been a really bad week for me: I've had to back out two significant pieces of work. One was only released recently, but the other has been causing problems for an entire year. I'm really sorry to everyone who has had to put up with them while we didn't have the resources available to find the underlying cause.
First, the recent one. We switched our DNS servers from tinydns to PowerDNS last week. There were very good reasons for the switch: tinydns as-is doesn't support IPv6, or DNSSEC, or zone transfers, or...
And its data file is built as a single giant database and synchronised to all our servers once an hour, so changes can take up to an hour to show up.
On the flip side, it's rock solid! It's served us well for years. So we put a lot of work into testing PowerDNS before the change. Unfortunately, it wasn't enough. First SOA records were broken for subdomains, then DNS delegation didn't work, and now that I've switched back, a problem with chat server aggregation has gone away, so it was probably doing the wrong thing there too!
Anyway, PowerDNS got backed out. The "pipe" backend that we were using just isn't expressive enough, so we either need to find another way to feed it our data or find another path forward. The good thing about PowerDNS is that it's actively maintained, so we should be able to get somewhere here.
It's much sadder to give up ejabberd. Erlang is an interesting language, and the integration work was done by an intern last year; he did really good work. The hard part was that we needed support for the many thousands of domains that our customers host with us. Ejabberd 2.x (the stable branch) just didn't support that. Ejabberd 3.x was going to, but it was still in alpha. Looking at the development pace, I made the call to integrate with the 3.x branch anyway. We did that.
But it's been plagued with problems. The chat logging service has been flaky, and there have been "Malformed XML" disconnections which I suspect are related to incorrect SSL renegotiation, but I haven't been able to prove it. I've spent far too much time looking at packet logs trying to figure it out.
I've had long-standing tickets about it, and kept saying "it's getting better", but seriously, upstream hasn't made a single commit to ejabberd mainline since February this year. They're putting all their effort into the 2.x branch.
So I'm in the process of moving our chat service back to the DJabberd engine we used previously. It's not perfect either: it doesn't have anywhere near the feature set that ejabberd has, and it isn't getting any more support. The code is of OK quality, but it's quite convoluted and written in many different styles, which makes it tricky to read. I've had to write two patches to bring its interoperability with modern servers up to scratch and to support the multiple SSL certificates we now use.
It's always sad to give up features, and to sideline hard work that you or others have done, but in the end we have been hurting customers by providing a sub-standard chat experience. So I'm hoping to draw a line under that by the end of this week and move on to good new things again. At least a couple of us have more Erlang experience now, and you never know when that might be useful. It's good just to understand different ways of thinking about code.