MySQL and an atomic 'check ... on none insert'

Technical

This is the ninth post in the 2017 FastMail Advent Calendar. The previous post was an insight into our activities at IETF. The next post is an Interview with David our Marketing Manager.

Stay tuned tomorrow for the next update.


This post could have been titled:

"Why SELECT ... FOR UPDATE doesn't work on non-existent rows like you might think it does".

FastMail has been a long term user of the MySQL database server, and in particular the InnoDB storage engine. InnoDB provides many core database features (ACID, transactions, row-level locking, etc).

One thing that comes up from time to time is the desire to do the following sequence atomically:

  1. Check if a row with a particular id exists in a table
  2. If yes, fetch the row data
  3. If not, perform an expensive calculation (e.g. external web request) to get the data and insert it into the table

The important point is you want to do this atomically. That is: if one process checks for the row, but another process is already doing the "expensive calculation", it blocks until that process completes and has inserted the row into the table.

So if you have a table:

CREATE TABLE foo (
    Id INT NOT NULL PRIMARY KEY,
    Data BLOB
);

The primary key ensures Id is unique.

One thought would be that you could do a SELECT and then INSERT IGNORE if the row doesn't exist. This doesn't return an error if the INSERT would result in a duplicate key, but does mean that the "expensive calculation" might be done multiple times.

After looking at some MySQL documentation, you might think a transaction with a SELECT ... FOR UPDATE statement is what you want. This would check for the data row, but lock the row gap if it doesn't exist. Excellent!

Unfortunately this doesn't work.

Quoting directly from an answer by Heikki Tuuri (the original developer of InnoDB)

http://forums.mysql.com/read.php?22,10506,11347#msg-11347

READ COMMITTED and REPEATABLE READ only affect plain SELECTs, not
SELECT ... FOR UPDATE.

In the 4.1.8 execution:

1.mysql> select * from T where C = 42 for update;

2.mysql> select * from T where C = 42 for update;

-- The above select doesn't block now.

1.mysql> insert into T set C = 42;

-- The above insert blocks.

2.mysql> insert into T set C = 42;

ERROR 1213: Deadlock found when trying to get lock; Try restarting
transaction

the explanation is that the X lock on the 'gap' set in SELECT ... FOR
UPDATE is purely 'inhibitive'. It blocks inserts by other users to the
gap. But it does not give the holder of the lock a permission to insert.

Why the inhibitiveness: if we have three index records:

aab <first gap> aba <second gap> acc

there are two gaps there. Suppose user 1 locks the first gap, and user 2
locks the second gap.

But if 'aba' is delete-marked, purge can remove it, and these two gaps
merge. Then BOTH user 1 and user 2 have an exclusive lock on the same
gap. This explains why a lock on gap does not give a user a permission
to insert. There must not be locks by OTHER users on the gap, only then
is the insert allowed.

Best regards,

Heikki
Oracle Corp./Innobase Oy

Although this post is over 12 years old, it appears to be still relevant to InnoDB today, as any quick testing will show.

In fact in some common scenarios, it's even worse than you might expect. Say you have one table that has an auto increment primary key. You then use that created id to calculate and insert "expensive to calculate" data into another table with the same id. The first FOR UPDATE locks the gap for any id's after it as well.

Say table T has no C's > 41.

t1> select * from T where C = 42 for update;
t2> select * from T where C = 43 for update;
t1> mysql> insert into T set C = 42;
-- The above insert blocks.
t2> insert into T set C = 43;
ERROR 1213: Deadlock found when trying to get lock; Try restarting
transaction

There's some more commentary online about this issue:

The options appear to be:

  • You have to use a mutex table that you know has the row with the Id to lock on in it and SELECT ... FOR UPDATE on that table
  • Use a lock via the mysql GET_LOCK/RELEASE_LOCK functions. If you include the primary key value(s) in the lock name, you can avoid having a global lock on the whole table, the lock names act effectively as row locks. Beware that MySQL earlier than 5.7.5 limits you to one lock per session
  • Deal with the race and do an ordinary SELECT + INSERT IGNORE and hope that most of the time it's not a problem and that the wasted computation is rare.
  • Use some completely separate global locking system

Sometimes this is rather frustrating, but there don't appear to be any other good options.

Why FastMail uses MySQL InnoDB

For the curious, when we originally started developing FastMail in 2000 we started with PostgreSQL. However at the time PostgreSQL had a limit of 8kb of data per-row. Because (again, at the time) we stored emails in the database, this was not viable and we switched to MySQL InnoDB. As these things are, it was only a short while later that we switched to IMAP and Cyrus for email storage.

Sometimes the state of a product is the sum of a series of seemingly inconsequential decisions at the time.