Tuesday, 27 October 2009

When commit appears to fail

So you're using explicit transactions. Everything appears to work (every query gives the expected result) until you get to COMMIT.

Then you get an exception thrown from COMMIT. What happened?

Usually this would be because the server has been shut down, or you've lost the connection.

The problem is, that you can't assume that the commit failed, but you also can't assume it succeeded.

A robust application must make NO ASSUMPTION about whether a failed commit did, indeed, commit the transaction or not. It can safely assume that either all or none of it was committed, but can't easily tell which.

So the only way to really know is to have your application somehow remember that the transaction MIGHT have failed, and check later.

Possible solutions:
  • Ignore it and deal with any inconsistencies manually, or decide that you don't care :)
  • Write your entire transaction such that if it is repeated having been committed previously, it is a no-op or has no harmful side effects (e.g. change INSERT to INSERT IGNORE to avoid unique index violations). This is rather difficult for complex transactions, but works for some.
  • Check, in your code, even if the commit apparently succeeded that it really did. If you discover that it didn't, then retry or report failure to the caller / user.
  • Perform another transaction to "undo" or "cancel" whatever the original transaction did if commit failed (Problem 1: this may itself fail to commit; problem 2: for a short time, an inconsistency exists)
None of these is ideal, and I'd like to think that this never happens. But if you do enough transactions, it's going to happen sooner or later (networks and servers do fail, unfortunately)

In testing, we've found that this happens so infrequently that even deliberate malice (kill -9 of mysql while loads of transactions are processing etc) fails to reproduce it reliably. In the end, a set of iptables rules which blocks the response from a commit was constructed.