Sunday, 13 April 2008

Improve reliability and maintainability by using stateless connections

MySQL application developers have some problems with connection state. There is a lot of state which can be used associated with a single connection
  • Transactions (autocommit, uncommitted work, when a transaction was started, isolation level)
  • Temporary tables
  • Session variables which affect behaviour
  • Other session variables
If you can avoid all of these, then you can benefit from transparent auto-reconnection and transparent connection reuse / pooling etc.

If you can't, then you will have to deal with reconnection, retry at a higher level. This is complicated and difficult to test.

I'll discuss ways of avoiding these things one at a time:


I'm not advocating shunning transactions completely, just avoiding keeping any transaction state in the connection:
  • Keep autocommit on. Execute everything you want in a transaction in a single statement (in MySQL it is possible to delete from several tables in one statement, insert several rows etc).
  • Transactional engines will still use transactions if autocommit is on - a single statement becomes a single transaction - you will see a consistent view within that statement - no other thread will see parts of a batch insert, update or delete.
  • Non-transactional engines behave the same anyway.
  • Alternative: do everything that needs to be in a transaction inside a stored procedure - if these are retried, they will be retried as a whole.
Temporary tables

These are best avoided. They also cause problems for statement-level replication.
  • Use short-lived permanent tables with unique names instead (perhaps in a separate database which is automatically cleaned up when tables get old)
  • Wrap things which need temporary tables inside stored procedures
  • Use derived tables instead - in some cases this is sufficient.
Session variables which affect behaviour

Sometimes you can't get away from having to set these variables. Common examples are:
  • SET NAMES utf8
If an autoreconnect happens while these are in effect, you could accidentally get rubbish data inserted into the database if a connection reverts to latin1 unexpectedly or SQL_MODE reverts to its default "anything goes".

As these are unavoidable, they always have to be set - but there is a solution - the MYSQL_INIT_COMMAND option. As these commands won't actually have any effect on their own, we can have them automatically re-done when the library reconnects to the database.

The only problem with this approach is that not all client library interfaces allow the setting of this option - but in some cases it can be set in my.cnf (for client libraries which read this)

Other session variables

Quite simply - don't use them. There is no reason I know of for having another session variable.
  • Don't select @@LAST_INSERT_ID or last_insert_id() - use your API method to get it instead. An auto-reconnect happening between the insert and the select @@LAST_INSERT_ID will cause it to return zero, which is probably not what you wanted or expected.
  • Don't define your own variables - keep them on the client instead (or if you absolutely must store state server-side, put it in a table!)
  • It is possible to enable automatic reconnect for greater resilience at the MySQL API level
  • To do so safely you must avoid keeping session state in the connection
  • Variables which need to be reset on connect can be reset with mysql_init_command


Artem Russakovskii said...

"in MySQL it is possible to delete from several tables in one statement, insert several rows etc" - How about updating several rows in one statement (I'm talking about actual unique data, of course, not just a column deletion or something)? Is/will that be possible?

Mark Robson said...

Using standard SQL it is possible to update multiple rows with one statement, just by having an UPDATE with a WHERE clause that matches more than one row.

I was only using this as an example of how transactions apply even with autocommit enabled.

Sheeri K. Cabral said...

Don't select @@LAST_INSERT_ID or last_insert_id() - use your API method to get it instead.

Or, use natural primary keys or other unique keys instead.

Artem Russakovskii said...

Mark, sure, I understand. As far as updates, what I meant when I said 'unique data' is updating a set of values to something that cannot be described by a single WHERE clause (a set of youtube view counts, for example).

lsmith said...

Well in theory something like mysql_change_user() should be a reliable way to clean up your connections: