Mark's stream of verbiage: Improve reliability and maintainability by using stateless connections

MySQL application developers have some problems with connection state. There is a lot of state which can be used associated with a single connection

Transactions (autocommit, uncommitted work, when a transaction was started, isolation level)
Temporary tables
Session variables which affect behaviour
Other session variables

If you can avoid all of these, then you can benefit from transparent auto-reconnection and transparent connection reuse / pooling etc.

If you can't, then you will have to deal with reconnection, retry at a higher level. This is complicated and difficult to test.

I'll discuss ways of avoiding these things one at a time:

Transactions

I'm not advocating shunning transactions completely, just avoiding keeping any transaction state in the connection:

Keep autocommit on. Execute everything you want in a transaction in a single statement (in MySQL it is possible to delete from several tables in one statement, insert several rows etc).
Transactional engines will still use transactions if autocommit is on - a single statement becomes a single transaction - you will see a consistent view within that statement - no other thread will see parts of a batch insert, update or delete.
Non-transactional engines behave the same anyway.
Alternative: do everything that needs to be in a transaction inside a stored procedure - if these are retried, they will be retried as a whole.

Temporary tables

These are best avoided. They also cause problems for statement-level replication.

Use short-lived permanent tables with unique names instead (perhaps in a separate database which is automatically cleaned up when tables get old)
Wrap things which need temporary tables inside stored procedures
Use derived tables instead - in some cases this is sufficient.

Session variables which affect behaviour

Sometimes you can't get away from having to set these variables. Common examples are:

SET NAMES utf8
SET SQL_MODE = 'TRADITIONAL'

If an autoreconnect happens while these are in effect, you could accidentally get rubbish data inserted into the database if a connection reverts to latin1 unexpectedly or SQL_MODE reverts to its default "anything goes".

As these are unavoidable, they always have to be set - but there is a solution - the MYSQL_INIT_COMMAND option. As these commands won't actually have any effect on their own, we can have them automatically re-done when the library reconnects to the database.

The only problem with this approach is that not all client library interfaces allow the setting of this option - but in some cases it can be set in my.cnf (for client libraries which read this)

Other session variables

Quite simply - don't use them. There is no reason I know of for having another session variable.

Don't select @@LAST_INSERT_ID or last_insert_id() - use your API method to get it instead. An auto-reconnect happening between the insert and the select @@LAST_INSERT_ID will cause it to return zero, which is probably not what you wanted or expected.
Don't define your own variables - keep them on the client instead (or if you absolutely must store state server-side, put it in a table!)

Conclusions

It is possible to enable automatic reconnect for greater resilience at the MySQL API level
To do so safely you must avoid keeping session state in the connection
Variables which need to be reset on connect can be reset with mysql_init_command

5 comments:

Artem Russakovskii said...: "in MySQL it is possible to delete from several tables in one statement, insert several rows etc" - How about updating several rows in one statement (I'm talking about actual unique data, of course, not just a column deletion or something)? Is/will that be possible?; 14 April 2008 at 06:32
Mark Robson said...: Using standard SQL it is possible to update multiple rows with one statement, just by having an UPDATE with a WHERE clause that matches more than one row.

I was only using this as an example of how transactions apply even with autocommit enabled.; 14 April 2008 at 15:33
Sheeri K. Cabral said...: Don't select @@LAST_INSERT_ID or last_insert_id() - use your API method to get it instead.

Or, use natural primary keys or other unique keys instead.; 14 April 2008 at 18:23
Artem Russakovskii said...: Mark, sure, I understand. As far as updates, what I meant when I said 'unique data' is updating a set of values to something that cannot be described by a single WHERE clause (a set of youtube view counts, for example).; 15 April 2008 at 04:58
Anonymous said...: Well in theory something like mysql_change_user() should be a reliable way to clean up your connections:
http://dev.mysql.com/doc/refman/5.0/en/mysql-change-user.html; 16 April 2008 at 18:53

Mark's stream of verbiage

Sunday, 13 April 2008

Improve reliability and maintainability by using stateless connections

5 comments:

Blog Archive

About Me

Links of interest