Upgrading to 5.x seems like a good idea, as the additional validation work will be the same as would have to be done anyway - so we could effectively upgrade our MySQL version "for free", i.e. no additional QA work would be required than would be needed anyway.
Upgrading at any other time would involve a repetition of the validation exercise, which is not trivial as we must test at least:
- All features of the complex application which use these databases still work without errors
- The behaviour of the new version is consistent with the old one (e.g. results returned)
- Performance needs to be measured to ensure that performance regressions haven't happened (or are acceptable)
- Soak / stress testing needs to be done to ensure that the new version can perform under sustained high load.
Migration and rollback
My initial plan was to leave the MyISAM table files exactly as they were - after all, 5.0.x can read them fine (apparently) - so that if we need to rollback, we simply move back to the old version and everything works as before (including data created while we were on the new version).
However, this looks tricky. Load testing confirms that 5.0.x has frequent crashes when dealing with "old" tables (i.e. those which have not been upgraded). The MySQL manual insists that mysql_upgrade is run to upgrade all existing MyISAM tables.
This is fine if your database is small and noncritical, but ours are large and important. So what to do?
It is possible to upgrade tables using CHECK TABLE FOR UPGRADE and REPAIR TABLE. However, NO ROLLBACK METHOD exists. This is a problem.
I've estimated (using performance data from a production-spec test system) that in production, migration can be performed in 2-3 hours. This is an acceptable level of planned downtime. The only way of rolling BACK the table versions is to mysqldump the database and reload it. I estimate that this could take 16 hours, which is not acceptable.
It is not acceptable to apply a change without a working (i.e. tested) rollback procedure. Neither is it particularly acceptable to have 16 hours of downtime to perform one.
This makes the upgrade tricky. We're working on it.