Sunday, 13 July 2008

Versioning your schema

How do you version your database schemas?

There are quite a lot of different options. On some very small / trivial applications developed by just one person, it might be acceptable to use ad-hoc queries (maybe with a GUI) to manage the schema - but this runs the risk of development and production schemas becoming out of sync (or rather, out of sync with their respective versions of code).

Requirements are typically:
  • Schema changes are held with the rest of the software in a SCM system
  • Schema changes can be made in one development environment and will be correctly propagated with code to other development and production environments as required
  • Schema changes can be rolled back (if, for example, a deployment contains a serious bug)

There doesn't seem to be an easy answer for this one.

Our teams do something like:
  • Have a base create.sql which generates the original schema
  • In each revision where a schema change is required, write an apply.sql and remove.sql script to apply / roll back the change
  • These scripts need to be hand written as there are often special cases - sometimes several ALTER statements are required to make a single change, sometimes in combination with UPDATEs etc. Rollback scripts are even more tricky and sometimes need to restore from a backup table.
  • Have a system of scripts which works out which apply.sql / remove.sql is/are required (when deploying a new version of the code) and runs them, in order to update the schema
The problem with this system is that it relies on developers correctly writing apply.sql and remove.sql - it is fairly error prone. Although our validation process is fairly rigorous, it still lets errors slip through occasionally.
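To make the scripted process above concrete, here is a minimal sketch of such a runner in Python. The directory layout (`migrations/0001/apply.sql`), the `schema_version.txt` tracking file and the database name are all assumptions for illustration, not our actual tooling:

```python
#!/usr/bin/env python
"""Hypothetical migration runner: applies each revision's apply.sql in
order, or remove.sql in reverse order for a rollback. Layout and file
names are illustrative assumptions."""
import os
import subprocess

MIGRATIONS_DIR = "migrations"        # e.g. migrations/0001/apply.sql
VERSION_FILE = "schema_version.txt"  # records the last revision applied

def current_version():
    """Return the last applied revision number, or 0 for a fresh schema."""
    try:
        with open(VERSION_FILE) as f:
            return int(f.read().strip())
    except (IOError, OSError):
        return 0

def run_sql(path):
    # Pipe the script through the mysql client; credentials are assumed
    # to come from the usual ~/.my.cnf.
    with open(path) as f:
        subprocess.run(["mysql", "mydb"], stdin=f, check=True)

def upgrade(target):
    for rev in range(current_version() + 1, target + 1):
        run_sql(os.path.join(MIGRATIONS_DIR, "%04d" % rev, "apply.sql"))
        with open(VERSION_FILE, "w") as f:
            f.write(str(rev))

def rollback(target):
    for rev in range(current_version(), target, -1):
        run_sql(os.path.join(MIGRATIONS_DIR, "%04d" % rev, "remove.sql"))
        with open(VERSION_FILE, "w") as f:
            f.write(str(rev - 1))
```

Because the version file is only advanced after each script succeeds, a failed upgrade leaves the schema at a known revision - which is exactly where the hand-written apply.sql/remove.sql quality problem bites.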

The problem is compounded by having multiple teams working on different code branches, not knowing necessarily which order their releases will go out in (one team would ultimately be responsible for merging the other's changes into their branch prior to release).

Ideas on a postcard please :)

Mark

7 comments:

ReneL said...

I would recommend using svn 1.5 or an advanced SCM like Bazaar.

Then you could introduce a db-upgrade script (for example in Python), a db-upgrade.changelog and db-upgrade.last.

The developers log their new schema changes with a unique id and a short description.

db-upgrade.py then applies all schema changes since db-upgrade.last and overwrites the last id.

Example: db-upgrade.changelog
1.4.7 last schema change (for now)
1.4.6 next schema change
1.4.5 first schema change

Example: db-upgrade.last
1.4.5

db-upgrade.py reads db-upgrade.last (1.4.5), recognizes that db-upgrade.changelog contains new revisions (1.4.6 and 1.4.7), executes apply-1.4.6.sql and apply-1.4.7.sql, and overwrites db-upgrade.last with 1.4.7.
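A minimal sketch of that db-upgrade.py logic, using the changelog format from the example (newest revision first); the function names are my own invention:

```python
"""Sketch of the db-upgrade.py idea: work out which revisions in the
changelog are newer than the one recorded in db-upgrade.last."""

def parse_changelog(text):
    # Each line is "<id> <description>", newest first (as in the example);
    # return the ids oldest first so they can be applied in order.
    ids = [line.split()[0] for line in text.splitlines() if line.strip()]
    return list(reversed(ids))

def pending(changelog_text, last_id):
    """Return the revision ids that still need applying."""
    ids = parse_changelog(changelog_text)
    if last_id not in ids:
        return ids                      # nothing applied yet
    return ids[ids.index(last_id) + 1:]

changelog = """1.4.7 last schema change (for now)
1.4.6 next schema change
1.4.5 first schema change"""

for rev in pending(changelog, "1.4.5"):
    print("would execute apply-%s.sql" % rev)
# ...then overwrite db-upgrade.last with the newest id
```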

Mark Robson said...

That is pretty much what we do. I have described above the problems with this system - do you have any suggestions on how to mitigate them?

Your SCM recommendation is noted but it's very unlikely that we will change (too much work; lots of existing ad-hoc tools and processes would need changing)

Mark

andrew.johnstone said...

Another alternative to aid in automating the above processes is to use mysql proxy, and/or google-mysql-tools ...

See section labeled "Change Management":
http://jan.kneschke.de/projects/mysql/mysql-proxy/

CREATE|ALTER|DROP TABLE
CREATE|DROP INDEX
CREATE|ALTER|DROP DATABASE

http://code.google.com/p/google-mysql-tools/wiki/MoreLogging

audit_log - log logins, queries against specified tables, and startup
log_tables - log queries that use these tables to the audit log (comma separated)
log-update - log DDL and DML by users with the SUPER privilege

You can then pipe the changes into a file and into your SCM....

Robert Hodges said...

My team ran a very large web application for many years for a software-as-a-service operation. At one point we would have to update several hundred schemas in a single maintenance window. We used two main tricks:

1.) The product installation consisted of loading a dump and then applying numbered apply.sql scripts of the sort you and some of the commenters describe. The main thing is that we built this into the basic install, so every developer and QA engineer effectively ran a product upgrade each time they set up tests or started development. As a result we very rarely encountered upgrade problems, because the apply scripts were well tested long before they hit production.

2.) Use a standby database for back-out. Back-out scripts have to work in every failure case, which is very hard to test. Some operations like dropping tables cannot be backed out. It's therefore easier in most cases to restore from a second database. Oracle and PostgreSQL both have good log-shipping mechanisms to make this easier.

Some other things go without saying. We of course had everything in SCM. Also, we tweaked the SQL application process so that it would remember its position in the apply script if there was a failure and let you restart. That was really helpful over time as well.
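That restart-after-failure idea can be sketched quite simply: record how many statements of an apply script have completed, skip them on the next run. This is an illustrative reconstruction, not Robert's actual code, and splitting a real SQL file into statements is glossed over here:

```python
"""Hedged sketch of a restartable script applier: a small state file
remembers how many statements succeeded, so a rerun resumes there."""
import os

def apply_script(statements, state_file, execute):
    done = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            done = int(f.read())
    for i, stmt in enumerate(statements):
        if i < done:
            continue                    # already applied in a previous run
        execute(stmt)                   # raises on failure; position is kept
        with open(state_file, "w") as f:
            f.write(str(i + 1))
    os.remove(state_file)               # clean finish: nothing to resume
```

The `execute` callback stands in for whatever actually runs the SQL, which makes the resume logic easy to test in isolation.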

Sheeri K. Cabral said...

The biggest problem I found was that it's difficult to mark in the SCM when the apply/remove scripts were actually executed. You end up working very hard to answer "is this the point at which this apply script was run?"

Also, there's the issue of when to checkin all this stuff -- do you do a checkin of the apply/remove scripts, and then a 2nd checkin immediately after with the changed schema?

SCM was made for code, which is the equivalent of just putting the schema into change management. The scripts are useful, but you don't check in an awk, sed or even ed script to change code from one version to another. Doing that for the schema is awkward because the system wasn't designed to check in the "how" you get from point A to point B; it was designed to check in point A and point B.

DanHuby said...

I came up with a solution to this for the open source ResourceSpace DAM system which I administer.

The database schema is stored as a series of flat text files which are under version control along with the site code.

If a database error occurs (e.g. because a table or column doesn't exist) it is trapped and code is called that compares the current database against the stored text files, which may have been updated, and automatically creates any missing tables or columns. It then retries the SQL. If it fails a second time, fair enough, it was a genuine SQL error.
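The compare-and-repair step can be illustrated with a small diff function (this is my own sketch of the idea, not the actual ResourceSpace code, and the dict-based schema representation is an assumption):

```python
"""Illustrative sketch: diff the desired schema, held in version-controlled
text files, against what the database reports, and emit DDL for anything
missing. Dropping or retyping columns is deliberately out of scope."""

def missing_ddl(desired, current):
    # desired/current: {table_name: {column_name: column_type}}
    stmts = []
    for table, cols in desired.items():
        if table not in current:
            defs = ", ".join("%s %s" % c for c in cols.items())
            stmts.append("CREATE TABLE %s (%s)" % (table, defs))
            continue
        for col, coltype in cols.items():
            if col not in current[table]:
                stmts.append("ALTER TABLE %s ADD COLUMN %s %s"
                             % (table, col, coltype))
    return stmts
```

The "current" side would typically be read from information_schema; trapping the SQL error, running this, and retrying the query gives the self-healing behaviour described above.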

New tables can have default data populated automatically and indices can be created.

It might not fit all situations but it certainly works and means that users never need to patch their database during an upgrade.

Additionally it also takes care of the database installation - you start with an empty database and the tables/columns are automatically created the first time you access the system.

The downside is having to change the flat files rather than using your favourite tools to modify the database, but it's a fairly minor issue.

See:
http://svn.montala.net/svn/resourcespace/dbstruct/

and the CheckDBStruct function in:
http://svn.montala.net/svn/resourcespace/include/db.php

Simon J Mudd said...

Where I used to work we did similar things to those posted here.

We stored the "create schema from scratch" script for each version, and then scripts to upgrade from one version to the next. Our environment was Sybase with replication. One thing that was important was that the database knew its application version number. This was used by the upgrade scripts to determine which other scripts to apply, and in what order.
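Storing the version in the database makes the upgrade path a simple chain lookup. A sketch of that dispatch, with a hypothetical upgrade map (the version numbers and script names are invented for illustration):

```python
"""Sketch: the database records its application version; the upgrader
walks a chain of per-step scripts from that version to the target."""

UPGRADES = {
    # from_version: (to_version, script to run)
    "1.0": ("1.1", "upgrade-1.0-to-1.1.sql"),
    "1.1": ("1.2", "upgrade-1.1-to-1.2.sql"),
}

def scripts_to_run(db_version, target):
    """Return the upgrade scripts, in order, to go from db_version to target."""
    path = []
    while db_version != target:
        db_version, script = UPGRADES[db_version]
        path.append(script)
    return path
```

In practice db_version would be read from a one-row version table, and each script's final statement would update that row, so a partly-completed upgrade still reports an accurate version.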

When you start to add things like foreign key constraints, triggers and stored procedures, this makes life more difficult, and it's vitally important that you get the order of changes correct. Another thing worth pointing out is knowing how long things will take. Even if the upgrade script is correct, the developer who created it may not take into account that an ALTER TABLE can take several hours to run. In a smaller development environment it may not take long, and if you are not expecting this it can be a nasty surprise when you run the script in production on a much larger database.

Upgrade notes for each step are really important.

Eventually it came down to TEST, TEST, TEST: have a "free" development environment, but also a development environment as close to production as possible, so you can be sure that scripts which work there will also work in production.