Monday, 3 December 2007

mysql dumps are not text files

Today's pearl of wisdom:

MySQL dump files, as produced by mysqldump, are not text files.

A common mistake is to think that mysqldump files are text files. While they superficially look like text files, they in fact are not. This leads to confusion and problems if people try to edit them in a text editor application, which will in many cases cause corruption.

MySQL dump files contain the SQL commands required to recreate the contents of a database (or subset thereof). However, these are SQL commands in arbitrary (binary) encoding and can consist of a mixture of different encodings (e.g. if you have BLOBs). This mixture is not safe to edit with a text editor, which expects a text file to contain character data in exactly one encoding (and will typically guess, ask the user or just use its default to determine which).

Suppose you use utf8 data in your database - then your mysqldump file will probably contains all of its strings in utf8. This is fine, until you get to a BLOB. This is a binary lump of data and will be full of sequences of bytes which are NOT LEGAL in utf8, therefore any editor attempting to make sense of them in utf8 won't succeed. It will therefore be forced to change them into something else (i.e. different from the original BLOB data) in order to load them into its buffers.

Saving this modified (or even unmodified) file back out will result in loss of data.

Summary

  • MySQL dump files aren't text files even though they look like them

  • Editing one in a text editor will probably lose data

  • They are often also too big for a text editor

  • And have lines too long

2 comments:

gmax said...

Just for completeness, be aware that mysqldump has an option, --hex-blob, which will make your text almost editor safe (unless your editor truncates long lines)

Xaprb said...

This is a good point, and I made a note of it in High Performance MySQL. Hopefully this will help people avoid the pitfalls you mention.