Saturday 29 December 2007

On web application configuration

There are two common approaches for configuring web applications:

  • Configuration file(s) in the web application's directory
  • Storing configuration data in the database

What sort of data am I talking about:

  • Email setup (where to send response form, error emails etc, whether to send mail to customers etc)
  • Setup of things which need to vary between development and production - payment service provider configuration, setup of other third party integration things
  • Error handler configuration (whether to display errors, where to log them, email them etc)
  • Enabling / disabling features (for example, debug things)
  • Visual stuff or branding (labels etc)

Clearly the database connection string needs to be stored somewhere too, but that can't be in the database for obvious reasons.

Database configuration

pros: does not need to be configured on each web server of a farm; not accidentally overwritten by new deployments
cons: bad for hierarchical or structured data; performance overhead; difficult to maintain configuration data stored in a database in SCM. Modifications require apply/ rollback scripts

Configuration files

pros: Easy to manage in SCM (like all your other files, even if not in the same place); less runtime performance overhead; better for hierarchical data; database connection string can be held with other items.
cons: Needs to be synchronised across a web farm; care should be taken that the right versions are always deployed everywhere.

Ideas

For a larger application with a web farm, using a database for some configuration is clearly beneficial. If you have many servers and/or servers of other types (non-web servers), they can store all their configuration in a central database. If performance is a problem, copies can be cached locally.

For a small simple application, configuration files are better- they're more manageable generally. I prefer to put most configuration in a file if possible.

Monday 3 December 2007

mysql dumps are not text files

Today's pearl of wisdom:

MySQL dump files, as produced by mysqldump, are not text files.

A common mistake is to think that mysqldump files are text files. While they superficially look like text files, they in fact are not. This leads to confusion and problems if people try to edit them in a text editor application, which will in many cases cause corruption.

MySQL dump files contain the SQL commands required to recreate the contents of a database (or subset thereof). However, these are SQL commands in arbitrary (binary) encoding and can consist of a mixture of different encodings (e.g. if you have BLOBs). This mixture is not safe to edit with a text editor, which expects a text file to contain character data in exactly one encoding (and will typically guess, ask the user or just use its default to determine which).

Suppose you use utf8 data in your database - then your mysqldump file will probably contains all of its strings in utf8. This is fine, until you get to a BLOB. This is a binary lump of data and will be full of sequences of bytes which are NOT LEGAL in utf8, therefore any editor attempting to make sense of them in utf8 won't succeed. It will therefore be forced to change them into something else (i.e. different from the original BLOB data) in order to load them into its buffers.

Saving this modified (or even unmodified) file back out will result in loss of data.

Summary

  • MySQL dump files aren't text files even though they look like them

  • Editing one in a text editor will probably lose data

  • They are often also too big for a text editor

  • And have lines too long

Monday 26 November 2007

Some MySQL hacking projects

I have two ideas for hacking MySQL in a useful manner, in response to issues I've had using it:

InnoDB tablespace usage monitor



A really simple tool that would report the amount of innodb tablespace used by specific tables and, ideally, indexes too.

This is required as I'm fairly sure that there isn't a tool to do this at the moment.

MyISAM mmap() its index files



I'm having trouble tuning MyISAM's key_buffer for production use. We want to use delay_key_write to reduce the IO of a lot of inserts, but at the same time, I'd prefer it not to get too out of hand, as large flushes create a lot of I/O in one go.

Anyway perhaps that was a poor explanation of my motivation, however, the idea would be:


  • mmap() the whole of each index file when the table is opened.

  • Leave the memory mapping in place as long as the table is open. resize the file and the memory mapping if necessary, e.g. when the index file gets bigger.

  • Have all reads/writes go via the memory mapping.

  • msync() as necessary (e.g. following a write)



I'm confident that this will have a performance benefit, but mostly, it will remove the onus on the DBA to tune key_buffer correctly, and allow the OS to use its memory how it wants to.

The downside? It won't work very well on 32-bit systems as they'll rapidly run out of address space. But do we care any more?

Sunday 14 October 2007

mysql - ENUMs and sql_mode

I've been doing a few experiments with ENUMs and SQL_MODE.

Not a lot of people realise that in MySQL, an ENUM can actually contain one other value as well as the specified ones - no, not NULL, but ANOTHER value, specifically, the empty string ''.

This can lead to trouble. The empty string is entered by MySQL when it tries to insert an invalid value into the column. It does also give a warning, but nobody takes any notice of those right?

Demo schema:


CREATE TABLE enumtest (
id int not null auto_increment,
name varchar(100) NOT NULL,
status ENUM('ok','broken','decommissioned','narnia'),
PRIMARY KEY(id),
KEY(name),
KEY(status)
);


Now let's try some inserts...


INSERT INTO enumtest (name,status) VALUES
('Mark','ok'),
('Fred','broken'),
('Bob','decommissioned');


So far so good. Now how about:


mysql> insert into enumtest (name,status) VALUES ('bust','wibble');
Query OK, 1 row affected, 1 warning (0.00 sec)


Ok, so it inserted something, and we got a warning. What's there now?


SELECT * FROM enumtest;
+----+------+----------------+
| id | name | status |
+----+------+----------------+
| 1 | Mark | ok |
| 2 | Fred | broken |
| 3 | Bob | decommissioned |
| 4 | bust | |
+----+------+----------------+
4 rows in set (0.00 sec)


So what's happened is, we've attempted to insert an invalid value (wibble) and it's put an empty string in instead (empty string is not a null, as we defined the column NOT NULL).

So what's the fix? SET sql_mode to traditional:


mysql> SET SQL_MODE='TRADITIONAL';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO enumtest (name,status) VALUES ('breakme','wibble');
ERROR 1265 (01000): Data truncated for column 'status' at row 1


Much better. We can now actually get ENUMs that only contain values they're supposed to. Good.

Conclusion: use sql_mode TRADITIONAL whenever possible.

Saturday 6 October 2007

MySQL: delay_key_write is good

If you're using MySQL with the MyISAM engine for a high-writes application, delay_key_write is usually very good.

Let me explain further: delay_key_write is a table option which causes the database NOT to flush the MyISAM key file after every write. This is a really good thing, as if you're doing another write very soon anyway, this is likely to just waste I/O time.

This doesn't sound like a good idea right, because it means that if the power fails (or mysql crashes, or something), then you'll be left with a broken index file? No, it's still a very good idea:
  • delay_key_write does NOT appear to affect the MyISAM data file - that will still be flushed according to the normal policy.
  • If you had delay_key_write off, then a power failure or crash during the index write would cause the same level of corruption.
  • Broken MyISAM index files need to be rebuilt, regardless of how little or how much "brokenness" they actually have.
How do you turn delay_key_write on?

It's a table option, so you have to do it at create time. It's possible to do ALTER TABLE, but that behaves how it normally does under MySQL - i.e. it rebuilds the entire table, just to change one flag. This is of course necessary if you're changing the schema, but delay_key_write shouldn't - sadly, it does.

This means that on a large existing table, the ALTER TABLE needs to be done with care (i.e. out-of-hours or during scheduled down time, if at all).

Another option is to set the variable delay_key_write to "ALL" - however, this is not necessarily a good idea as it applies to all tables (in the entire server) whether you want it to or not.

On infrequently written tables, delay_key_write could increase the chance of index corruption.

Our performance testing cycle for delay_key_write is now over - quite frankly we're amazed by how much performance it gives us for so little effort - it will definitely be going into production.

Our servers are being hurt with heavy write-load which is keeping (at least one thread of) mysql almost permanently in I/O-wait - delay_key_write makes almost all of this go away making inserts much quicker and leaving the server more time for other things. It's likely to become essential once our server workload grows even higher.

Saturday 30 June 2007

Critical Mass

Last night I went to Critical Mass - it was fairly quiet but regulars were there. Didn't get rained on, which was good.

No bike polo however (Which is moderately dangerous and I'm rubbish at) - but we did see an interesting, erm, thing for Cowley Road Carnival. Some chaps in East Oxford have built a spider-bike - giant spider on top of a frame with three bikes underneath. It animates as it goes along. The middle rider operates the spider and steering, the two back riders provide power.

Upcoming events: Apparently there is a Perl programmers meeting in Reading on 10th July somewhere - I'll post more details when I have them.

Sunday 24 June 2007

Introduction

Well, here we are. My first blog post on Blogger. Shows that I'm really with the times, starting several years after blogging became passé.

Some topics which I may post on:

  • Computing generally
  • IT Security
  • Linux and/or other operating systems
  • Music
  • Beer
  • The weather (unlikely)
I'll put some links in soon.