Saturday 22 March 2008

MySQL engines and space usage

A lot of people seem to spend a lot of effort comparing storage engine in MySQL - chiefly focusing on the difference between MyISAM and InnoDB.

People normally compare:

Feature sets:InnoDB: better durability, transactions, MVCC, foreign key constraints, row-level locking. MyISAM: fulltext, spatial indexes, table-level locking
Run-time performance: (see your favourite benchmark)

But few compare actual storage space usage. As this is very important to our application, I decided to run some tests.

I'm testing here with a realistic-sized table for our application (we partition data into daily partitions and spread them across many servers anyway, so this is just a small piece). We currently use MyISAM, and this is a typical table with approximately 4 million rows.

I can't dump the schema or content of this table here for confidentiality reasons, but it has:
  • 47 columns, many of which are VARCHARs
  • 17 indexes
  • Typical row size of 300 bytes (as reported by "SHOW TABLE STATUS")
Method

First I generated 4 Million rows of dummy data using a test tool (this is extremely useful as it avoids having to take real, potentially sensitive and very large data from production). Then I measured the space usage using SHOW TABLE STATUS, and successively ALTERed the table to different engines.

Note that ALTERing a table causes its indexes to be rebuilt - they will be a good deal smaller than if the rows had been inserted normally. Therefore, just an ALTER which doesn't change anything could make a big space saving.

Results

EngineData sizeIndex sizeNotes
MyISAM, PACK_KEYS=1 (initial)1152854640682459136Initial data load
MyISAM, PACK_KEYS=011528546402396920832ALTER
MyISAM, PACK_KEYS=DEFAULT1152854640677806080ALTER
MyISAM, PACK_KEYS=1 (alter)1152854640433162240ALTER
InnoDB19734200322442166272ALTER
sizes are in bytes.

In case that's not clear from the numbers above, here is a summary of the results
  • In MyISAM, the data are always the same size regardless of options
  • The initial (PACK_KEYS=1) index file was 650M
  • PACK_KEYS=0 makes the index 251% bigger
  • PACK_KEYS=DEFAULT makes the index marginally smaller (but it's been ALTERed which improves things)
  • Altering the table with PACK_KEYS=1 makes the index around 36% smaller than it originally was
  • InnoDB leaves the data 71% bigger and the index 257% bigger
Moving from MyISAM to InnoDB in this case would see an overall increase in space usage of 140% - that's making the disc requirement nearly 2.5 x more.

Conclusions
  • Before making a switch from MyISAM to InnoDB, plan your space usage carefully.
  • Measure your actual data, not mine. Yours may be different. We have a lot of indexes compared to some people.
Future work

I hope to be able to repeat this test with Falcon, Maria and PBXT some time soon.