Saturday, 22 March 2008

MySQL engines and space usage

A lot of people seem to spend a lot of effort comparing storage engine in MySQL - chiefly focusing on the difference between MyISAM and InnoDB.

People normally compare:

Feature sets:InnoDB: better durability, transactions, MVCC, foreign key constraints, row-level locking. MyISAM: fulltext, spatial indexes, table-level locking
Run-time performance: (see your favourite benchmark)

But few compare actual storage space usage. As this is very important to our application, I decided to run some tests.

I'm testing here with a realistic-sized table for our application (we partition data into daily partitions and spread them across many servers anyway, so this is just a small piece). We currently use MyISAM, and this is a typical table with approximately 4 million rows.

I can't dump the schema or content of this table here for confidentiality reasons, but it has:
  • 47 columns, many of which are VARCHARs
  • 17 indexes
  • Typical row size of 300 bytes (as reported by "SHOW TABLE STATUS")

First I generated 4 Million rows of dummy data using a test tool (this is extremely useful as it avoids having to take real, potentially sensitive and very large data from production). Then I measured the space usage using SHOW TABLE STATUS, and successively ALTERed the table to different engines.

Note that ALTERing a table causes its indexes to be rebuilt - they will be a good deal smaller than if the rows had been inserted normally. Therefore, just an ALTER which doesn't change anything could make a big space saving.


EngineData sizeIndex sizeNotes
MyISAM, PACK_KEYS=1 (initial)1152854640682459136Initial data load
MyISAM, PACK_KEYS=011528546402396920832ALTER
MyISAM, PACK_KEYS=1 (alter)1152854640433162240ALTER
sizes are in bytes.

In case that's not clear from the numbers above, here is a summary of the results
  • In MyISAM, the data are always the same size regardless of options
  • The initial (PACK_KEYS=1) index file was 650M
  • PACK_KEYS=0 makes the index 251% bigger
  • PACK_KEYS=DEFAULT makes the index marginally smaller (but it's been ALTERed which improves things)
  • Altering the table with PACK_KEYS=1 makes the index around 36% smaller than it originally was
  • InnoDB leaves the data 71% bigger and the index 257% bigger
Moving from MyISAM to InnoDB in this case would see an overall increase in space usage of 140% - that's making the disc requirement nearly 2.5 x more.

  • Before making a switch from MyISAM to InnoDB, plan your space usage carefully.
  • Measure your actual data, not mine. Yours may be different. We have a lot of indexes compared to some people.
Future work

I hope to be able to repeat this test with Falcon, Maria and PBXT some time soon.


Ronald Bradford said...

Mark, your Innodb increase of 2-3x is very common however you forget to mention two considerations here as well for size.

First, what is the structure of your primary key, is it an INT for example, as the physical size of the primary key will affect your index size.

Second, what is the insert order of data into the table base on the primary key. When you simply order the table in Innodb the data is packed, but if your normal insert pattern is not primary key order you will suffer a fill factor penalty. This is easily missed in tests without more realistic benchmarking.

As with this table, it's also important to ensure your indexes are completely optimized in structure and size considerations, another common problem.


Mark Robson said...

The primary key is an INT. It is always inserted in order as it's an AUTO_INCREMENT. This was not my design decision, but is now very difficult to change (as is almost anything else about this table; small changes can be made with big testing.)

ryan said...

Another important consideration is that the size of InnoDB tables jumps with regular usage patterns. Running OPTIMIZE every so often is important to reclaim that space but this locks the entire table for that time which is problematically time-consuming if the tables are large.

The point being -- InnoDB is very wasteful with space, so plan far ahead of your current (packed) capacity and watch your growth rates obsessively.

Unknown said...

This is an excellent post. Most people never consider the storage differences between MyISAM and InnoDB. I look forward to your storage comparisons of Maria and Falcon etc.