Sunday 20 April 2008

MySQL engines space usage comparison - continued


Since Oracle announced the InnoDB plugin which provides compressed row formats, I thought I'd do this comparison again.

Here are some more data that I've cobbled together.

I've created a 2M row table with a schema that we use for audit data. It has 47 columns and 17 indexes.

I copied the table into new empty tables using INSERT ... SELECT with each of these engines:
  • InnoDB row_format=compressed (from the InnoDB plugin)
  • InnoDB row_format=Compact
  • InnoDB row_format=Redundant
  • MyISAM pack_keys=1
  • MyISAM packed with myisampack (NOTE: these tables are readonly)
  • PBXT
  • Falcon
Here are the data:









EngineSpace (MB)
InnoDB compressed636
InnoDB compact1946
InnoDB redundant2253
MyISAM pack_keys=1738
MyISAM packed476
PBXT3379
Falcon999

I am very impressed with how good Falcon is. I am very impressed with InnoDB's compressed row format's space usage.

I don't think it's very reasonable to compare the loading times for the different engines as they all work in different ways, but some were a LOT faster than others. Take this with a pinch of salt:

Fast: MyISAM, Falcon, InnoDB except compressed
Slow: InnoDB compressed, PBXT

InnoDB compressed and PBXT both took well over an hour to complete what the others did in a few minutes. I can't explain this - maybe it's a tuning issue.

In each case, the copy was done in an INSERT ... SELECT which uses a single transaction. I tried to tune the buffers as best as I could (the server has enough ram) but I'm no expert in PBXT and Falcon.

11 comments:

Unknown said...

For total size taken, you should have tried out Archive :)

Mark Robson said...

Engine=Archive may save a lot of space, but it can't replicate what I'm trying to do here as it doesn't support indexes.

Paul McCullagh said...

Hi Mark,

I think I have an idea what the problem might be. PBXT divides rows into a fixed length and a variable length (overflow) part. The size of the fixed length part is determined at table creation by a simple heuristic.

If the fixed length component is much greater than the average row size, then the table will become to large. Just like MyISAM, I plan to support a AVG_ROW_LENGTH on CREATE TABLE. Then it will be possible to set this parameter on table creation.

However, I would still like to check if this is indeed the problem, and also see if the performance problem is related to this. I will send you an e-mail. Maybe you will be able to give me some more details on the test, table structure, data distribution, etc.

Best regards,

Paul

Ken Jacobs said...

Thanks, Mark. Very interesting results. Did you happen to use the other important new feature in the InnoDB Plugin: Fast Index Create?

If you create several secondary indexes with ONE ALTER TABLE command AFTER you load the data, InnoDB can scan the data once, and sort it once for each index.

Would love to see more details on your testing, including load times, times to create indexes, and the time it takes to run the applications' most typical SQL operations against the data.

Thanks!

Ken Jacobs said...

Thanks, Mark. Very interesting results. Did you happen to use the other important new feature in the InnoDB Plugin: Fast Index Create?

If you create several secondary indexes with ONE ALTER TABLE command AFTER you load the data, InnoDB can scan the data once, and sort it once for each index.

Would love to see more details on your testing, including load times, times to create indexes, and the time it takes to run the applications' most typical SQL operations against the data.

Thanks!

Ken Jacobs said...

Sorry Mark for the double-post. Browser usage/pilot error.

One further tip: did you consider trying different compressed page sizes with the InnoDB Plugin? It is possible that your data (which seems to compress quite well), might work with an even smaller page size. Or it might not! ;-)

The default page size with InnoDB row_format=compressed is 8K. It might be nice to see how it does with 4K.

By the way, could you possibly post the schema? It would be very interesting to see the column and index definitions.

Thanks

Mark Robson said...

For InnoDB compressed tables I used page_size=4k, as that was what was suggested in the InnoDB plugin docs.

I did consider using the online index add feature, but these things take a long time, I can only do so many runs :)

Heikki Tuuri said...

Mark,

does the table have a long primary key?

Heikki Tuuri

Mark Robson said...

No, the table has a simple BIGINT primary key.

Ken Jacobs said...

Thanks for the extra info, Mark. I suspect your use of the fast (not exactly "online") index create would substantially improve the time to load and create indexes, when taken together.

It is interesting that you achieved these results with 4K pages. We suggest that 8K is a more typical page size, but it is good you saw such good results. Depending on your data and the size of rows, you might or might not get similar results with an 8K size. And loading time might be different as well.

Would it be possible to send us an extract of your data and the schema? We could do some further testing and post the results on our website, if you like.

Thanks

Ken

Mark Robson said...

At the moment I can't give you a dump of the data or schema as it is part of our internal application.

I did consider making a "cleaned up" version which removed stuff that might be confidential, I think that would take significant work.

I'm not using production data here, but it's generated by a test harness which simulates production data using some production configuration data which reveals a few things, hence I can't publish it.

If you could contact me outside this blog (I don't have any contact email for you) then I might be able to provide some more information.

Thanks

Mark