Today I noticed one of server used for web request profiling stats logging is taking about 2GB per day for logs, which are written in MyISAM table without indexes. So I thought it is great to try how much archive storage engine could help me in this case.

Original Table:

Lets see what table compression gives us:

Surprise. It would not tell you how much space table takes. Quite unfortunate especially as it should not be hard to do – There are no indexes and information about size on disk and compression buffer stage in memory should be readily available. So we have to go and check size of on disk file:

-rw-rw—- 1 mysql mysql 19 Nov 12 11:31 requests_061111.ARM
-rw-rw—- 1 mysql mysql 984628803 Nov 12 11:29 requests_061111.ARZ
-rw-rw—- 1 mysql mysql 9162 Nov 12 11:00 requests_061111.frm

Can’t say I’m impressed we have only about 2 times compression rate for this table which is heavily redundant (urls, referers, user agents etc)

To check if this is fair compression rate I also run gzip requests_061111.MYD and I got file which as about 520MB in size – almost half the size of archive table. I guess this is due to incremental compression which Archive engine uses, still it should have done much better. Might be configuration option should be added to allow compression buffer to be larger and so compression more efficient. I also run OPTIMIZE table and size stayed the same – not a big surprise as I got this table by converting MyISAM table rather than incremental insertions.

I also compared it to compressed MyISAM – this would take 1250MB so Archive does compress a bit better than myisampack.

I also checked how much table would take if compressed with bzip2 – it took a while to compress but result was 310MB 40% better than compressed by gzip. I wish Archive storage engine would specify archiving method and compression level. Some people may like fast compression while others would prefer to have smallest size possible.

Now regarding full table scan speed – in my case running simple full table scan query on the table changed from 5 min to about 4 minutes, less than table size reduction which is surprising. May be this is because server is loaded and having concurrent IO which randomizes otherwise sequential IO. It would be interesting to know which buffer size Archive storage engine using for reads – I could not find it as difference could be due to larger read_buffer_size used by MyISAM.

Summary: Archive storage engine does compresses your data and should be good for log storage or storing something which needs no updates and no indexes, however compression ratio is not perfect, configuration and stats could be improved.

6 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Brian Aker

Hi Peter!

What version of the server are you using? More information should be available in 5.0 for Archive then what is produced above.

Cheers,
-Brian

Andrew Aksyonoff

I suppose it’s compressing the records independently; hence not-so-good compression ratio.

AMCT

Nice stuff.

Indra

6 years later, any better?

I guess the compression ratio isnt good because it needs to be selected as quick as possible (fast compression).
How bout to compress the *.myd as small as possible but then voila, need to do select?