After compiling Percona Server with TokuDB, of course I wanted to compare InnoDB performance vs TokuDB.
I have a particular workload I'm interested in testing: an insert-intensive workload (which is TokuDB's strong suit) with some roll-up aggregation, which should produce in-place updates (I will use INSERT .. ON DUPLICATE KEY UPDATE statements for that), so it will also generate a good amount of reads.

A few words about the hardware: I am going to use the new Dell PowerEdge R420 with two Intel(R) Xeon(R) CPU E5-2450 0 @ 2.10GHz, 48GB of RAM and a SATA SSD: Kingston HyperX 3K 240GB.

Workload: I will use two different schemas. The first schema is from sysbench, and the table looks like:

CREATE TABLE sbtest$I (
  id BIGINT UNSIGNED NOT NULL,
  k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
  c CHAR(120) DEFAULT '' NOT NULL,
  pad CHAR(60) DEFAULT '' NOT NULL,
  PRIMARY KEY (id)
);
CREATE INDEX k ON sbtest$I(k);

and roll-up tables sbtest$I_r10, sbtest$I_r100 and sbtest$I_r1000, which hold roll-up sums over every 10, 100 and 1000 records of the main table, respectively.

and the transactions for this workload are:
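In rough form (the exact statements are in the sysbench branch linked at the end of the post; the roll-up tables' columns id_r10 and sum_k are my assumption, not the actual schema), each transaction inserts one row into the main table and folds its k value into the three roll-up tables with INSERT .. ON DUPLICATE KEY UPDATE:

BEGIN;
INSERT INTO sbtest1 (id, k, c, pad) VALUES (100001, 49563, '...', '...');
-- roll up every 10 rows: bucket key = id DIV 10, accumulate k (hypothetical columns)
INSERT INTO sbtest1_r10 (id_r10, sum_k) VALUES (100001 DIV 10, 49563)
  ON DUPLICATE KEY UPDATE sum_k = sum_k + VALUES(sum_k);
-- same pattern for sbtest1_r100 (id DIV 100) and sbtest1_r1000 (id DIV 1000)
COMMIT;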

So this workload produces SEQUENTIAL inserts into the PRIMARY KEY, which is quite suitable for InnoDB, but it also produces random inserts into the SECONDARY KEY (k),
which is not so good for InnoDB.

So let’s see what results we have. The results are in TPS (more is better) and we start with empty tables.

Now, before looking at the graph, please do not jump to conclusions, as the graph is MISLEADING.

TokuDB-1

So we see that InnoDB performance steadily declines from 24000 tps to 18000 tps, but InnoDB can't complete the 5h run: after 3h the disk is full, with InnoDB data size at about 210GB and 234,238,440 inserted records.
TokuDB, meanwhile, averages around the 14000 tps mark, with periodic drops into the 10000 tps area.
TokuDB's data size after the 5h run is about 50GB, with 276,934,863 records.

So why do I say that the graph is misleading?
Obviously we can say that InnoDB is faster, but you should pay attention to the steady decline of InnoDB throughput: eventually it will drop to 14000 tps and below. I simply do not have enough space on this SSD to run the experiment that long. And here we see the strong side of TokuDB: it achieves more than 4x data compression on this dataset.
We can easily fill TokuDB tables with 1 billion rows on this SSD, while the projected InnoDB performance at that size will be the same or worse, and InnoDB would require about 1TB of space.

Now, to see this point of intersection, let's review a different workload (which is actually closer to what I need).

The tables look like:
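The actual definition is in the sysbench branch linked at the end of the post; the essential shape, with hid and mid as low-selectivity dimensions and id sequential within them, is roughly the following (the table name and the non-key column types are my assumption):

CREATE TABLE sbtest (
  hid INT UNSIGNED NOT NULL,      -- low-selectivity dimension
  mid INT UNSIGNED NOT NULL,      -- low-selectivity dimension
  id  BIGINT UNSIGNED NOT NULL,   -- sequential within each (hid, mid) pair
  k   INT UNSIGNED NOT NULL DEFAULT '0',
  c   CHAR(120) NOT NULL DEFAULT '',
  pad CHAR(60) NOT NULL DEFAULT '',
  PRIMARY KEY (hid, mid, id)
);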

and the transactions are:
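Again only a sketch, not the real statements: assuming roll-up tables keyed by (hid, mid, bucket), one transaction bundles the main insert with three roll-up upserts, giving the four INSERTs noted below:

BEGIN;
INSERT INTO sbtest (hid, mid, id, k, c, pad) VALUES (3, 7, 100001, 49563, '...', '...');
INSERT INTO sbtest_r10 (hid, mid, id_r10, sum_k) VALUES (3, 7, 100001 DIV 10, 49563)
  ON DUPLICATE KEY UPDATE sum_k = sum_k + VALUES(sum_k);
-- repeated for sbtest_r100 (id DIV 100) and sbtest_r1000 (id DIV 1000)
COMMIT;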

That is, our PRIMARY KEY is not sequential anymore, which is bad for InnoDB, but this is what I need for my setup (I could still use a synthetic auto_inc PK, but in that case I would still need a SECONDARY KEY on (hid,mid,id)).

Also please note that each transaction produces 4 INSERTs, so the workload is very write-intensive.

So what are the results in this case?

TokuDB-2

InnoDB gradually declines as the data grows (which is expected), and by the end of 5 hours it averages 2700 tps.
With TokuDB we also see a drop, and by the end of 5 hours the average throughput is 7800 tps.
Something to take into account: TokuDB's results are not quite stable, which is why I also show a 5-minute moving average for TokuDB.

So TokuDB shows about 2.8x better throughput. As for data size:

  • InnoDB table: 58GB and 244,980,192 records
  • TokuDB table: 15GB and 232,927,460 records

So TokuDB looks better in this workload; however, the unstable throughput is worrisome to me. Let's zoom in to 10-minute intervals and look at the throughput:
TokuDB-3
We can see periodic drops, which I believe are related to the 60-second checkpoint interval, as TokuDB does time-based checkpoints.
These drops are quite concerning, and they might be a problem for some users.
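For what it's worth, the checkpoint interval is exposed as the tokudb_checkpointing_period server variable (60 seconds by default), so it can be inspected and, assuming it is dynamic in this build, adjusted at runtime to see how it affects the drops:

SHOW GLOBAL VARIABLES LIKE 'tokudb_checkpointing_period';
-- spread checkpoints further apart, trading longer recovery for fewer stalls
SET GLOBAL tokudb_checkpointing_period = 120;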

Now, I understand that my PRIMARY KEY (hid,mid,id), where id is sequential and hid,mid have low selectivity, is not good for fast inserts, but it is suitable for range selects by id. However, it will be interesting to see how both InnoDB and TokuDB perform if the PK is (id,hid,mid). This will also affect SELECT performance, so we will need to measure that as well.
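Using the hypothetical sbtest definition sketched above, that variant could be tried with something like the following (the secondary key keeps the (hid,mid,id) access path for range selects):

ALTER TABLE sbtest
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, hid, mid),
  ADD KEY k_dims (hid, mid, id);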

And if you want to repeat this benchmark, the sysbench code is on Launchpad at lp:~vadim-tk/sysbench/insert-roll-2;
the command line to run is:

and the InnoDB options are:
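A configuration along these lines would be typical for this class of hardware; the values below are illustrative assumptions, not necessarily the exact settings used:

[mysqld]
innodb_buffer_pool_size        = 36G       # leave some of the 48GB RAM for the OS
innodb_log_file_size           = 2G        # large redo logs for a write-heavy load
innodb_flush_log_at_trx_commit = 1
innodb_flush_method            = O_DIRECT
innodb_file_per_table          = 1
innodb_io_capacity             = 2000      # SATA SSD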

TokuDB-related options are all defaults; as I understand from the documentation, TokuDB comes with good settings out of the box, but I am ready to tune something if there are suggestions.
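The defaults actually in effect can be listed with:

SHOW GLOBAL VARIABLES LIKE 'tokudb%';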

36 Comments
Henrik Ingo

The point at 685 sec when InnoDB performance drops, is that when data doesn’t fit all in RAM anymore? I would in general expect to see benefits from TokuDB when workload is diskbound. You could artificially reduce InnoDB buffer pool (and corresponding Tokutek setting) to have more SSD vs RAM.

Alexander Zaitsev

The main performance win of TokuDB is indexes. If you add multiple multi-column indexes, the difference with InnoDB will be even more notable. It also allows multiple clustering indexes, which give an additional huge win on selects.

We used TokuDB in the past (versions 3.x and 4.x). It really shines on large tables and index scan like queries.

Michael Schmidt

Strikes me as odd that they actually take a previous InnoDB checkpoint benchmark from you as a reference for having outstanding checkpoint performance (eg http://www.tokutek.com/2012/04/tokudb-v6-0-frequent-checkpoints-with-no-performance-hit )

Also, which TokuDB row compressor was used? Default or lzma? If you used the default one it would be highly interesting to see the performance with lzma compression.

Gregory Schiano

In your my.cnf I don't see the Barracuda file format. Did you create your tables using ROW_FORMAT=COMPRESSED with InnoDB, which enables tablespace compression?

Gregory Schiano

Then why compare compressed tables and uncompressed tables on size? That's clearly not relevant!
Could you redo the same test with InnoDB compression?

Thanks

Mark Callaghan

Vadim,

Thanks for a great performance report.

Not sure I understand why InnoDB used 210 GB for 234M rows when loaded in PK order. What am I doing wrong with my hand-wavy estimates?

234M * (200 + 20) for PK index +
234M * (12 + 7) * 1.5 for fragmented secondary index +
234M * (12 + 20) * 1.5 for fragmented sbtest_r10

I assume:
* 20 bytes for per-row metadata overhead in PK rows
* PK index pages are 15/16 full as they are loaded in order
* 7 bytes per-row metadata overhead for secondary index entries
* fragmented pages are 2/3 full

So my guess is:
* 50G for PK index
* 6G for secondary index on base table
* 11G for sbtest_r10
* much less for other rollup tables

Brian Cavanagh

The storage part is interesting, but what about the access part? What happens when you change the ratios to include writes in the mix?

Andy

1) What concurrency did you use for the INSERT test? I don't know if TokuDB handles concurrent workloads as well as InnoDB does.

2) Can you also run benchmark on SELECT by PK or some range query by a secondary index? I heard that TokuDB is slower than InnoDB on SELECT but it’d be interesting to see by how much.

Rich Prohaska

Hello Vadim,

TokuDB supports an upsert optimization that is applicable to your second test. Typically, the "insert on dup key update" statements do a point query to read the row. If the row exists, then the row is updated. Otherwise, the row is inserted. TokuDB has an optimization in which the point query is avoided and replaced by the injection of an upsert message into the root of the fractal tree. This can result in a dramatic performance increase since the cost of reads is amortized.

TokuDB currently detects update expressions like x=x+c, where c is a constant. We could support x=x+VALUES(x). Let me know if you would like to run this test.

Andy

Rich,

Was this upsert optimization used in Vadim’s tests? If not how do you turn this optimization on?

svar

Vadim, can you confirm negative scalability with InnoDB compression and multi-value inserts?

Mark Callaghan

A big use-case for compression is to reduce the amount of fast storage you need to buy when fast storage is expensive.

Mark Callaghan

…so expect it to decrease performance somewhat, but if a small loss of performance reduces your storage cost by more than half, this can be a good compromise.

Mark Callaghan

Better compression and write-patterns that let you get longer lifetimes from lower-endurance (less expensive) flash devices — TokuDB can make a big difference

David

Vadim,

TokuDB does not use Direct I/O with its default settings. Hence, it may use more RAM than InnoDB in your benchmark.

svar

Thanks Vadim,

I did the test on 5.6 with concurrent LOAD DATA INFILE: the result for two threads is 20 times slower than loading one file after the other. This could not be related to primary key value distribution, memory or CPU, as repeating the same test on an uncompressed table scales as expected.

Mark, did you cover this case in your patches? Add my vote on compression, as we are facing more and more big data challenges where InnoDB and TokuDB will have to be compared with LevelDB at some point.

Andy

Vadim,

> unfortunately upsert part did not work in this benchmark, so it was mostly INSERT benchmark

What did you mean by that? In your tests you ran many INSERT .. ON DUPLICATE KEY UPDATE queries. That’s upsert, no?

TokuDB seems to work pretty well (better than InnoDB in the 2nd test) for these upsert queries. So why did you say the upsert part did not work in this benchmark?

Rich Prohaska

TokuDB supports an optimization for upserts that avoids the point query in some cases. This optimization was not used in the benchmark. Since TokuDB does not currently support the VALUES update expression, the optimization was turned off. We are adding this to our code and will let you know when it is supported.

Mark Callaghan

svar,
I have done some tests with concurrent load but did not see performance drops. It is reasonable that N threads doing concurrent load will be a lot slower than 1 for some N. Eventually you change from having all index blocks for all tables in memory to not, and then disk read stalls occur. Not sure if that was your problem or something else. PMP stacks or perf schema output might help debug it.

svar

Mark, thanks for the tip, this gives me the motivation to put more effort into explaining what I saw. The same data load into an uncompressed table was good, so PMP results need to follow; the only difference is the compression and range partitions. I'll keep posting.

Reda NOUSHI

Hi Vadim,

I can confirm the steady decline of InnoDB throughput down to ~1500 inserts/sec (and counting), while TokuDB remained steady at around 20000 inserts/sec.

I inserted around 100Million rows using Tokutek’s iibench.py for the benchmark on a 2-way Supermicro server with 2 Xeon X5650 @2.7GHz, 96GB of RAM and a very fast LSI MegaRaid backed storage with SSD Cache.

Aurimas worked on that machine a month ago.

Mark Callaghan

Does Percona have improvements to keep the InnoDB insert buffer from getting full (as in do background reads for ibuf merges when it is getting full)? That is the key to sustaining good throughput for InnoDB on iibench.

Rob Turpin

From looking at the command line options for sysbench I’m assuming you modified the source. Can I get the modified sysbench source?

Rob Turpin

Vadim,

I hadn’t heard of launchpad, but I figured it out. I’m running some TokuDB vs InnoDB performance tests in a sharded environment. This should be a big help. Thanks.

Eric Robinson

I don't know how you're getting those kinds of TPS numbers with either database. I'm benchmarking InnoDB (XtraDB) running on Percona 5.6 and getting only 2100 TPS, and that's even using sysbench in read-only mode (no inserts).

The server is a Dell PowerEdge R610, 12 x 3GHz Cores, 64GB RAM, 6 x Samsung SSD 840EVO 1TB drives (RAID 5).

Check out this non-awesomeness…

# sysbench --test=oltp --oltp-table-size=1000000 --mysql-db=test_innodb --db-driver=mysql --mysql-table-engine=innodb prepare
# sysbench --test=oltp --oltp-table-size=1000000 --max-time=60 --oltp-read-only=on --max-requests=0 --num-threads=8 --db-driver=mysql --mysql-db=test_innodb run

sysbench 0.4.12: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 8

Doing OLTP test.
Running mixed OLTP test
Doing read-only test
Using Special distribution (12 iterations, 1 pct of values are returned in 75 pct cases)
Using "BEGIN" for starting transactions
Using auto_inc on the id column
Threads started!
Time limit exceeded, exiting…
(last message repeated 7 times)
Done.

OLTP test statistics:
    queries performed:
        read:                            1782676
        write:                           0
        other:                           254668
        total:                           2037344
    transactions:                        127334 (2122.15 per sec.)
    deadlocks:                           0      (0.00 per sec.)
    read/write requests:                 1782676 (29710.12 per sec.)
    other operations:                    254668 (4244.30 per sec.)

Test execution summary:
    total time:                          60.0023s
    total number of events:              127334
    total time taken by event execution: 479.3998
    per-request statistics:
        min:                             1.96ms
        avg:                             3.76ms
        max:                             851.53ms
        approx. 95 percentile:           4.28ms

Threads fairness:
    events (avg/stddev):           15916.7500/209.42
    execution time (avg/stddev):   59.9250/0.00