One of the serious limitations in the fast index creation feature introduced in the InnoDB plugin is that it only works when indexes are explicitly created using ALTER TABLE
or CREATE INDEX
. Peter has already blogged about it before, here I’ll just briefly reiterate other cases that might benefit from that feature:
- when
ALTER TABLE
does require copying the data into a temporary table, secondary indexes are updated by inserting individual rows rather than sorting; OPTIMIZE TABLE
could be faster if secondary indexes were temporarily dropped and then recreated using fast index creation;- dumps produced by mysqldump first create tables with all secondary indexes and then load the data, which is also inefficient.
Percona Server as of versions 5.1.56 and 5.5.11 allows utilizing fast index creation for all of the above cases, which can potentially speed them up greatly. This feature is controlled by the expand_fast_index_creation
system variable which is OFF by default.
Let’s look at each of the above cases in more detail.
ALTER TABLE
By temporarily dropping secondary indexes from the new table before copying the data, and then recreating them later, ALTER TABLE
can take advantage of the fast index creation feature even when it has to copy the entire table.
To illustrate this, I have performed a number of simple benchmarks. Let’s start with a table containing 4 million rows and one secondary key:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | mysql> CREATE TABLE t(id INT AUTO_INCREMENT PRIMARY KEY, c FLOAT) ENGINE=InnoDB; Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t(c) VALUES (RAND()); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO t(c) SELECT RAND() FROM t; Query OK, 1 row affected (0.00 sec) Records: 1 Duplicates: 0 Warnings: 0 . . . mysql> INSERT INTO t(c) SELECT RAND() FROM t; Query OK, 2097152 rows affected (10.11 sec) Records: 2097152 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE t ADD KEY (c); Query OK, 0 rows affected (18.56 sec) Records: 0 Duplicates: 0 Warnings: 0 |
Let’s trigger a table rebuild by adding a new column and see what execution time is like when the default method is used:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | mysql> SET profiling=1; Query OK, 0 rows affected (0.00 sec) mysql> ALTER TABLE t ADD COLUMN v VARCHAR(1); Query OK, 4194304 rows affected (1 min 1.97 sec) Records: 4194304 Duplicates: 0 Warnings: 0 mysql> SHOW PROFILE; +------------------------------+-----------+ | Status | Duration | +------------------------------+-----------+ | starting | 0.000054 | | checking permissions | 0.000004 | | checking permissions | 0.000004 | | init | 0.000008 | | Opening tables | 0.000118 | | System lock | 0.000007 | | setup | 0.000027 | | creating table | 0.002255 | | After create | 0.000050 | | copy to tmp table | 61.816063 | | rename result table | 0.161528 | | end | 0.000007 | | Waiting for query cache lock | 0.000002 | | end | 0.000007 | | query end | 0.000003 | | closing tables | 0.000008 | | freeing items | 0.000021 | | cleaning up | 0.000003 | +------------------------------+-----------+ 18 rows in set (0.00 sec) |
Now let’s see how performance is affected when turning expand_fast_index_creation
on. Here and in later examples I’m extending the VARCHAR
column to trigger table rebuilds without affecting the table size.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | mysql> SET expand_fast_index_creation=ON; Query OK, 0 rows affected (0.00 sec) mysql> ALTER TABLE t MODIFY v VARCHAR(2); Query OK, 4194304 rows affected (36.07 sec) Records: 4194304 Duplicates: 0 Warnings: 0 mysql> SHOW PROFILE; +------------------------------+-----------+ | Status | Duration | +------------------------------+-----------+ | starting | 0.000054 | | checking permissions | 0.000004 | | checking permissions | 0.000005 | | init | 0.000010 | | Opening tables | 0.000027 | | System lock | 0.000008 | | setup | 0.000040 | | creating table | 0.002176 | | After create | 0.000058 | | copy to tmp table | 18.083490 | | restoring secondary keys | 17.824109 | | rename result table | 0.162041 | | end | 0.000008 | | Waiting for query cache lock | 0.000002 | | end | 0.000007 | | query end | 0.000003 | | closing tables | 0.000008 | | freeing items | 0.000019 | | cleaning up | 0.000003 | +------------------------------+-----------+ 19 rows in set (0.00 sec) |
As seen from the SHOW PROFILE
output, copying the data to a temporary table without updating indexes took 18 seconds, and about the same time was spent on rebuilding the index using fast index creation. So we have 36 seconds in total which is about 1.7 times faster than updating indexes by insertion.
Let’s see if having more secondary indexes in the table makes any difference:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | mysql> SET expand_fast_index_creation=OFF; Query OK, 0 rows affected (0.00 sec) mysql> ALTER TABLE t ADD KEY (c), ADD KEY(c); Query OK, 0 rows affected (36.42 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE t MODIFY v VARCHAR(3); Query OK, 4194304 rows affected (3 min 4.87 sec) Records: 4194304 Duplicates: 0 Warnings: 0 mysql> SHOW PROFILE; +------------------------------+------------+ | Status | Duration | +------------------------------+------------+ . . . | copy to tmp table | 184.694432 | . . . +------------------------------+------------+ 18 rows in set (0.00 sec) mysql> SET expand_fast_index_creation=ON; Query OK, 0 rows affected (0.00 sec) mysql> ALTER TABLE t MODIFY v VARCHAR(4); Query OK, 4194304 rows affected (1 min 11.12 sec) Records: 4194304 Duplicates: 0 Warnings: 0 mysql> SHOW PROFILE; +------------------------------+-----------+ | Status | Duration | +------------------------------+-----------+ . . . | copy to tmp table | 18.396514 | | restoring secondary keys | 52.567644 | . . . +------------------------------+-----------+ 19 rows in set (0.00 sec) |
So with 3 secondary indexes expand_fast_index_creation
gave us a 2.6x speedup.
Also note that unlike the default method, where the execution time is proportional to the number of indexes, with fast index creation the time required to copy the data to a temporary table is constant. The reason is that when using merge sort, InnoDB has to scan the clustered index only once, even though the actual sorting is done separately for each index.
The above has a couple of important implications:
- when the data does not fit in the buffer pool, fast index creation provides even better performance as compared to the default method, because it does not have to do random disk seeks to fetch secondary index pages to the buffer pool. A benchmark is worth a thousand words, so let’s repeat the last test with
innodb_buffer_pool_size
set to approximately 1/10th of the dataset:12345678910111213141516171819202122232425262728mysql> SET expand_fast_index_creation=OFF;Query OK, 0 rows affected (0.00 sec)mysql> ALTER TABLE t MODIFY v VARCHAR(7);Query OK, 4194304 rows affected (9 min 15.08 sec)Records: 4194304 Duplicates: 0 Warnings: 0mysql> SET expand_fast_index_creation=ON;Query OK, 0 rows affected (0.00 sec)mysql> ALTER TABLE t MODIFY v VARCHAR(8);Query OK, 4194304 rows affected (1 min 13.69 sec)Records: 4194304 Duplicates: 0 Warnings: 0mysql> ALTER TABLE t MODIFY v VARCHAR(8);Query OK, 4194304 rows affected (1 min 13.69 sec)Records: 4194304 Duplicates: 0 Warnings: 0mysql> SHOW PROFILE;+------------------------------+-----------+| Status | Duration |+------------------------------+-----------+. . .| copy to tmp table | 19.805849 || restoring secondary keys | 53.885502 |. . .+------------------------------+-----------+19 rows in set (0.00 sec)
So, as expected, a small buffer pool had a huge impact onALTER TABLE
with the optimization disabled, and absolutely no effect on the optimized case, which resulted in an almost 8x speedup. - having tmpdir on a fast storage is essential for
expand_fast_index_creation
, because temporary files for merge-sorting are created in tmpdir. The constant “copying to tmp table” part will not be affected by a slow tmpdir, but rebuilding the indexes will obviously take longer.
Another important thing that is worth mentioning is fragmentation. Fast index creation results in much less fragmented indexes because records are inserted in the correct order into sequentially allocated pages after merge-sorting. So besides optimizing DDL directly, expand_fast_index_creation
may also optimize index access for subsequent DML statements. In my test setup I got about 178 MB index size after fast index creation as reported by SHOW TABLE STATUS
versus 265 MB index size with the optimization disabled.
OPTIMIZE TABLE
OPTIMIZE TABLE
is mapped to ALTER TABLE ... ENGINE=InnoDB
for InnoDB tables and thus, is just a special case of the previous one:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 | mysql> SET expand_fast_index_creation=OFF; Query OK, 0 rows affected (0.00 sec) mysql> OPTIMIZE TABLE t; +--------+----------+----------+-------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +--------+----------+----------+-------------------------------------------------------------------+ | test.t | optimize | note | Table does not support optimize, doing recreate + analyze instead | | test.t | optimize | status | OK | +--------+----------+----------+-------------------------------------------------------------------+ 2 rows in set (2 min 57.65 sec) mysql> SHOW TABLE STATUS LIKE 't'G *************************** 1. row *************************** Name: t Engine: InnoDB Version: 10 Row_format: Compact Rows: 4195067 Avg_row_length: 29 Data_length: 125452288 Max_data_length: 0 Index_length: 278839296 Data_free: 1838153728 Auto_increment: 4587468 Create_time: 2011-11-06 10:01:18 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.01 sec) mysql> SET expand_fast_index_creation=ON; Query OK, 0 rows affected (0.00 sec) mysql> OPTIMIZE TABLE t; +--------+----------+----------+-------------------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +--------+----------+----------+-------------------------------------------------------------------+ | test.t | optimize | note | Table does not support optimize, doing recreate + analyze instead | | test.t | optimize | status | OK | +--------+----------+----------+-------------------------------------------------------------------+ 2 rows in set (1 min 12.19 sec) mysql> SHOW TABLE STATUS LIKE 't'G *************************** 1. row *************************** Name: t Engine: InnoDB Version: 10 Row_format: Compact Rows: 4195067 Avg_row_length: 29 Data_length: 125452288 Max_data_length: 0 Index_length: 187465728 Data_free: 1930428416 Auto_increment: 4587468 Create_time: 2011-11-06 10:04:10 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.01 sec) |
mysqldump
Quoting the fast index creation chapter in the MySQL manual:
“… you can generally speed the overall process of creating and loading
an indexed table by creating the table with only the clustered index,
and adding the secondary indexes after the data is loaded.”
mysqldump in Percona Server supports the new option --innodb-optimize-keys
which does just that, i.e. it tries to optimize dumps of InnoDB tables by first creating the table with only the clustered index and adding the secondary indexes after the data dump when possible (see Caveats below).
Let’s compare the restore time for a regular dump with a dump created with --innodb-optimize-keys
(the test
database contained only the table I used in my previous examples):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | $ mysqldump -uroot test > dump_unoptimized.sql $ mysqldump -uroot test --innodb-optimize-keys > dump_optimized.sql $ time mysql -uroot test < dump_unoptimized.sql real 2m52.785s user 0m3.179s sys 0m0.069s $ time mysql -uroot test < dump_optimized.sql real 1m20.958s user 0m3.204s sys 0m0.062s |
Caveats:
As I mentioned previously, InnoDB fast index creation uses temporary files in tmpdir
for all indexes being created. So make sure you have enough tmpdir
space when using expand_fast_index_creation
. It is a session variable, so you can temporarily switch it off if you are short on tmpdir
space and/or don’t want this optimization to be used for a specific table.
There’s also a number of cases when this optimization is not applicable:
UNIQUE
indexes inALTER TABLE
are ignored to enforce uniqueness where necessary when copying the data to a temporary table;ALTER TABLE
andOPTIMIZE TABLE
always process tables containing foreign keys as ifexpand_fast_index_creation
isOFF
to avoid dropping keys that are part of aFOREIGN KEY
constraint;mysqldump --innodb-optimize-keys
ignores foreign keys because InnoDB requires a full table rebuild on foreign key changes. So adding them back with a separateALTER TABLE
after restoring the data from a dump would actually make the restore slower;mysqldump --innodb-optimize-keys
ignores indexes onAUTO_INCREMENT
columns, because they must be indexed, so it is impossible to temporarily drop the corresponding index;mysqldump --innodb-optimize-keys
ignores the firstUNIQUE
index on non-nullable columns when the table has noPRIMARY KEY
defined, because in this case InnoDB picks such an index as the clustered one.
Alexey,
Thank you for your work on this feature and now blog post. Great to notice not only this feature allows to optimize/load tables faster but also tables end up to have smaller more efficient indexes.
Hi Alexey
expand_fast_index_creation sys var doesnt exist on 5.5.12…
your blog post says >=5.5.11
am I missing something?
Hi Steve,
You are right. Even though the functionality was introduced in 5.1.56 and 5.5.11, it became optional only in 5.1.59 and 5.5.16. That is, in 5.5.12 it is always enabled.
Alexey.. thanks for the clarification.
//Steve