April 23, 2008

Efficient Boolean value storage for Innodb Tables

Posted by peter

Sometimes you have the task of storing multiple of boolean values (yes/now or something similar) in the table and if you get many columns and many rows you may want to store them as efficient way as possible.
For MyISAM tables you could use BIT(1) fields which get combined together for efficient storage:

SQL:
  1. CREATE TABLE `bbool` (
  2.   `b1` bit(1) NOT NULL,
  3.   `b2` bit(1) NOT NULL,
  4.   `b3` bit(1) NOT NULL,
  5.   `b4` bit(1) NOT NULL,
  6.   `b5` bit(1) NOT NULL,
  7.   `b6` bit(1) NOT NULL,
  8.   `b7` bit(1) NOT NULL,
  9.   `b8` bit(1) NOT NULL,
  10.   `b9` bit(1) NOT NULL,
  11.   `b10` bit(1) NOT NULL
  12. ) ENGINE=MyISAM
  13.  
  14. mysql> SHOW TABLE STATUS LIKE 'bbool' \G
  15. *************************** 1. row ***************************
  16.            Name: bbool
  17.          Engine: MyISAM
  18.         Version: 10
  19.      Row_format: Fixed
  20.            Rows: 10
  21.  Avg_row_length: 7
  22.     Data_length: 70
  23. Max_data_length: 1970324836974591
  24.    Index_length: 1024
  25.       Data_free: 0
  26.  AUTO_INCREMENT: NULL
  27.     Create_time: 2008-04-24 00:41:01
  28.     Update_time: 2008-04-24 00:45:40
  29.      Check_time: NULL
  30.       Collation: latin1_swedish_ci
  31.        Checksum: NULL
  32.  Create_options:
  33.         Comment:
  34. 1 row IN SET (0.00 sec)

As you can see for MyISAM 10 columns take just 7 bytes - less than a byte per column. This is just minimum row length we can have for this table - myisam_data_pointer_size is 6 default plus we need space for delete flag which makes 7 minimum row size MyISAM can have in this configuration.

This trick however does not work for Innodb which allocates 1 byte for each BIT(1) column. So we can get 1 byte per column for boolean flag storage in Innodb (not accounting for standard row overhead) if we use BIT(1), TINYINT or ENUM types but can we do better ?

In fact we can - by using CHAR(0) type (without NOT NULL flag) - this will be pretty much column containing NULL bit only which can store one of two values - NULL or Empty String.

Lets see how these 3 different table format look in Innodb (I've populated each with some 2M rows so difference is more visible)

SQL:
  1. CREATE TABLE `tbool` (
  2.   `t1` tinyint(4) NOT NULL,
  3.   `t2` tinyint(4) NOT NULL,
  4.   `t3` tinyint(4) NOT NULL,
  5.   `t4` tinyint(4) NOT NULL,
  6.   `t5` tinyint(4) NOT NULL,
  7.   `t6` tinyint(4) NOT NULL,
  8.   `t7` tinyint(4) NOT NULL,
  9.   `t8` tinyint(4) NOT NULL,
  10.   `t9` tinyint(4) NOT NULL,
  11.   `t10` tinyint(4) NOT NULL
  12. ) ENGINE=InnoDB
  13.  
  14. CREATE TABLE `cbool` (
  15.   `c1` char(0) DEFAULT NULL,
  16.   `c2` char(0) DEFAULT NULL,
  17.   `c3` char(0) DEFAULT NULL,
  18.   `c4` char(0) DEFAULT NULL,
  19.   `c5` char(0) DEFAULT NULL,
  20.   `c6` char(0) DEFAULT NULL,
  21.   `c7` char(0) DEFAULT NULL,
  22.   `c8` char(0) DEFAULT NULL,
  23.   `c9` char(0) DEFAULT NULL,
  24.   `c10` char(0) DEFAULT NULL
  25. ) ENGINE=InnoDB DEFAULT CHARSET=latin1
  26.  
  27. mysql> SHOW TABLE STATUS LIKE "%bool%" \G
  28. *************************** 1. row ***************************
  29.            Name: bbool
  30.          Engine: InnoDB
  31.         Version: 10
  32.      Row_format: Compact
  33.            Rows: 2097405
  34.  Avg_row_length: 37
  35.     Data_length: 78233600
  36. Max_data_length: 0
  37.    Index_length: 0
  38.       Data_free: 0
  39.  AUTO_INCREMENT: NULL
  40.     Create_time: 2008-04-24 00:54:18
  41.     Update_time: NULL
  42.      Check_time: NULL
  43.       Collation: latin1_swedish_ci
  44.        Checksum: NULL
  45.  Create_options:
  46.         Comment: InnoDB free: 6144 kB
  47. *************************** 2. row ***************************
  48.            Name: cbool
  49.          Engine: InnoDB
  50.         Version: 10
  51.      Row_format: Compact
  52.            Rows: 2097678
  53.  Avg_row_length: 34
  54.     Data_length: 71942144
  55. Max_data_length: 0
  56.    Index_length: 0
  57.       Data_free: 0
  58.  AUTO_INCREMENT: NULL
  59.     Create_time: 2008-04-24 00:37:48
  60.     Update_time: NULL
  61.      Check_time: NULL
  62.       Collation: latin1_swedish_ci
  63.        Checksum: NULL
  64.  Create_options:
  65.         Comment: InnoDB free: 4096 kB
  66. *************************** 3. row ***************************
  67.            Name: tbool
  68.          Engine: InnoDB
  69.         Version: 10
  70.      Row_format: Compact
  71.            Rows: 2097405
  72.  Avg_row_length: 37
  73.     Data_length: 78233600
  74. Max_data_length: 0
  75.    Index_length: 0
  76.       Data_free: 0
  77.  AUTO_INCREMENT: NULL
  78.     Create_time: 2008-04-24 00:58:01
  79.     Update_time: NULL
  80.      Check_time: NULL
  81.       Collation: latin1_swedish_ci
  82.        Checksum: NULL
  83.  Create_options:
  84.         Comment: InnoDB free: 6144 kB
  85. 3 rows IN SET (0.11 sec)

As you can see table which uses BIT(1) column type takes same space as the one which uses TINYINT NOT NULL while CHAR(0) is about 10% smaller. This is modest space savings of course but considering large per row overhead Innodb has this will transform to much larger savings if you have hundreds of such columns.

Lets see how things look for MyISAM for same tables:

SQL:
  1. mysql> SHOW TABLE STATUS LIKE "%bool%" \G
  2. *************************** 1. row ***************************
  3.            Name: bbool
  4.          Engine: MyISAM
  5.         Version: 10
  6.      Row_format: Fixed
  7.            Rows: 2097152
  8.  Avg_row_length: 7
  9.     Data_length: 14680064
  10. Max_data_length: 1970324836974591
  11.    Index_length: 1024
  12.       Data_free: 0
  13.  AUTO_INCREMENT: NULL
  14.     Create_time: 2008-04-24 01:14:06
  15.     Update_time: 2008-04-24 01:14:09
  16.      Check_time: NULL
  17.       Collation: latin1_swedish_ci
  18.        Checksum: NULL
  19.  Create_options:
  20.         Comment:
  21. *************************** 2. row ***************************
  22.            Name: cbool
  23.          Engine: MyISAM
  24.         Version: 10
  25.      Row_format: Fixed
  26.            Rows: 2097152
  27.  Avg_row_length: 7
  28.     Data_length: 14680064
  29. Max_data_length: 1970324836974591
  30.    Index_length: 1024
  31.       Data_free: 0
  32.  AUTO_INCREMENT: NULL
  33.     Create_time: 2008-04-24 01:14:13
  34.     Update_time: 2008-04-24 01:14:17
  35.      Check_time: NULL
  36.       Collation: latin1_swedish_ci
  37.        Checksum: NULL
  38.  Create_options:
  39.         Comment:
  40. *************************** 3. row ***************************
  41.            Name: tbool
  42.          Engine: MyISAM
  43.         Version: 10
  44.      Row_format: Fixed
  45.            Rows: 2097152
  46.  Avg_row_length: 11
  47.     Data_length: 23068672
  48. Max_data_length: 3096224743817215
  49.    Index_length: 1024
  50.       Data_free: 0
  51.  AUTO_INCREMENT: NULL
  52.     Create_time: 2008-04-24 01:14:23
  53.     Update_time: 2008-04-24 01:14:26
  54.      Check_time: NULL
  55.       Collation: latin1_swedish_ci
  56.        Checksum: NULL
  57.  Create_options:
  58.         Comment:
  59. 3 rows IN SET (0.00 sec)

As you can see for MyISAM BIT(1) NOT NULL type is as compact as CHAR(0) while TINYINT NOT NULL is a bit less compact.

Looking at results of these tests using CHAR(0) is the most efficient if you would like optimal structure both for MyISAM and Innodb tables, however it is not as convenient to work with. Using NULL as one of flag values means you can't use normal "=" comparison operator with them:

SQL:
  1. mysql> SELECT count(*) FROM cbool WHERE c1=NULL;
  2. +----------+
  3. | count(*) |
  4. +----------+
  5. |        0 |
  6. +----------+
  7. 1 row IN SET (0.20 sec)

You can use IS NULL operator which is painful because you need to have different query based on parameter (IS '' would not work) or you can use Null-Aware comparison operator:

SQL:
  1. mysql> SELECT count(*) FROM cbool WHERE c1<=>NULL;
  2. +----------+
  3. | count(*) |
  4. +----------+
  5. 1048576 |
  6. +----------+
  7. 1 row IN SET (0.22 sec)

Should you go and change all flags to use this approach ? I do not think so - for most applications using TINYINT BIT(1) or ENUM for flags benefit would unlikely be worth the trouble. Due to complication I also would not recommend as a base approach for new applications. However in special cases if you have very many rows and very many flag values to deal with which you can't pack to the bitmask this approach can be quite helpful.

Related posts: :Full text search for all MySQL Storage Engines::FaceBook Search, Search for social networks::MySQL Error control changes:
 

4 Comments »

  1. I usually prefer having one single column named `flags`, and use all it’s bits, one for each boolean. The logic of handling these boolean values can lie either in the program or a mysql function. Size of the column `flags` can be determined based on the number of such boolean value.

    I know this go against the readability of the database, but is good enough if the matter comes to storage.


    Parvesh

    Comment :: April 24, 2008 @ 3:07 am

  2. Parvesh,

    Sure. If you can pack bits to the bitmask this will be the most efficient as I mention in the end of the article. Though it does not work in all cases.

    Comment :: April 24, 2008 @ 7:51 am

  3. 3. MSDI

    I’ve read your comment about the Char trick.

    It’s interesting, but I don’t someone should use such thing on a server. It renders the code less readable

    WHERE
    bIsActive = ” //This means true

    WHERE
    bIsActive IS NULL //This means false

    Also, your trick about the bitmask is fine, but inapropriate if you need to search there values

    bIsActive = 1;
    bIsLocked = 2;
    bIsConstipated = 4;

    Someone that has the 3 flags on would be represented as 0000111 (7) in the database, but if you want to extract the list of active people, you’ll need a table scan.

    Fine for some use, but if you need index on them, it has a downside…

    Comment :: September 4, 2008 @ 8:15 am

  4. Sure you’re right. It is very inconvenient for query generation. The bitmap… sure you get searches more complicated and indexes do not work. The more friendly way of bitmaps is SET type though it is hard to alter if you need new bits.

    Comment :: September 4, 2008 @ 10:34 am

 

Subscribe without commenting


This page was found by: boolean myisam creating tables with... mysql create_time in... set default innodb t... mysql create bool