How the number of columns affects performance

It is pretty understood the tables which have long rows tend to be slower than tables with short rows. I was interested to check if the row length is the only thing what matters or if the number of columns we have to work with also have an important role. I was interested in peak row processing speed so I looked at full table scan in case data fits in OS cache completely. I created 3 tables – First containing single tinyint column which is almost shortest type possible (CHAR(0) could be taking less space), table with 1 tinyint column and char(99) column and a table with 100 tinyint columns. The former two tables have the same row length but have the number of column different 50 times. Finally, I have created a 4th table which is also 100 columns but one of them is VARCHAR causes raw format to be dynamic.

More especially:

 CREATE TABLE `t1` (
  `t1` tinyint(3) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1` (

`t1` tinyint(3) unsigned NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1c99` (
  `t1` tinyint(3) unsigned NOT NULL,
  `c99` char(99) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1c99` (

`t1` tinyint(3) unsigned NOT NULL,

`c99` char(99) NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t100` (
  `t1` tinyint(3) unsigned NOT NULL,
  `t2` tinyint(3) unsigned NOT NULL,
...
  `t99` tinyint(3) unsigned NOT NULL,
  `t100` tinyint(3) unsigned NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t100` (

`t1` tinyint(3) unsigned NOT NULL,

`t2` tinyint(3) unsigned NOT NULL,

...

`t99` tinyint(3) unsigned NOT NULL,

`t100` tinyint(3) unsigned NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t99v1` (
  `t1` tinyint(3) unsigned NOT NULL,
  `t2` tinyint(3) unsigned NOT NULL,
...
  `t99` tinyint(3) unsigned NOT NULL,
  `v1` varchar(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t99v1` (

`t1` tinyint(3) unsigned NOT NULL,

`t2` tinyint(3) unsigned NOT NULL,

...

`t99` tinyint(3) unsigned NOT NULL,

`v1` varchar(1) NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

I populated each of the tables with 12M rows. Getting 7 bytes row size for the first table and 101 bytes for second and third.
I used a simple scan query: select max(t1) from t100; for the test.

The result was as follows:
t1 – 1.00 sec (12M rows/sec ; 80MB/sec)
t1c99 – 1.71 sec (7M rows/sec ; 676MB/sec)
t100 – 1.77 sec (7M rows/sec ; 653MB/sec)
t99v1 – 12.36 sec (1M rows/sec ; 93MB/sec)

This shows there is surely the problem with dynamic row format table with many columns. But is it because of large number of columns or dynamic format on its own is slave ?
I have tested yet another table structure:

CREATE TABLE `t1v1` (
  `t1` tinyint(3) unsigned NOT NULL,
  `v` varchar(1) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `t1v1` (

`t1` tinyint(3) unsigned NOT NULL,

`v` varchar(1) NOT NULL

) ENGINE=MyISAM DEFAULT CHARSET=latin1

This table has a row length of 20 (which was a bit of surprise to me) and it has:

t1v1 – 1.83 sec (6.5M rows/sec; 125M/sec)

So there is surely the penalty for dynamic rows, however, it is not very significant if the number of columns is small. For a large number of columns, dynamic rows become very expensive and you have to watch out.

I have not looked at the code and would appreciate any developers comments but I guess for dynamic rows tables certain conversion has to take place when internal data structures are populated (everything but TEXTs/BLOBs is fixed length when it is being processed). This conversion process depends on a number of columns while for fixed rows the MyISAM storage format matches internal one so you can basically do memory copy which does not depends on the number of columns.

Another interesting observation is access speed to different columns. the max(t1) and max(t99) were taking the same time which means there is no penalty for accessing column which is at the end of the table rather than at the start when it comes to MyISAM.

The common workaround working with such wide tables is to use covering indexes. I added one to t99v1 table and repeated the query:

mysql [localhost] {msandbox} (test) > select max(t1+0) from t99v1;
+-----------+
| max(t1+0) |
+-----------+
|         0 |
+-----------+
1 row in set (3.26 sec)

mysql [localhost] {msandbox} (test) > explain select max(t1+0) from t99v1;
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | t99v1 | index | NULL          | t1   | 1       | NULL | 12000000 | Using index |
+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > select max(t1+0) from t99v1;

+-----------+

| max(t1+0) |

+-----------+

| 0 |

+-----------+

1 row in set (3.26 sec)

mysql [localhost] {msandbox} (test) > explain select max(t1+0) from t99v1;

+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+

+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+

| 1 | SIMPLE | t99v1 | index | NULL | t1 | 1 | NULL | 12000000 | Using index |

+----+-------------+-------+-------+---------------+------+---------+------+----------+-------------+

1 row in set (0.00 sec)

As you can see the index scan is not as fast as table scan scanning about 3.7M rows/sec but which is still pretty fast.

So this is all about MyISAM, what is about Innodb? Here are results for Innodb with all data in the buffer pool, to measure peak speed as well

The results for Innodb were:
t1 – 5.11 sec (2.3M rows/sec)
t1c99 – 5.74 sec (2.1M rows/sec)
t100 – 15.16 sec (0.8M rows/sec)
t99v1 – 14.93 sec (0.8M rows/sec)
t1v1 – 5.26 sec (2.3M rows/sec)
t99v1 (covering idx) – 5.62 sec (2.1M rows/sec)

As you can see Innodb is a lot slower and has a behavior similar to Dynamic Row tables in both cases. This is because Innodb does not store data in native MyISAM format and conversion is needed in all cases. We can also see the table scan speed can be up to 5 times slower, for very short rows – some of this goes back to the fact Innodb rows have a lot of transaction control overhead attached to them.

Also note the covering index scan speed is very similar to full table scan speed – this is rather expected as table data is stored in BTREE index very similarly to how indexes are stored.

Summary: Beware of dynamic row format tables with many columns they might bite you with surprise slowdown. MyISAM is much faster than Innodb when it comes to in-memory full table scan.

P.S Tests were done on MySQL 5.4.2 on Intel(R) Xeon(R) CPU E5405 @ 2.00GHz CPU.

15 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Matt

14 years ago

“As you can see MyISAM is a lot slower and has behavior…” – I think you meant to say InnoDB there 🙂

Peter Zaitsev

Author

14 years ago

Matt,

Thanks corrected.

Eli

14 years ago

I had been meaning to test this issue (how the number of columns affect the performance) for quite some time now – and I thank you for saving me the trouble 🙂

Eli

14 years ago

Oh, and by the way – your CAPTCHA has a problem with calculating 9 + 6 …
To my knowledge, it’s 15, but it seems your CAPTCHA thinks otherwise 🙂

Peter Zaitsev

Author

14 years ago

Eli,

Sorry about captcha. Looks like some other glitch. What kind of error message did you get if any ?

Gunnar

14 years ago

Hi Peter,

Thanks for the benchmark.
I wonder if the test is comparing apples to oranges by accident.

The 4 test tables that you use have very different sizes.
How much influence has the different size of the test tables on the results?
Wouldn’t using tables with equal row size give a better to compare result?

Take care

Peter Zaitsev

Author

14 years ago

Gunnar,

This is why I specify MB/sec too 🙂

The sizes are only different when I want them to be. t1c99, t100 and t99v1 all have same row length in MyISAM and yet they have different scan speed. Other tables are of different row length and mainly provided for comparison so we can both see how number and type of columns affect scan speed.

Gunnar

14 years ago

Hi Peter,

>>Wouldnâ€™t using tables with equal row size give a better to compare result?

>This is why I specify MB/sec too
>
>t1 – 1.00 sec (12M rows/sec ; 80MB/sec)
>t1c99 – 1.71 sec (7M rows/sec ; 676MB/sec)
>t100 – 1.77 sec (7M rows/sec ; 653MB/sec)
>t99v1 – 12.36 sec (1M rows/sec ; 93MB/sec)

I see. 🙂

You said:
> So there is surely the penalty for dynamic rows, however it is not very significant
> if number of columns is small. For large number of columns dynamic rows become very expensive and you have to watch out.

I fully agree with your conclusion that there is a penalty for “dynamic” rows.
Looking at your numbers we see the result of 653MB/sec versus 93MB/sec.
Which means that a static row is 7 times faster than a dynamic row.

But I’m not sure if the number of columns is of any relevance or if its the row length.
I would assume that a 2 column but longer row should get a huge penalty also.

Could you be so kind and test what result are getting for a table like “t1v99”?
CREATE TABLE t1v99 (
t1 tinyint(3) UNSIGNED NOT NULL,
v99 varchar(99) NOT NULL DEFAULT ‘12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789’
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Could we compare this result with “tac99” please?

Cheers
Gunnar

Eli

14 years ago

In reply to #5: The message is: “You have failed the challenge!” (it happened again right now, when I tried to answer: 8 + 0 = 8)

jcn50

13 years ago

>t1 â€“ 1.00 sec (12M rows/sec ; 80MB/sec)
>t1c99 â€“ 1.71 sec (7M rows/sec ; 676MB/sec)

I think there is a typo on the first line, no? It should be:
>t1 â€“ 1.00 sec (12M rows/sec ; __ 800MB__ /sec)

George

11 years ago

I am designing a django site where there are 12 columns. Each of these columns needs to be searched together like where category=animals and orientation=landscape and etc etc because I am providing a search page with different dropdowns of each of these 12 options. Now the problem is I am a beginner with this and I am expecting the database to grow to millions of rows in future. The database inserts are less in the site max 500/day. Should I stay with 12 columns model or should I split it. If I search like I said above will it be cause performance drops. All the columns are charfield < 30 chars.

Jim Mansir

10 years ago

Peter,

Do you think it would it be more efficient to have multiple ( 7-10 ) different tables and minimize the record count in each table or would it be faster to add a column to differentiate between my types of data and put them all in the same table. The only difference is the data type…all the rest of the columns would be the same. There would be 15 columns of data in the multiple column scenario and 16 if I had to add a data type column. We are probably looking at 2 million records total.

Thanks!

Jim.

Sudhakar Narra

10 years ago

thanks for explaining this 🙂

Ulrik Lystbæk

10 years ago

Im not an expert on the subject, but is the penality not the result of the decrease in the fanout. As the number of columns grows, each row will get larger, and the fanout of the table structure is usally depended on the pagesize whereas lesser fanout. This could be solved with creating an index of the columns you need, which will create a BTree with a fanout only affected by the needed columns (columns in the index). Like I said I am not an expert on field and havent tested if the new index would be affected by the column count in the table.

Ramesh Naidu

9 years ago

Hi Friends,

I want save product with more than 25 variants in the table. so want add 25 columns in Table. i will store in that with Reference Key( Auto increment Value) . Please suggest . it any effect on the Performance

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How the number of columns affects performance

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Configuration Best Practices

Valkey/Redis: The Hash Datatype

Valkey/Redis Replication and Auto-Failover With Sentinel Service

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How the number of columns affects performance

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Configuration Best Practices

Valkey/Redis: The Hash Datatype

Valkey/Redis Replication and Auto-Failover With Sentinel Service

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation