UNION vs UNION ALL Performance

When I was comparing the performance of UNION against the MySQL 5.0 index merge algorithm, Sinisa pointed out that I should be using UNION ALL instead of plain UNION in my benchmarks, and he was right. The numbers would be different, but that should not change the general point: the optimization of moving LIMIT inside the union clauses is a cool thing.

So is UNION ALL indeed faster than UNION DISTINCT (plain UNION is a shortcut for UNION DISTINCT)?

Indeed it is. I did not have the same data I used for the other test, but I created a similar test case: a table with separate indexes on the “i” and “j” columns, each with a cardinality of 100, holding about 40,000,000 rows.

select * from test.abc where i=5 union  select * from test.abc where j=5

The original query took about 22 seconds.

Then I modified it:

select * from test.abc where i=5 union all select * from test.abc where j=5 and i!=5

The query time dropped to about 6 seconds, which is roughly 3.5 times faster: quite a considerable improvement.

As you may notice, I added the “i!=5” clause. It ensures that rows matching both conditions do not appear twice in the result set, so the result is the same as for a query with an “i=5 or j=5” where clause.
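To make the role of the “i!=5” clause concrete, here is a small sketch that checks the equivalence on toy data. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL, the table is a tiny stand-in for test.abc with a hypothetical "id" primary key, and both columns are declared NOT NULL. With nullable columns the “i!=5” predicate would also drop rows where i is NULL, so the rewrite would need extra care there.

```python
import sqlite3

# Tiny stand-in for test.abc. SQLite is used only so the sketch runs anywhere;
# the "id" column and the NOT NULL constraints are assumptions of this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abc (id INTEGER PRIMARY KEY, i INT NOT NULL, j INT NOT NULL)"
)
conn.executemany(
    "INSERT INTO abc VALUES (?, ?, ?)",
    [(n, n % 7, n % 11) for n in range(1, 1001)],
)


def fetch(sql):
    """Run a query and return its rows sorted, so results compare by content."""
    return sorted(conn.execute(sql).fetchall())


union_distinct = fetch(
    "SELECT * FROM abc WHERE i=5 UNION SELECT * FROM abc WHERE j=5"
)
union_all_fixed = fetch(
    "SELECT * FROM abc WHERE i=5 UNION ALL SELECT * FROM abc WHERE j=5 AND i!=5"
)
union_all_naive = fetch(
    "SELECT * FROM abc WHERE i=5 UNION ALL SELECT * FROM abc WHERE j=5"
)
or_query = fetch("SELECT * FROM abc WHERE i=5 OR j=5")

# The rewritten UNION ALL returns the same rows as UNION and as the OR query.
same_as_union = union_all_fixed == union_distinct
same_as_or = union_all_fixed == or_query
# Without the extra predicate, rows matching both conditions come back twice.
naive_has_duplicates = len(union_all_naive) > len(union_distinct)

print(same_as_union, same_as_or, naive_has_duplicates)
```

The primary key matters for the comparison: UNION DISTINCT also collapses rows that are identical within a single branch, so the three queries are only guaranteed to agree when the table has no fully duplicated rows.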

I also tried the plain OR query (which uses the index merge method in MySQL 5.0):

select * from test.abc where i=5 or j=5

This query takes 4 seconds, so if you do not need tricks with ORDER BY and LIMIT, using index merge is faster than UNION, as indeed it should be.

So why is UNION ALL faster than UNION DISTINCT?

The first informed guess would be that UNION ALL does not need a temporary table to store the result set. However, this is not correct: both UNION ALL and UNION DISTINCT use a temporary table for result generation. Perhaps one more thing for the Optimizer Team to look into.

Interestingly enough, the fact that UNION and UNION ALL require a temporary table can only be seen in SHOW STATUS; EXPLAIN does not want to tell you this shameful fact:

mysql> explain (select * from test.abc where i=5) union all (select * from test.abc where j=5 and i!=5) \G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: abc
         type: ref
possible_keys: i
          key: i
      key_len: 5
          ref: const
         rows: 348570
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: abc
         type: ref
possible_keys: i,j
          key: j
      key_len: 5
          ref: const
         rows: 349169
        Extra: Using where
*************************** 3. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: <union1,2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra:
3 rows in set (0.00 sec)

In fact, the EXPLAIN output is the same for UNION and UNION ALL (which is too bad, as their execution is obviously different).

The difference in execution speed comes from the fact that UNION requires an internal temporary table with an index (to skip duplicate rows), while UNION ALL creates a table without such an index.

This also explains why the difference becomes larger when an on-disk table is required (as in this case): the hash indexes used by MEMORY tables are very efficient and do not add so much overhead.
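The extra work can be pictured with a toy model. This is not MySQL's actual implementation, just a sketch of the idea: a Python list stands in for the temporary table, a Python set stands in for the index used to skip duplicate rows, and the function names are made up for illustration.

```python
def union_all(*result_sets):
    """Append every row to the temporary result; no lookups are needed."""
    tmp = []
    for result_set in result_sets:
        for row in result_set:
            tmp.append(row)
    return tmp


def union_distinct(*result_sets):
    """Probe an index of rows already stored before every insert."""
    tmp = []
    seen = set()  # stands in for the index on the temporary table
    for result_set in result_sets:
        for row in result_set:
            if row not in seen:  # one extra index lookup per row
                seen.add(row)    # plus index maintenance per stored row
                tmp.append(row)
    return tmp


first = [(1, 5, 2), (2, 5, 5)]
second = [(2, 5, 5), (3, 9, 5)]

all_rows = union_all(first, second)
distinct_rows = union_distinct(first, second)
print(len(all_rows), len(distinct_rows))  # 4 3
```

In this model the distinct variant pays for a lookup on every row, whether or not any duplicates exist. With an in-memory hash that cost is small; when the index lives on disk, each probe is far more expensive, which matches the larger gap seen with on-disk temporary tables.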

13 Comments

Xaprb

16 years ago

I read the source for UNION once (Mark Leith pointed me to it). It’s not very much code. The only difference between the two is the temporary table has the “distinct” property set, as I recall. So of course that means a unique index on all columns, which is bound to be slow.

We should mention this in our book… it’s such an instinct for me to use UNION ALL that I forgot about it. I’ll go look and see if we have mentioned it anywhere.

Peter Zaitsev

Author

16 years ago

Baron,

I think the most important thing here is the fact that even UNION ALL uses a temporary table, while in many cases it could simply send the result sets one after another, possibly with a little conversion to adjust data types.

The unique index is in fact not an index on all columns but some form of hash index. With the MyISAM key limit, an index on all columns would not work for tables with long rows, not to mention BLOBs.

Scott Marlowe

16 years ago

Union All is generally faster than union because it doesn’t have to do a sort / unique step.

With union all, you smoosh two data sets together not caring if there are dups or not. union (distinct) has to then go that extra step to remove dups.

TANSTAAFL

Peter Zaitsev

Author

16 years ago

Hey Scott,

This is what I mentioned too. The point was not to see if it is faster but to get the measure of the thing in numbers.

ramasubramanian.G

16 years ago

sldfj

Sebastian Gomez Morales

16 years ago

great article! (as usual)
thanks a lot.
i was looking for this and this text was very helpful
keep the good work, boys

Charu

15 years ago

Hi,

Can someone answer my related query posted in the mysql performance forum? The query can be found here:

http://forum.mysqlperformanceblog.com/s/mv/tree/743/

Thanks.

art

14 years ago

Hey, thanks for the article, helped me also.

Kiran

16 years ago

Is UNION on same table going to make some lock or create more connection to database?

hic

12 years ago

Hi,

I have a query like
1)select distinct(col) from(select distinct a col from tab_a
union all
select distinct b col from tab_b
union all
select distinct c col from tab_c
union all
select distinct d col from tab_d
)

2)select distinct(col) from(select distinct a col from tab_a
union all
select b col from tab_b
union all
select c col from tab_c
union all
select d col from tab_d
)

3)select distinct(col) from(select distinct a col from tab_a
union
select distinct b col from tab_b
union
select distinct c col from tab_c
union
select distinct d col from tab_d
)

I thought like option 2 will be better to achieve this but the explain plan shows less cost for option 1. Can anybody tell me which option will be better.

mohamat167

11 years ago

Thanks Peter, it is useful for me.

Rahul

11 years ago

suppose register_num = 1005 , register_num=2000 are present in the table.
when i query using union i get the two rows as result wherein at result 0 i will have register_num = 1005 details and at result 1 i will have register_num=2000 details.
Here I want to know if it is possible to get the details of register_num=2000 at result 1 when register_num=1005 is not present in DB. and get the details of register_num=1005 at result 0 when register_num=2000 is not present in DB

Kebba Foon

10 years ago

Peter this article is a life saver, i never ever bother to check the union and union all statements always stick with union until recently while i was working on a project, have to query sets everything works fine and when set to union different results. for the life of me could never understood why, i even considered rewriting part my application code to do two separate query – that will have cause me a lot of time and effort. Thanks P for saving me.
