MySQL INSERT ON DUPLICATE KEY UPDATE and counters

MySQL INSERT ON DUPLICATE KEY UPDATE is very powerful but often forgotten MySQL feature. It was introduced in MySQL 4.1 but I still constantly see people unaware of it.

Myself I like this feature big deal because it is designed in truly MySQL style – very efficient solution for frequent task while keeping it beautiful and easy to use.

So what is the MySQL INSERT … ON DUPLICATE KEY UPDATE great for? Well any kind of maintaining counters. If you’re writing traffic accounting it could be traffic and number of packet passed for given port or IP address. For Web Applications it could be counting number of visits per page or IP address, number of times particular keyword was searched etc.

This functionality also makes it very easy to do incremental single pass log file processing and building summary tables.

Here is example:

create table ipstat(ip int unsigned not null primary key, 
                          hits int unsigned not null, 
                          last_hit timestamp);

insert into ipstat values(inet_aton('192.168.0.1'),1,now()) 
                       on duplicate key update hits=hits+1;

create table ipstat(ip int unsigned not null primary key,

hits int unsigned not null,

last_hit timestamp);

insert into ipstat values(inet_aton('192.168.0.1'),1,now())

on duplicate key update hits=hits+1;

This example actually shows one more neat feature of MySQL – inet_aton and inet_ntoa functions which can convert IP address strings to integers and back. This allows to save on field length significantly by using 4 bytes instead of 15

The third feature this example takes advantage of is TIMESTAMP field. By default first TIMESTAMP column will have its value automatically updated to current timestamp on insert and update. We actually could have omitted now() in insert clause but this would require to specify list of columns which we skipped for sake of example.

So how would this example work ? Well just you would expect it. If there is no such IP address in the table it will be added with hits=1 if it is already where (note ip is PRIMARY KEY) it would be just incremented and last visit timestamp updated.

The benefit of using MySQL INSERT … ON DUPLICATE KEY UPDATE instead of INSERT + UPDATE could be different, depending on number of new rows and data set size. 30% speedup should be typical. Performance increase is not the only benefit – what is even more important the application code becomes simplier – less error prone and easy to read.

41 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

sunxiunan

17 years ago

MySQL is a complete DataBase Server now. But so many people only think it as simple as before.
Such as it didn’t have transaction process, or don’t support many features.
I think it is important that study the release notes about it to know how we can use it effectively.

Peter Zaitsev

Author

17 years ago

Yes. There are many features being added which might be hard to track. I also would not always encourage all of them just for sake of using performance. Many of new features are complex – for example Stored Procedures. If you start using stored procedures in your code you might need to ensure other members of the team are also fluent with them. INSERT ON DUPLICATE KEY UPDATE is of the different sort – there is not much to learn about it – you can use it safely in your code. If someone does not know about it it takes to read single page to learn.

Paul Wilson

17 years ago

I’ve done a lot of work with traffic databases, and others with IP addresses and one thing appeared very quickly – if possible store the IP address both as an unsigned int as as a char(15). The conversion functions aren’t fast iterated over millions of rows.

Peter Zaitsev

Author

17 years ago

Paul,

I’m actually surprised with your finding. Unless you’re doing indexing and going to do something like where IP like “192.168.%” there should not be serious benefit to store both versions. Yes you save on function call but typically it is not serious response time contributor. You however waste space and especially if you do inexing on this field you end up with larger and slower indexes.

Of course it depends a lot on workload. For example if you do full web request logging and store page, referer, user agent, cookie information you might not save a lot of space by using integer ip

Apachez

17 years ago

But in the case of “IP like ‘192.168.%'” you could use “IP BETWEEN 3232235520 AND 3232301055” if you have stored ips as unsigned int which also should be faster I guess to process I guess…

Peter Zaitsev

Author

17 years ago

Apachez,

Yes you’re right of course. I used wrong example to ilustrate an idea. like “%192.168.%.255” would be better example. And yes you still can use inet_ntoa and apply extra filtering on result. It is just matter of convenience 🙂

Especially in log tables where you might want to run some adhoc queries manually ease of use might be more important than couple of percent performance gain.

Thanks for correction again.

Erick

16 years ago

ON DUPLICATE KEY is the same as an UPDATE command. It is merely a simpler (and confusing) way of doing things. In a typical MySQL way, when this does get invoked, it returns “2 Affected Rows”. Nice. How it is a really effective feature is lost on me —

OLD WAY:
1. Insert
2. If error, then UPDATE.

NEW WAY:
1. Insert
2. Engine discovers if it already exists (through internal error), so UPDATE

No different.

As for inet ip address conversion functions, it is still much more recommended to use the functionality provided by your programming language. In the case of PHP for instance, ip2long function will do this for you: http://php.net/ip2long – you can do a simple loop microtime test to see some STARK and REAL difference in performance when you get to medium sized data.

Peter Zaitsev

Author

16 years ago

Erick,

The fact it returns 2 rows affected if only one really was updated is probably a bug. Please report it.

I can’t say much about being confused by this feature. I think it is rather straightforward – insert data and if it is where update it.

Now from performance standpoint the difference is in number of statements – for in memory data sets going over network and parsing query is what is taking most of time, this is why having it in single statement helps to improve performance.

16 years ago

SICK! This should save me some sql calls.

-1

Marki

16 years ago

Peter: It is not a bug, it is a feature (documented one).
When row is inserted, report it as 1 affected row. When it is updated, report it as 2 affected rows.

Peter Zaitsev

Author

16 years ago

Thanks Maki,

Even though I find it counter intuitive. May be similary to update statement using rows_matched to see the rows which were already in the table would be better idea.

Peter Romianowski

16 years ago

A short note on performance: If you use this feature in order to aggregate reports or something like that (i.e. you have a low insert but high update rate) a strategie like:

try {
UPDATE
} catch (exception) {
INSERT INTO
}

might be better. I notices a speed up of up to 15% using this strategie instead of ON DUPLICATE KEY UPDATE. The higher your actual insert rate becomes the lower is the benefit until it performs worse. As always: Test your application 😉

Lars Strojny

16 years ago

The problem with inet_aton and inet_ntoa is, that IPv6 is just not supported. So you will end up implementing inet_aton6 by hand (which is tricky) or using varchars again.

Eraser

16 years ago

Hey,
thanks for this post! I searched for a function like this!

I love MySQL! Very powerfull!

Regards,
E

foo

16 years ago

also useful in combination with unique keys 🙂
see http://textsnippets.com/posts/show/975

16 years ago

Is there any way i can use a query similar to this but have some kind of where clause at the end? say if i wanted to do a whole bunch of inserts but update them if they exist already and update the primary key also? so for example
insert into tmmember values (‘STRES17N’,’SK’,’1725′,’ESTRES17′,’N’,’6′,’255′,’-1′)
on duplicate key update memb_memberid = ‘STRES17N’, memb_teamid = ‘SK’ only the original memb_member id might be ‘SKRES17N’ so as you can see id like to be able to say where memb_memberid = ‘SKRES17N’ for the cases when its true.

dreamluverz

15 years ago

I’m trying to use this on INSERT SELECT statement and im getting an error. my sql is something like this

INSERT INTO table SELECT * FROM table WHERE id=’2′ ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID() + 1
I use LAST_INSERT_ID() coz im using auto increment.

Peter Zaitsev

Author

15 years ago

Dreamluverz,

The best place to ask the question is forums.mysqlperformanceblog.com ; and it is also best to provide exact statement and error message 🙂

Sergio

15 years ago

I’ve tried this but I didn’t notice any performance improvement.

The UPDATE rate is higher than the INSERT rate.

So.. peter, from your experience, what’s the FASTEST WAY from the following:

1) This method I’ve been using until found out about this article. First, we check if there is a record.. then.. see there.

mysql_query(“SELECT COUNT(*) as count FROM table WHERE user_id = ‘1’”);
$cnt = mysql_fetch_array();
$cnt = $cnt[‘count’];

if ($cnt == 0) {
mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);
}
else {
mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
}

2) With inspiration from Peter Romianowski’s post. Haven’t tested, but I guess I’ve wrote the code ok.

mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
$affected_rows = mysql_affected_rows();

if ($affected_rows == 0) mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);

3) Using INSERT .. ON duplicate KEY UPDATE ..

mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’) ON duplicate KEY UPDATE user_hits = user_hits+1 “);

So.. which one is recommended? I repeat.. my update rate is higher than the insert rate. I guess method 2. But I’m waiting other opinions.
Thanks.

Peter Romianowski

15 years ago

Sergio, I would recommend version 2 for high update rates. But I would also recommend to benchmark your application in high concurrency environment. 😉

Cheers

Peter

Sergio

15 years ago

Thanks for reply Peter.

However, version 2 has a small issue.
If you try to update a record with the same data as in table, mysql ignore the query (affected rows => 0), and insert query is executed, even if the record exist in table.

For example, if I refresh twice the page the following query will be the same (if I refresh the page in the same SECOND), so affected rows = 0, thus INSERT INTO query is executed and an error will occur (duplicate entry cos’ of the primary index)

INSERT INTO forum_sessions (session_id, session_user_id, session_start, session_time, session_ip, session_page, session_logged_in, session_browser, session_mode) VALUES (‘a321587721fdb3a86a49d761fb4b1a2e’, ”, ‘1216230846’, ‘1216230846’, ‘127.0.0.2’, ‘main’, ‘0’, ‘Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0’, ‘html’)

Of course errors can be masked for users..

Can you provide me some tips for benchmarking? The generation time of the pages is almost the same in all 3 situations (I’ve tried all versions on my public site too.. ). localhost, I just refresh the page multiple times & read the smallest generation time.

Maybe I don’t feel any difference because the tables are small (4,870 rows, 1.4MB & 20-60 rows, < 100KB)

Peter Zaitsev

Author

15 years ago

Sergio,

This is why it is always mentioned in cases count=count+1 and similar – in this case you always change the data. Really MySQL returns both rows matched and rows updated and it is rows matched you need to wants.

Generally Insert on dup key update is the method which was designed to make it most optimal though the question if it always is remains open 🙂

Also note the difference in this cases comes from amount of queries you have to run mainly – run benchmarks over network to see the difference better.

pravin

15 years ago

I have a query in which I’m checking whether LAN(my unique key) is duplicate or not,if it duplicate then update the record else insert that LAN No.

I did it in the following way.

insert into romappingfinal(LAN,romappingfinalid,branch_code,
RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name)
select LAN,romappingstagingid as romappingfinalid,branch_code,
RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name
from viewromappingfinal
on duplicate key update LAN=values(LAN)

it’s working fine if i won’t use romappingfinalid(table primary key).

Suggest me how to use the INSERT ON UPDATE KEY using a table primary key but checking the condition on another primary key(LAN).

zhilly

15 years ago

i sinc i add duplicate key the insert not work
that not work now
someone know my little erreur?

//create and execute the query
$sql = “INSERT INTO yah_unlimited (data_key,data_value,agent_id,data_group,access_time) VALUES (‘$key’, ‘$value’, ‘$group’, ‘$owner_id’, NOW()) ON DUPLICATE KEY UPDATE access_time = NOW()”;
$result = mysql_query($sql) or die(mysql_error());
echo ‘Store successful.’;

ZdenÄ›k VeÄeÅ™a

15 years ago

28. Sergio:

Efficiency of version 1 and 2:
http://blog.zdenekvecera.cz/item/mysql-pokud-zaznam-neexistuje-vytvor-novy-jinak-puvodni-aktualizuj

Jesse

14 years ago

Hello,

I have a dilema with INSERT (fld1, fld2, fld3) VALUES(‘1′,’1a’,’1b’),(‘2′,’2a’,’2b’),(‘3′,’3a’,’3b’) ON DUPLICATE KEY UPDATE fld2 = ??? ( I want this to be the value that would be inserted ie ‘2a’) (fld1 is a unique index)

What should go in the ??? spot?

Thanks.

Baron Schwartz

14 years ago

VALUES(fld2) should go there.

Artis

14 years ago

If I am collecting banner display statistics for each day, then I have table ad_stat with fields:
id, banner_id, date, views
with (banner_id, date) as unique

I use insert into ad_stat (banner_id, date, views) values ($banner_id, ‘$date’, 1) ON DUPLICATE KEY UPDATE views=views+1

So this query firstly tries to insert and only then to Update – is there opposite way like:
UPDATE … ON NO ROWS INSERT ???
Because there are 5 inserts daily and 50 000 updates – wouldn’t it somehow increase performance?

kn33ch41

13 years ago

A note for those of you who may have problems updating a row that does in fact exist in the table, and so are always seeing a new row inserted:

In order for this to work, the PRIMARY KEY defined for the table should only contain one column–or if it has multiple columns–the data you’re going to insert/update for the given columns should be identical to the columns defined in the PRIMARY KEY.

To illustrate:

PRIMARY_KEY(id, timestamp) == a single key (but with two values, thus more unique)

The id is very likely always going to be unique. However, adding the timestamp to the primary key makes it even more unique; so, if the id exists in the table and you simply want to update the timestamp, the new value for the timestamp you’re providing will mean that id matches but timestamp does not, because you’ve defined the primary key to be id and timestamp, which means BOTH values you’re inserting/updating must equal the values currently in the database to make the primary key match and therefore update the row instead of insert a new row with the same id, but different timestamp.

Jayapal Chandran

13 years ago

I HAVE MISSED THIS FOR TWO YEARS. OH WHAT A S… . … WHAT EVER.
and now… just now found it while searching thinking why not there be an option like this… and here i am…

how to do like this

insert into (field1, field2, field3) values (1,2,3),(1,2,3) and so including on duplicate update. because i want insert/update many records from am json source.
This should have been posted in a forum but since it deals my problem exactly i thought of posting it here.

Brian May

13 years ago

well, i definetly using INSERT ON DUPLICATE KEY UPDATE rather than REPLACE INTO
it’s for performance and code reading

Dudeson

13 years ago

Beautiful function. I was on the verge of using a conditional update statement until i discovered this. Anyone knows if there is a mysql function that will shift all records key id down? by adding 1 when I add a new key, say 0 on the TOP of the table?

Scott

13 years ago

With ON DUPLICATE KEY UPDATE is there some sort of temp table involved with the old and new values?
I need access to the old values on update, but only want to hit the database once.
It somewhat works with CASE WHEN statements to catch high and low values by day.

field1= CASE WHEN field1>’$field1′ THEN ‘$field1′ ELSE field1 // this gets a low value
field2= CASE WHEN field2’$field1’ THEN ‘$field3′ ELSE field3 // this does not work off the existing (old) field1 value!

This should update field3 when field1 changes. But it does not.

field3= CASE WHEN field1=’$field1’ THEN ‘$field3’ ELSE field3 // if field3 is time you will only see the most recent value match of field1

This does work but it looks to see if field1 is equal to the new value.
The problem here is that I can only have the last match, I would like to have the first match in some cases.

It seems that old and new values are only available within a given field. Is that so?

Is there another way to have old/existing values available from one field in the case switch or value of another field.

Thanks.

Scott

13 years ago

The field2 line above is a mess… it should be

field2= CASE WHEN field2â€™$field1â€² THEN â€˜$field3â€² ELSE field3 // this does not work off the existing (old) field1 value!

pimsainnum

17 years ago

If I had tableA field(Id, timestamp, day, value)

Insert or Update where field (day) not same.

How to do??

Steve Jackson

12 years ago

Given the table.

CREATE TABLE test (
id int(11) NOT NULL,
md5 char(32) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY md5 (md5)
) ENGINE=InnoDB

INSERT IGNORE INTO test (md5) values (‘A’)

1 row affected

INSERT IGNORE INTO test (md5) values (‘A’)

Duplicate entry ‘A’ for key ‘md5’

I think an incredibly useful feature here would be to return the innodb row id of the row which generated the duplicate key error. This would save us a select statement (either before or after the insert) to determine which row caused the dupe error. (I guess the unique row_id would not be of much help, so only return the value of PRIMARY if it is defined)

Is there something existing like this already? I dumped the mysqli connection object in php, and I see nothing like that. Insert_id of course, but that can only be used when there is an auto_increment column.

Someone will answer (why not just use on duplicate key update?) Well this is because i want “on duplicate key return unique_row_id”

Cheers

//Steve

Gleb Deykalo

12 years ago

What can you say about http://bugs.mysql.com/bug.php?id=52020

Can you please explain how really locking in INSERT … ON DUPLICATE KEY UPDATE works.

As I recall:

Lock 1) Transaction 1 locks row for INSERT
Lock 2) Transaction 2 tries to locks row for UPDATE (goes into queue)
Lock 3) Transaction 1 tries to locks row for UPDATE (goes into queue)

It is reasonable not to release Lock 1, but why MySQL tries to create Lock 3? Why can’t it use Lock 1 instead of Lock 3 for UPDATE operation?

Or maybe I am wrong at all?

Andrea

11 years ago

Hi guys, I’ve have a doubt: if I’m right the UPDATE statement implied a SELECT statement before the datas modification. So, is the primary key lookup in INSERT INTO…ON DUPLICATE KEY UPDATE faster than the SELECT statement implied in the UPDATE?

e.g. UPDATE table SET value=newValue WHERE primary=primaryValue;

INSERT INTO table VALUES primaryValue ON DUPLICATE KEY UPDATE value=newValue

Rob

11 years ago

Just a note on the returned rows affected count:
above somebody mentioned if it does the update it returns “2 rows affected,” if only does the insert it returns 1.

Methinks intentional (coz still like that in v5.5) and a damn good idea: and here is what I think way.

Imagine it returns 1 row affected in either case:
how long would it be before someone asks: ‘how do I know if it inserted or updated?”
– hence returning 1 or 2 they have the answer
— (either way it’s > 0 so if using that as a test it’s fine), and for those using a == 1 test: back to programming school

(would you prefer they returned “0,1 string” or 0 then 1, or introduced a new second value like @@ROWCOUNT?
– imagine how much code that would break.)

“2 row(s) affected” not a bug, it’s a damn good idea and I hope that never changes.

Vladimir

9 years ago

peter, what do you say about the performance of this method?

CREATE TABLE ipstat( ip INT UNSIGNED NOT NULL PRIMARY KEY, hits INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT, last_hit TIMESTAMP );

REPLACE ipstat (ip, last_hit) VALUES (INET_ATON('192.168.0.1'),NOW())

Vladimir

9 years ago

oh, sorry. It work for my single counter

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL INSERT ON DUPLICATE KEY UPDATE and counters

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: The Hash Datatype

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

MySQL INSERT ON DUPLICATE KEY UPDATE and counters

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: The Hash Datatype

Valkey/Redis Replication and Auto-Failover With Sentinel Service

Seamless Table Modifications: Leveraging pt-online-schema-change for Online Alterations

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation