MySQL INSERT ON DUPLICATE KEY UPDATE is very powerful but often forgotten MySQL feature. It was introduced in MySQL 4.1 but I still constantly see people unaware of it.

Myself I like this feature big deal because it is designed in truly MySQL style – very efficient solution for frequent task while keeping it beautiful and easy to use.

So what is the MySQL INSERT … ON DUPLICATE KEY UPDATE great for? Well any kind of maintaining counters. If you’re writing traffic accounting it could be traffic and number of packet passed for given port or IP address. For Web Applications it could be counting number of visits per page or IP address, number of times particular keyword was searched etc.

This functionality also makes it very easy to do incremental single pass log file processing and building summary tables.

Here is example:

This example actually shows one more neat feature of MySQL – inet_aton and inet_ntoa functions which can convert IP address strings to integers and back. This allows to save on field length significantly by using 4 bytes instead of 15

The third feature this example takes advantage of is TIMESTAMP field. By default first TIMESTAMP column will have its value automatically updated to current timestamp on insert and update. We actually could have omitted now() in insert clause but this would require to specify list of columns which we skipped for sake of example.

So how would this example work ? Well just you would expect it. If there is no such IP address in the table it will be added with hits=1 if it is already where (note ip is PRIMARY KEY) it would be just incremented and last visit timestamp updated.

The benefit of using MySQL INSERT … ON DUPLICATE KEY UPDATE instead of INSERT + UPDATE could be different, depending on number of new rows and data set size. 30% speedup should be typical. Performance increase is not the only benefit – what is even more important the application code becomes simplier – less error prone and easy to read.

41 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
sunxiunan

MySQL is a complete DataBase Server now. But so many people only think it as simple as before.
Such as it didn’t have transaction process, or don’t support many features.
I think it is important that study the release notes about it to know how we can use it effectively.

Paul Wilson

I’ve done a lot of work with traffic databases, and others with IP addresses and one thing appeared very quickly – if possible store the IP address both as an unsigned int as as a char(15). The conversion functions aren’t fast iterated over millions of rows.

Apachez

But in the case of “IP like ‘192.168.%'” you could use “IP BETWEEN 3232235520 AND 3232301055” if you have stored ips as unsigned int which also should be faster I guess to process I guess…

Erick

ON DUPLICATE KEY is the same as an UPDATE command. It is merely a simpler (and confusing) way of doing things. In a typical MySQL way, when this does get invoked, it returns “2 Affected Rows”. Nice. How it is a really effective feature is lost on me —

OLD WAY:
1. Insert
2. If error, then UPDATE.

NEW WAY:
1. Insert
2. Engine discovers if it already exists (through internal error), so UPDATE

No different.

As for inet ip address conversion functions, it is still much more recommended to use the functionality provided by your programming language. In the case of PHP for instance, ip2long function will do this for you: http://php.net/ip2long – you can do a simple loop microtime test to see some STARK and REAL difference in performance when you get to medium sized data.

CJ

SICK! This should save me some sql calls.

Marki

Peter: It is not a bug, it is a feature (documented one).
When row is inserted, report it as 1 affected row. When it is updated, report it as 2 affected rows.

Peter Romianowski

A short note on performance: If you use this feature in order to aggregate reports or something like that (i.e. you have a low insert but high update rate) a strategie like:

try {
UPDATE
} catch (exception) {
INSERT INTO
}

might be better. I notices a speed up of up to 15% using this strategie instead of ON DUPLICATE KEY UPDATE. The higher your actual insert rate becomes the lower is the benefit until it performs worse. As always: Test your application 😉

Lars Strojny

The problem with inet_aton and inet_ntoa is, that IPv6 is just not supported. So you will end up implementing inet_aton6 by hand (which is tricky) or using varchars again.

Eraser

Hey,
thanks for this post! I searched for a function like this!

I love MySQL! Very powerfull!

Regards,
E

foo

also useful in combination with unique keys 🙂
see http://textsnippets.com/posts/show/975

TM

Is there any way i can use a query similar to this but have some kind of where clause at the end? say if i wanted to do a whole bunch of inserts but update them if they exist already and update the primary key also? so for example
insert into tmmember values (‘STRES17N’,’SK’,’1725′,’ESTRES17′,’N’,’6′,’255′,’-1′)
on duplicate key update memb_memberid = ‘STRES17N’, memb_teamid = ‘SK’ only the original memb_member id might be ‘SKRES17N’ so as you can see id like to be able to say where memb_memberid = ‘SKRES17N’ for the cases when its true.

dreamluverz

I’m trying to use this on INSERT SELECT statement and im getting an error. my sql is something like this

INSERT INTO table SELECT * FROM table WHERE id=’2′ ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID() + 1
I use LAST_INSERT_ID() coz im using auto increment.

Sergio

I’ve tried this but I didn’t notice any performance improvement.

The UPDATE rate is higher than the INSERT rate.

So.. peter, from your experience, what’s the FASTEST WAY from the following:

1) This method I’ve been using until found out about this article. First, we check if there is a record.. then.. see there.

mysql_query(“SELECT COUNT(*) as count FROM table WHERE user_id = ‘1’”);
$cnt = mysql_fetch_array();
$cnt = $cnt[‘count’];

if ($cnt == 0) {
mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);
}
else {
mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
}

2) With inspiration from Peter Romianowski’s post. Haven’t tested, but I guess I’ve wrote the code ok.

mysql_query(“UPDATE table SET user_hits = user_hits+1 WHERE user_id = ‘1’”);
$affected_rows = mysql_affected_rows();

if ($affected_rows == 0) mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’)”);

3) Using INSERT .. ON duplicate KEY UPDATE ..

mysql_query(“INSERT INTO table VALUES (‘1’, ‘1’) ON duplicate KEY UPDATE user_hits = user_hits+1 “);

So.. which one is recommended? I repeat.. my update rate is higher than the insert rate. I guess method 2. But I’m waiting other opinions.
Thanks.

Peter Romianowski

Sergio, I would recommend version 2 for high update rates. But I would also recommend to benchmark your application in high concurrency environment. 😉

Cheers

Peter

Sergio

Thanks for reply Peter.

However, version 2 has a small issue.
If you try to update a record with the same data as in table, mysql ignore the query (affected rows => 0), and insert query is executed, even if the record exist in table.

For example, if I refresh twice the page the following query will be the same (if I refresh the page in the same SECOND), so affected rows = 0, thus INSERT INTO query is executed and an error will occur (duplicate entry cos’ of the primary index)

INSERT INTO forum_sessions (session_id, session_user_id, session_start, session_time, session_ip, session_page, session_logged_in, session_browser, session_mode) VALUES (‘a321587721fdb3a86a49d761fb4b1a2e’, ”, ‘1216230846’, ‘1216230846’, ‘127.0.0.2’, ‘main’, ‘0’, ‘Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0’, ‘html’)

Of course errors can be masked for users..

Can you provide me some tips for benchmarking? The generation time of the pages is almost the same in all 3 situations (I’ve tried all versions on my public site too.. ). localhost, I just refresh the page multiple times & read the smallest generation time.

Maybe I don’t feel any difference because the tables are small (4,870 rows, 1.4MB & 20-60 rows, < 100KB)

pravin

I have a query in which I’m checking whether LAN(my unique key) is duplicate or not,if it duplicate then update the record else insert that LAN No.

I did it in the following way.

insert into romappingfinal(LAN,romappingfinalid,branch_code,
RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name)
select LAN,romappingstagingid as romappingfinalid,branch_code,
RO_CODE,main_ro_code,loan_end_date,ro_status,main_ro_status,
ro_transfer_status,main_ro_transfer_status,ro_name,main_ro_name
from viewromappingfinal
on duplicate key update LAN=values(LAN)

it’s working fine if i won’t use romappingfinalid(table primary key).

Suggest me how to use the INSERT ON UPDATE KEY using a table primary key but checking the condition on another primary key(LAN).

zhilly

i sinc i add duplicate key the insert not work
that not work now
someone know my little erreur?

//create and execute the query
$sql = “INSERT INTO yah_unlimited (data_key,data_value,agent_id,data_group,access_time) VALUES (‘$key’, ‘$value’, ‘$group’, ‘$owner_id’, NOW()) ON DUPLICATE KEY UPDATE access_time = NOW()”;
$result = mysql_query($sql) or die(mysql_error());
echo ‘Store successful.’;

Jesse

Hello,

I have a dilema with INSERT (fld1, fld2, fld3) VALUES(‘1′,’1a’,’1b’),(‘2′,’2a’,’2b’),(‘3′,’3a’,’3b’) ON DUPLICATE KEY UPDATE fld2 = ??? ( I want this to be the value that would be inserted ie ‘2a’) (fld1 is a unique index)

What should go in the ??? spot?

Thanks.

Baron Schwartz

VALUES(fld2) should go there.

Artis

If I am collecting banner display statistics for each day, then I have table ad_stat with fields:
id, banner_id, date, views
with (banner_id, date) as unique

I use insert into ad_stat (banner_id, date, views) values ($banner_id, ‘$date’, 1) ON DUPLICATE KEY UPDATE views=views+1

So this query firstly tries to insert and only then to Update – is there opposite way like:
UPDATE … ON NO ROWS INSERT ???
Because there are 5 inserts daily and 50 000 updates – wouldn’t it somehow increase performance?

kn33ch41

A note for those of you who may have problems updating a row that does in fact exist in the table, and so are always seeing a new row inserted:

In order for this to work, the PRIMARY KEY defined for the table should only contain one column–or if it has multiple columns–the data you’re going to insert/update for the given columns should be identical to the columns defined in the PRIMARY KEY.

To illustrate:

PRIMARY_KEY(id, timestamp) == a single key (but with two values, thus more unique)

The id is very likely always going to be unique. However, adding the timestamp to the primary key makes it even more unique; so, if the id exists in the table and you simply want to update the timestamp, the new value for the timestamp you’re providing will mean that id matches but timestamp does not, because you’ve defined the primary key to be id and timestamp, which means BOTH values you’re inserting/updating must equal the values currently in the database to make the primary key match and therefore update the row instead of insert a new row with the same id, but different timestamp.

Jayapal Chandran

I HAVE MISSED THIS FOR TWO YEARS. OH WHAT A S… . … WHAT EVER.
and now… just now found it while searching thinking why not there be an option like this… and here i am…

how to do like this

insert into (field1, field2, field3) values (1,2,3),(1,2,3) and so including on duplicate update. because i want insert/update many records from am json source.
This should have been posted in a forum but since it deals my problem exactly i thought of posting it here.

Brian May

well, i definetly using INSERT ON DUPLICATE KEY UPDATE rather than REPLACE INTO
it’s for performance and code reading

Dudeson

Beautiful function. I was on the verge of using a conditional update statement until i discovered this. Anyone knows if there is a mysql function that will shift all records key id down? by adding 1 when I add a new key, say 0 on the TOP of the table?

Scott

With ON DUPLICATE KEY UPDATE is there some sort of temp table involved with the old and new values?
I need access to the old values on update, but only want to hit the database once.
It somewhat works with CASE WHEN statements to catch high and low values by day.

field1= CASE WHEN field1>’$field1′ THEN ‘$field1′ ELSE field1 // this gets a low value
field2= CASE WHEN field2’$field1’ THEN ‘$field3′ ELSE field3 // this does not work off the existing (old) field1 value!

This should update field3 when field1 changes. But it does not.

field3= CASE WHEN field1=’$field1’ THEN ‘$field3’ ELSE field3 // if field3 is time you will only see the most recent value match of field1

This does work but it looks to see if field1 is equal to the new value.
The problem here is that I can only have the last match, I would like to have the first match in some cases.

It seems that old and new values are only available within a given field. Is that so?

Is there another way to have old/existing values available from one field in the case switch or value of another field.

Thanks.

Scott

The field2 line above is a mess… it should be

field2= CASE WHEN field2’$field1′ THEN ‘$field3′ ELSE field3 // this does not work off the existing (old) field1 value!

pimsainnum

If I had tableA field(Id, timestamp, day, value)

Insert or Update where field (day) not same.

How to do??

Steve Jackson

Given the table.

CREATE TABLE test (
id int(11) NOT NULL,
md5 char(32) DEFAULT NULL,
PRIMARY KEY (id),
UNIQUE KEY md5 (md5)
) ENGINE=InnoDB

INSERT IGNORE INTO test (md5) values (‘A’)

1 row affected

INSERT IGNORE INTO test (md5) values (‘A’)

Duplicate entry ‘A’ for key ‘md5’

I think an incredibly useful feature here would be to return the innodb row id of the row which generated the duplicate key error. This would save us a select statement (either before or after the insert) to determine which row caused the dupe error. (I guess the unique row_id would not be of much help, so only return the value of PRIMARY if it is defined)

Is there something existing like this already? I dumped the mysqli connection object in php, and I see nothing like that. Insert_id of course, but that can only be used when there is an auto_increment column.

Someone will answer (why not just use on duplicate key update?) Well this is because i want “on duplicate key return unique_row_id”

Cheers

//Steve

Gleb Deykalo

What can you say about http://bugs.mysql.com/bug.php?id=52020

Can you please explain how really locking in INSERT … ON DUPLICATE KEY UPDATE works.

As I recall:

Lock 1) Transaction 1 locks row for INSERT
Lock 2) Transaction 2 tries to locks row for UPDATE (goes into queue)
Lock 3) Transaction 1 tries to locks row for UPDATE (goes into queue)

It is reasonable not to release Lock 1, but why MySQL tries to create Lock 3? Why can’t it use Lock 1 instead of Lock 3 for UPDATE operation?

Or maybe I am wrong at all?

Andrea

Hi guys, I’ve have a doubt: if I’m right the UPDATE statement implied a SELECT statement before the datas modification. So, is the primary key lookup in INSERT INTO…ON DUPLICATE KEY UPDATE faster than the SELECT statement implied in the UPDATE?

e.g. UPDATE table SET value=newValue WHERE primary=primaryValue;

INSERT INTO table VALUES primaryValue ON DUPLICATE KEY UPDATE value=newValue

Rob

Just a note on the returned rows affected count:
above somebody mentioned if it does the update it returns “2 rows affected,” if only does the insert it returns 1.

Methinks intentional (coz still like that in v5.5) and a damn good idea: and here is what I think way.

Imagine it returns 1 row affected in either case:
how long would it be before someone asks: ‘how do I know if it inserted or updated?”
– hence returning 1 or 2 they have the answer
— (either way it’s > 0 so if using that as a test it’s fine), and for those using a == 1 test: back to programming school

(would you prefer they returned “0,1 string” or 0 then 1, or introduced a new second value like @@ROWCOUNT?
– imagine how much code that would break.)

“2 row(s) affected” not a bug, it’s a damn good idea and I hope that never changes.

Vladimir

peter, what do you say about the performance of this method?


CREATE TABLE ipstat(
ip INT UNSIGNED NOT NULL PRIMARY KEY,
hits INT UNSIGNED NOT NULL UNIQUE AUTO_INCREMENT,
last_hit TIMESTAMP
);

REPLACE ipstat (ip, last_hit) VALUES (INET_ATON('192.168.0.1'),NOW())

Vladimir

oh, sorry. It work for my single counter