January 18, 2007

INSERT ON DUPLICATE KEY UPDATE and REPLACE INTO

Posted by peter |

Jonathan Haddad writes about REPLACE INTO and INSERT ON DUPLICATE KEY UPDATE. Really, Why MySQL has both of these, especially both are non ANSI SQL extensions ?

The story here seems to be the following – REPLACE INTO existed forever, at least since MySQL 3.22 and was a way to do replace faster and what is also important atomically, remember at that time MySQL only had ISAM tables with table locks and no transactions support. Of course you could use LOCK TABLES but it is not efficient.

The reason REPLACE could be efficient for ISAM and MyISAM, especially for fixed length rows is – it could perform row replacement without reading old data first, and of course because you could set it to replace multiple values at the same time just as you have multiple value INSERT.

As a side note: the fact REPLACE does not have to do read before write is a bit overrated from efficiency standpoint. As most rows are less than 4K-8K in size and are not aligned to OS cache page OS still would need to perform read from hard drive before it can perform an update, if data is not in OS cache, and if it is read would not be large overhead ether.

The problem with REPLACE was – many people tried to use it like update accessing previous column value, for example doing something like REPLACE INTO tbl (col1) VALUES (col1+10);. Instead of acting as update this really will insert NULL value, as col1 is undefined at this stage.

INSERT ON DUPLICATE KEY UPDATE is newer feature which came in MySQL 4.1 by advice one of MySQL big users. The question was efficiently maintaining counters in MySQL. There are also number of similar cases when you want ether to insert the new row or update stats for existing row.

I think both features are really great and I use them both. They are implemented really in MySQL style of being simple powerful and easy to use.

Related posts: :INSERT ON DUPLICATE KEY UPDATE and summary counters.::Innodb performance gotcha w Larger queries.::Duplicate indexes and redundant indexes:
 

10 Comments »

  1. Thanks for the clarification and history. Using INNODB, are there any performance advantages to either?

    Comment :: January 18, 2007 @ 12:59 pm

  2. 2. peter

    Sure,

    I do not have benchmarks with Innodb but I’d expect them to be close. I’d mostly look at what makes more sense for your application from development standpoint.

    Comment :: January 18, 2007 @ 1:11 pm

  3. 3. Jerry Zheng

    The two statements have different impact for INNODb table.
    The REPLACE INTO acts as DELETE/INSERT for duplicates.
    The INSERT ON DUPLIACTE UPDATE is true update.

    If you have a child table defined with “on delete CASCADE”, the REPLACE INTO will delete the child record too.

    I will use INSERT ON DUPLICATE rather than the REPLACE INTO for the above reason.

    Comment :: January 22, 2007 @ 6:07 pm

  4. 4. Jerry Zheng

    Here is a test case:

    mysql> select version();
    +———————+
    | version() |
    +———————+
    | 4.1.18-standard-log |
    +———————+

    mysql> create table p(id int primary key, notid int) type=innodb;
    Query OK, 0 rows affected, 1 warning (0.07 sec)

    mysql> create table c (id int primary key, fk int , foreign key (fk) references p(id) on delete CASCADE ) type=innodb;
    Query OK, 0 rows affected, 1 warning (0.06 sec)

    mysql> insert into p values (1,1), (2,2);
    Query OK, 2 rows affected (0.00 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> insert into c values (1,1), (2,2);
    Query OK, 2 rows affected (0.00 sec)
    Records: 2 Duplicates: 0 Warnings: 0

    mysql> select * from p;
    +—-+——-+
    | id | notid |
    +—-+——-+
    | 1 | 1 |
    | 2 | 2 |
    +—-+——-+
    2 rows in set (0.00 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 1 | 1 |
    | 2 | 2 |
    +—-+——+
    2 rows in set (0.00 sec)

    mysql> replace into p values (1,10);
    Query OK, 2 rows affected (0.00 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 2 | 2 |
    +—-+——+
    1 row in set (0.00 sec)

    – We lost the child #1.

    mysql> insert into p (id, notid) values (2,20) on duplicate key update notid=20;
    Query OK, 2 rows affected (0.01 sec)

    mysql> select * from c;
    +—-+——+
    | id | fk |
    +—-+——+
    | 2 | 2 |
    +—-+——+
    1 row in set (0.00 sec)

    – Child #2 is safe.

    mysql> select * from p;
    +—-+——-+
    | id | notid |
    +—-+——-+
    | 1 | 10 |
    | 2 | 20 |
    +—-+——-+
    2 rows in set (0.00 sec)

    Comment :: January 22, 2007 @ 6:16 pm

  5. 5. peter

    Thanks Jerry,

    Good catch.

    Comment :: January 22, 2007 @ 6:24 pm

  6. Also related to Jerrys comments, REPLACE will use the next available # on auto-increment fields, which may not always be desired.

    Comment :: January 31, 2007 @ 8:02 am

  7. Jerry:
    Thanks for the tip – I’ll be sure to keep that in mind. For the most part, in what I do, INSERT ON DUPLICATE UPDATE is what I need to use, especially knowing the cascading keys issue.

    Comment :: February 7, 2007 @ 12:03 pm

  8. 8. P.ANBALAGAN

    I WANT QUERY FOR TO INSERT MORE 100000 RECORDS IN A TABLE.

    Comment :: March 10, 2008 @ 12:14 am

  9. 9. G Barnes

    I have two identical tables named Table1 and Table2 on two computers respectivaly. There are two fields in each, “ITEM” and “SOLD”. In both tables, “ITEM” is the primary key. Is there a way to merge Table1 into Table2 with something similar to INSERT ON DUPLICATE UPDATE sold =sold +(the first table’s sold value)? Thus if Table1 contains 10 (in the sold field) and Table2 contains 20. I want the total in Table 2 to be 30. Is this possible?

    Comment :: April 13, 2008 @ 8:46 am

  10. 10. Jeff

    One benefit of REPLACE over UPDATE is the DELAYED feature, which comes in handy.

    Comment :: September 4, 2009 @ 1:42 pm

 

Subscribe without commenting

Trackbacks/Pingbacks