July 29, 2014

Beware of MySQL Data Truncation

Here is nice gotcha which I’ve seen many times and which can cause just a minefield for many reasons.
Lets say you had a system storing articles and you use article_id as unsigned int. As the time goes and you see you may get over 4 billions of articles you change the type for article_id to bigint unsigned but forget linked tables.

For table “article” itself you would notice the issue quickly if you would forget to alter it as inserts will fail because of primary key collisions. But for linked tables like above inserts work you just get a warning:

And in my experience very few people mind to check the warnings promptly.

So two things happen, one is nastier than the other. First the comments end up associated to the wrong article (4294967295) which can be quite a problem in particular if these are private comments. The other thing which tends to happen – you get A LOT of comments associated with this article and few systems are designed with YouTube capacity to handle millions of comments per item.

Note the question in this case is not like there is just one article which has insane number of comments but a as a all articles outside of allowed range have it:

MySQL optimizer behaves rather strange in such case (at least in MySQL 5.0) – instead of instantly saying there is no such value (because value is out of range what could be stored in the column) it truncates the value, performs index lookup, scanning all rows (just 3 in my trivial example) and when filtering them out because value really does not match the where clause.

This makes such error even harder to catch – you will not see wrong comments for a lot articles, you will see 0 while queries will just run insanely slow.

So how to avoid this problem ?

It is actually quite easy – you just need to enable strict mode:

Note however strict mode should not be taken lightly – it is good for reliability but many applications will need to be fixed before they start working properly with this mode.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Onno says:

    Nice to know, and indeed, I hardly ever check the warnings…

    However, this seems like a bit of a futile problem.

    In the case comments can have the same id (i.e. comment #1 can exist on several articles) it would seem logical to me to simply create a unique key on the columns ‘article_id’ and ‘comment_id’ in the table preventing this issue.
    But more importantly; if the database were to be designed coherently, the article with ID 4300000000 would not get created in the first place, so how are people going to comment on an article that never got created in the first place? ;)

  2. Onno says:

    uh oh…. typed to quickly…

    “you change the type for article_id to bigint unsigned but forget linked tables.”

    I would never do that, of course. Please forget my second remark
    :)

  3. peter says:

    Onno,

    In consulting we do not always deal with perfectly designed systems plus this is of course simplified example

    It does not matter what comment_id is – quite common comment_id is unique by itself (so we can fetch data by comment id easily) which may reduce the need of such index. Not to mention people hunting for performance may be easy on unique indexes because (in innodb) it blocks insert buffer.

  4. Onno says:

    right you are :)

Speak Your Mind

*