July 31, 2014

Long PRIMARY KEY for Innodb tables

I’ve written and spoke a lot about using short PRIMARY KEYs with Innodb tables due to the fact all other key will refer to the rows by primary key. I also recommended to use sequential primary keys so you do not end up having random primary key BTREE updates which can be very expensive.

Today I would like to share practical example when you may use long primary key with value distribution far from sequential.

For one of the projects I had the task to store meta data about thumbnails for images retrieved from various web sites in the Internet. Thumbnails themselves were stored on the disk of course but meta data such as original image sizes, file size as well as thumbnail location on the disk were stored in the database:

Why did I use this solution compared to others:

Innodb Tables – This table is getting much more reads than writes so transactional overhead of for writes is small price to pay for number of benefits – caching data and index in memory -so cached lookups are very fast, clustering by primary key – so for disk bound lookups date is retrieved with 1 IO not with 2 IOs as with non-clustered tables. The other benefit – it is typical to show multiple thumbnails from the same album/domain, and due to clustering there is high probability it will all come from one or few close pages, saving IO dramatically. No recovery worries – checking/repairing large MyISAM tables in case of MySQL/System crash is painful and great to be avoided.

Long primary key – why did not I use auto_increment id in this case ? I could but this would kill clustering benefits described above. Also as this is only index I have I do not have any overhead as this would come only if extra indexes are defined. The primary key itself does not get much larger whatever columns you place into it, as primary key BTREE contains all table data in leafs anyway. Top level pages can get a bit larger for long primary keys but usually it is not major.

Non Sequential primary key This just comes as an effect of choosing url as the key. It surely makes inserts slower as they are random and page splits have to happen. Full table scan for this table would also be slow as it is quite fragmented. Table however is not getting any table scans just single row lookups by primary key. Also in practice value distribution is not as bad in this case as if I would decide to use md5(url) or something similar as a key – it is often many images from one domain/album are inserted at about same time which makes access locality much better.

What thumb_height and thumb_width are doing in index ?
In this case we could have multiple thumbnail sizes for the same image which is why they were added. We also wanted to keep system flexible so if we want add more thumbnail sizes we would not need to change anything. Why are they going after the url rather than before it in the index ? They are not selective plus we might want to get all thumbnails for given image url effectively.

In general I use this example as illustration no recommendations are good for every case and you need to check what is important in your case rather than accepting general recommendations without giving much thought on which assumptions they are based.

P.S Yeah in theory URLs can be longer than 255 characters but it was not important in this case.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. rangi says:

    in what purpose we are using primary key?

  2. Matt says:

    I have an InnoDB table with a long varchar primary key, and when I do a “select id…” query, the results do not return ordered by id. I can think of 2 possible reasons:

    1) this table has previously been converted to MyISAM and back to InnoDB, but has been modified much since then

    2) the primary keys all start with the same 18 latin1 characters, and maybe the primary key is only indexed to 10 characters? If this is the case, what does the internal data storage structure look like?

    The problem is that the table is about 3.6 million rows, and I do range selects that pull about 300,000 ids at a time. If they’re not in order, I’m not sure I’ll get the correct id’s, and I can’t add an “order by id” clause because that will hurt performance too much.

    Further, I can’t find any way to assert an index’s prefix length on MySQL 5.1. It does not display in a “show create table” statement nor a “show index from” nor a “describe table” statement!

    Here is example output without and with an “order by id” clause:

    —————————————

    select id from view.MediumTextView limit 10

    modelQuadTokens_1_030233102111321211233320001_23260356_RealtyTrac
    modelQuadTokens_1_021333011030113102211010101_23264578_RealtyTrac
    modelQuadTokens_1_023101030110202002021310021_23265168_RealtyTrac
    modelQuadTokens_1_023012311310002011100033022_23258199_RealtyTrac
    modelQuadTokens_1_023012311312130313023213020_23265583_RealtyTrac
    modelQuadTokens_1_023012311103021221232131110_23258157_RealtyTrac
    modelQuadTokens_1_023013210320212220101012001_23256624_RealtyTrac
    modelQuadTokens_1_023012130130003202121231312_23256294_RealtyTrac
    modelQuadTokens_1_023012130102300210002011120_23256413_RealtyTrac
    modelQuadTokens_1_023010203333201133300002301_23266647_RealtyTrac

    select id from view.MediumTextView order by id limit 10

    modelQuadTokens_1_002231332112003321210202031_1279355_LandWatch
    modelQuadTokens_1_002231332112003321210202031_253830_LandWatch
    modelQuadTokens_1_002302203000032102223033220_GZ14768_Worldspan
    modelQuadTokens_1_003222003112121300202000120_WV0932_Worldspan
    modelQuadTokens_1_020010301121310313312101300_2266695_Point2
    modelQuadTokens_1_020010301221110133133221233_1733075_Point2
    modelQuadTokens_1_020010301221110133133221233_1809187_Point2
    modelQuadTokens_1_020010301221110133133221233_1863296_Point2
    modelQuadTokens_1_020010301221110133133223011_1477083_Point2
    modelQuadTokens_1_020010320312133200303333333_1698538_Point2
    ————————-

    I have some workarounds, but It’s strange that I can’t find many others having a similar problem.

    Have you ever encountered this?

    Matt

  3. Matt says:

    I think we figured it out. There’s a secondary index on the table (updatedTimestamp bigint), and the results are ordered by it. Strange that the query optimizer chooses it.

    These methods fix it, but I’m not entirely sure of the repercussions:

    1) select id from view.MediumTextView order by id limit 10
    2) select id from view.MediumTextView force index (primary) limit 10
    3) select id from view.MediumTextView where id > ”

    Perhaps the optimizer is finding the most compact storage location of primary keys? If it uses the clustered index, it has to load all the leaf data pages (avg ~1kb in this table), and parse out the ids. But if it uses the secondary index, it’s much fewer data pages to parse through.

    Is there a good method for creating a secondary index that’s optimized for selecting large ranges of ids? Theoretically, you could make a zero-field b-tree index, which would then just be a huge list of primary keys, but i’m not sure you can do that. What about an index on a bit field where all rows have the same bit value?

    It would be interesting to see your thoughts on low cardinality indexes and short-prefix indexes resulting in low cardinality. What happens if I have a 5 million row table with an index on a true/false column? Does it form two large trees of primary keys? Or is it some inefficient list structure.

    I guess i better stop now… thanks

  4. Jaka Jancar says:

    Peter,

    you say that all other keys will refer to the rows by primary key.

    What about if the table doesn’t have a primary key, e.g. a table with columns:
    – foo NOT NULL
    – bar1 NULL
    – bar2 NULL
    UNIQUE (foo, bar1)
    UNIQUE (foo, bar2)
    ?

  5. peter says:

    Jaka,

    In this case (because bar1 can be NULL) innodb will create its internal primary key which is not exposed on the top level. That primary key will be when used for reference from unique keys.
    If bar1 would be NOT NULL the first unique key would be in fact primary key.

  6. Jaka Jancar says:

    Thanks Peter!

  7. Stefan says:

    What if you really really want to store a proper size URL?

  8. Richard says:

    Hi Peter,

    I like the MD5 of the URL suggestion.

    Would you not get the same clustering/insert benefit from using 2 hashes equalling 32 bytes an using that as the PK?

    Hash of the URL domain + Hash of the URL path (or full URL)

    That way you would have a fixed length PK that would still have the cardinality in the first 16 bytes that you may be looking for.

  9. angelo mandato says:

    I have looked everywhere for what is meant by a “long” primary key. Being a developer, a long is a large integer. So a real explanation what is meant by a “long” primary key would be great. My situation I have 3 columns of type unsigned int that make up my primary key, and logically it makes sense to me as they are never null but if that is considered too “long” then I want to change it.

Speak Your Mind

*