<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: To UUID or not to UUID ?</title>
	<atom:link href="http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/</link>
	<description>Everything about MySQL Performance</description>
	<pubDate>Tue, 02 Dec 2008 12:37:57 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: xli</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-366358</link>
		<dc:creator>xli</dc:creator>
		<pubDate>Mon, 27 Oct 2008 16:46:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-366358</guid>
		<description>Hi, Peter

I noticed your benchmark, which shows a 200-times performance difference between auto-increment and UUID(). I'm wondering whether you ran the same test on InnoDB as well. We are using MySQL 5.0 with InnoDB, and we ran into bad lock contention when inserting rows into InnoDB tables with an auto-increment column, so we are considering switching to UUID() as the PK. I did a very simple test: I wrote a stored procedure with a loop that inserts one row per iteration. For InnoDB, inserting 100,000 rows took 254 sec into a table with an auto-increment key and 263 sec into a table that uses UUID() instead. The same test on MyISAM tables gave 54 sec vs. 68 sec; the 54 sec is similar to what you got. What did I do wrong?</description>
		<content:encoded><![CDATA[<p>Hi, Peter</p>
<p>I noticed your benchmark, which shows a 200-times performance difference between auto-increment and UUID(). I&#8217;m wondering whether you ran the same test on InnoDB as well. We are using MySQL 5.0 with InnoDB, and we ran into bad lock contention when inserting rows into InnoDB tables with an auto-increment column, so we are considering switching to UUID() as the PK. I did a very simple test: I wrote a stored procedure with a loop that inserts one row per iteration. For InnoDB, inserting 100,000 rows took 254 sec into a table with an auto-increment key and 263 sec into a table that uses UUID() instead. The same test on MyISAM tables gave 54 sec vs. 68 sec; the 54 sec is similar to what you got. What did I do wrong?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2008-10-23 &#171; Object neo = neo Object</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-364628</link>
		<dc:creator>links for 2008-10-23 &#171; Object neo = neo Object</dc:creator>
		<pubDate>Fri, 24 Oct 2008 04:31:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-364628</guid>
		<description>[...] To UUID or not to UUID ? &#124; MySQL Performance Blog (tags: uuid scalability) [...]</description>
		<content:encoded><![CDATA[<p>[...] To UUID or not to UUID ? | MySQL Performance Blog (tags: uuid scalability) [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Enlaces técnicos recomendados</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-354050</link>
		<dc:creator>Enlaces técnicos recomendados</dc:creator>
		<pubDate>Wed, 10 Sep 2008 13:12:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-354050</guid>
		<description>[...] To UUID or not to UUID ? from MySQL Performance Blog [...]</description>
		<content:encoded><![CDATA[<p>[...] To UUID or not to UUID ? from MySQL Performance Blog [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Al T.</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-278544</link>
		<dc:creator>Al T.</dc:creator>
		<pubDate>Tue, 15 Apr 2008 17:32:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-278544</guid>
		<description>I have seen those problems with UUIDs in MySQL when they are stored as text. Ideally, MySQL would have a UUID column type that stores the values as binary rather than as strings but converts them to strings whenever the row is returned. The ability to insert an ID as hex text and have it converted to binary would be a must too. That's the biggest problem with binary fields: you can't read them or copy and paste them without conversion. That change would improve both storage and indexing performance (a 16-byte column instead of 36 bytes).

In an effort to improve the situation, I created a simple UUID class that generates random UUIDs, with an option to store them in Base64 rather than hex. Since the length of a UUID is always constant, I was able to trim off the trailing = padding from the Base64 output and come up with a case-sensitive 22-byte UUID representation (VTIW7xOgReOGrL3vMRjm4Q, for example). The performance increase was enormous, the overhead is much smaller (22 bytes vs. 16 for raw binary), and you can still convert to binary or hex at any time.

Later, as I thought more about the problem, I realized that Base64 encoding is inefficient for a 128-bit number. To get maximum efficiency out of Base64, the number of bits needs to be divisible by 6; 128 bits works out to 21.33 characters, so part of the 22nd character is wasted. So I created a new identifier that is only 72 bits. (Yes, the collision probability goes up, but for any two given IDs it is still one in 2^72, about 4.7 * 10^21.) These UIDs (as I call them) take only 12 bytes as Base64 text, or 9 bytes in a binary column, and strike a very good balance between speed and uniqueness. They can also be translated to GUIDs (########-0000-0000-0000-00##########) and back when needed. (If 72 bits is not enough, use 96 bits to make a 16-character UID.)</description>
		<content:encoded><![CDATA[<p>I have seen those problems with UUIDs in MySQL when they are stored as text. Ideally, MySQL would have a UUID column type that stores the values as binary rather than as strings but converts them to strings whenever the row is returned. The ability to insert an ID as hex text and have it converted to binary would be a must too. That&#8217;s the biggest problem with binary fields: you can&#8217;t read them or copy and paste them without conversion. That change would improve both storage and indexing performance (a 16-byte column instead of 36 bytes).</p>
<p>In an effort to improve the situation, I created a simple UUID class that generates random UUIDs, with an option to store them in Base64 rather than hex. Since the length of a UUID is always constant, I was able to trim off the trailing = padding from the Base64 output and come up with a case-sensitive 22-byte UUID representation (VTIW7xOgReOGrL3vMRjm4Q, for example). The performance increase was enormous, the overhead is much smaller (22 bytes vs. 16 for raw binary), and you can still convert to binary or hex at any time.</p>
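<p>A minimal Python sketch of the padding-trimmed Base64 representation described above. It assumes the standard Base64 alphabet (the class itself is not shown, so its alphabet is unknown; <code>base64.urlsafe_b64encode</code> would use - and _ in place of + and /), and the function names are hypothetical.</p>
<pre>
```python
import base64
import uuid


def uuid_to_b64(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a UUID as Base64 with the '=' padding removed."""
    # 16 bytes encode to 24 Base64 characters, the last two of which are always '=' padding
    return base64.b64encode(u.bytes).decode("ascii").rstrip("=")


def b64_to_uuid(text: str) -> uuid.UUID:
    """Restore the padding that uuid_to_b64 stripped and decode back to a UUID."""
    return uuid.UUID(bytes=base64.b64decode(text + "=="))


u = uuid.uuid4()                # random (version 4) UUID
short = uuid_to_b64(u)
print(len(short), short)        # 22 characters, case-sensitive
print(b64_to_uuid(short) == u)  # True: the round trip is lossless
print(u.hex)                    # the same value as 32 hex digits
```
</pre>
<p>Decoding only needs the two = characters added back, because every 16-byte value produces exactly 22 significant Base64 characters.</p>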
<p>Later, as I thought more about the problem, I realized that Base64 encoding is inefficient for a 128-bit number. To get maximum efficiency out of Base64, the number of bits needs to be divisible by 6; 128 bits works out to 21.33 characters, so part of the 22nd character is wasted. So I created a new identifier that is only 72 bits. (Yes, the collision probability goes up, but for any two given IDs it is still one in 2^72, about 4.7 * 10^21.) These UIDs (as I call them) take only 12 bytes as Base64 text, or 9 bytes in a binary column, and strike a very good balance between speed and uniqueness. They can also be translated to GUIDs (########-0000-0000-0000-00##########) and back when needed. (If 72 bits is not enough, use 96 bits to make a 16-character UID.)</p>
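<p>A sketch of the 72-bit UID idea under stated assumptions: the random bits come from Python&#8217;s <code>secrets</code> module, and the GUID mapping puts the first 8 hex digits in the first group and the remaining 10 at the end of the last group, following the ########-0000-0000-0000-00########## pattern above. The helper names are hypothetical, not the commenter&#8217;s class.</p>
<pre>
```python
import base64
import secrets

UID_BYTES = 9  # 72 bits; 72 is divisible by 6, so Base64 needs exactly 12 characters and no padding


def new_uid() -> bytes:
    """Return 72 random bits (9 bytes) from the OS's cryptographic random source."""
    return secrets.token_bytes(UID_BYTES)


def uid_to_b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")  # 12 characters, no '=' padding


def uid_to_guid(raw: bytes) -> str:
    """Spread the 18 hex digits into the ########-0000-0000-0000-00########## layout."""
    h = raw.hex()
    return f"{h[:8]}-0000-0000-0000-00{h[8:]}"


def guid_to_uid(guid: str) -> bytes:
    """Reverse uid_to_guid, rejecting GUIDs whose fixed zero digits are not zero."""
    digits = guid.replace("-", "")  # 32 hex digits
    if len(digits) != 32 or digits[8:22] != "0" * 14:
        raise ValueError("not a 72-bit UID in GUID form")
    return bytes.fromhex(digits[:8] + digits[22:])


raw = new_uid()
print(uid_to_b64(raw))                       # 12 characters
print(guid_to_uid(uid_to_guid(raw)) == raw)  # True
print(f"{2 ** 72:.2e}")                      # 4.72e+21, the figure quoted above
```
</pre>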
]]></content:encoded>
	</item>
	<item>
		<title>By: cybermonk</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-254921</link>
		<dc:creator>cybermonk</dc:creator>
		<pubDate>Thu, 20 Mar 2008 17:54:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-254921</guid>
		<description>Well, it's too bad the comparison wasn't done with the UUID in binary format, generating the GUID on the client side using Jimmy Nilsson's GUID.COMB. Can you do that comparison, Peter? NHibernate has an implementation of GUID.COMB.</description>
		<content:encoded><![CDATA[<p>Well, it&#8217;s too bad the comparison wasn&#8217;t done with the UUID in binary format, generating the GUID on the client side using Jimmy Nilsson&#8217;s GUID.COMB. Can you do that comparison, Peter? NHibernate has an implementation of GUID.COMB.</p>
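<p>A rough sketch of a COMB-style identifier generated on the client, assuming a MySQL BINARY(16) key. Nilsson&#8217;s GUID.COMB was designed around SQL Server&#8217;s ordering of uniqueidentifier values and stores the time in the last six bytes; because MySQL compares BINARY values byte by byte from the left, this hypothetical variant puts a 48-bit millisecond timestamp first and fills the remaining 10 bytes with random data. It is not the NHibernate implementation.</p>
<pre>
```python
import os
import time
import uuid


def comb_style_uuid(now_ms=None) -> uuid.UUID:
    """Time-prefixed random identifier: 6 bytes of Unix-epoch milliseconds, then 10 random bytes.

    The timestamp is stored big-endian in the leading bytes, so values from a later
    millisecond compare greater byte by byte, the order a BINARY(16) index uses.
    Version/variant bits are not set, so this is not an RFC 4122 UUID.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    prefix = (now_ms % 2 ** 48).to_bytes(6, "big")  # wraps after roughly 8,900 years
    return uuid.UUID(bytes=prefix + os.urandom(10))


first = comb_style_uuid(1_000)
second = comb_style_uuid(2_000)
print(first.bytes[:6].hex(), second.bytes[:6].hex())  # 0000000003e8 0000000007d0
```
</pre>
<p>Keys from later milliseconds sort after earlier ones, so new rows tend to land near the end of the index rather than at random positions, which is the point of COMB; keys created within the same millisecond are still ordered randomly.</p>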
]]></content:encoded>
	</item>
	<item>
		<title>By: Anthony Mathews</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-218410</link>
		<dc:creator>Anthony Mathews</dc:creator>
		<pubDate>Sat, 15 Dec 2007 20:09:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-218410</guid>
		<description>It appears that the benchmark in this article stored the UUID as an ASCII representation of a binary number, when in fact the UUID should have been stored in binary form. This is equivalent to searching on a 4-byte integer for the auto_increment primary key, but on a person's first, middle and last name for the UUID implementation.

Write a function that converts the value returned by the UUID function to binary and store it in a BINARY(16) column (128 bits is 16 bytes). Write yourself another function that casts it back to a hyphenated character string if you need to display it.

However, the primary use for UUIDs or GUIDs is data portability, not search speed. If you have worked for large companies with redundant data stored in many locations, you have to manage the primary key much more closely and be able to generate values you know will be unique across the company. Integers and auto-increment will not cut it.</description>
		<content:encoded><![CDATA[<p>It appears that the benchmark in this article stored the UUID as an ASCII representation of a binary number, when in fact the UUID should have been stored in binary form. This is equivalent to searching on a 4-byte integer for the auto_increment primary key, but on a person&#8217;s first, middle and last name for the UUID implementation.</p>
<p>Write a function that converts the value returned by the UUID function to binary and store it in a BINARY(16) column (128 bits is 16 bytes). Write yourself another function that casts it back to a hyphenated character string if you need to display it.</p>
<p>However, the primary use for UUIDs or GUIDs is data portability, not search speed. If you have worked for large companies with redundant data stored in many locations, you have to manage the primary key much more closely and be able to generate values you know will be unique across the company. Integers and auto-increment will not cut it.</p>
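<p>A minimal Python sketch of the two conversion functions suggested above, done on the client side; the function names are hypothetical. On the server, <code>UNHEX(REPLACE(UUID(), '-', ''))</code> produces the same 16 bytes, and MySQL 8.0 and later also provide <code>UUID_TO_BIN()</code> and <code>BIN_TO_UUID()</code>.</p>
<pre>
```python
import uuid


def uuid_to_binary(text: str) -> bytes:
    """Parse the 36-character hyphenated form (as returned by MySQL's UUID()) into 16 raw bytes."""
    return uuid.UUID(text).bytes


def binary_to_uuid(raw: bytes) -> str:
    """Turn 16 raw bytes back into the lowercase, hyphenated display form."""
    return str(uuid.UUID(bytes=raw))


text = "6ccd780c-baba-1026-9564-5b8c656024db"  # sample value in UUID() format
raw = uuid_to_binary(text)
print(len(text), len(raw))          # 36 16
print(binary_to_uuid(raw) == text)  # True
```
</pre>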
]]></content:encoded>
	</item>
	<item>
		<title>By: scale-out: notes on sharding, unique keys, foreign keys&#8230; &#171; from Oracle to MySQL</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-158651</link>
		<dc:creator>scale-out: notes on sharding, unique keys, foreign keys&#8230; &#171; from Oracle to MySQL</dc:creator>
		<pubDate>Thu, 23 Aug 2007 19:30:49 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-158651</guid>
		<description>[...] UUIDs are a bit ugly for reading and working with. (See the MySQL Performance Blog entry &#8220;To UUID or not to UUID&#8221; for performance [...]</description>
		<content:encoded><![CDATA[<p>[...] UUIDs are a bit ugly for reading and working with. (See the MySQL Performance Blog entry &#8220;To UUID or not to UUID&#8221; for performance [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: peter</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-106459</link>
		<dc:creator>peter</dc:creator>
		<pubDate>Wed, 11 Apr 2007 09:34:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-106459</guid>
		<description>Jacob,

I can tell you what we do for http://www.boardreader.com, which has some billion rows that need quick retrieval.
The data is partitioned into "table groups", which are mapped to servers. We use 64-bit identifiers, with the lowest byte storing the table group.

Search is done with the "Sphinx" search engine, and in most cases we basically just need to find rows by ID to show the result set.</description>
		<content:encoded><![CDATA[<p>Jacob,</p>
<p>I can tell you what we do for <a href="http://www.boardreader.com" rel="nofollow">http://www.boardreader.com</a>, which has some billion rows that need quick retrieval.<br />
The data is partitioned into &#8220;table groups&#8221;, which are mapped to servers. We use 64-bit identifiers, with the lowest byte storing the table group.</p>
<p>Search is done with the &#8220;Sphinx&#8221; search engine, and in most cases we basically just need to find rows by ID to show the result set.</p>
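<p>A minimal Python sketch of the ID layout described above, assuming an unsigned 64-bit ID with the table group in the lowest 8 bits and a per-group sequence in the upper 56. The four-server <code>GROUP_TO_SERVER</code> mapping and all function names are hypothetical, since the comment does not say how groups are assigned to servers. The point it illustrates is that a row&#8217;s server can be found from its ID alone.</p>
<pre>
```python
GROUPS = 256  # the lowest byte stores the table group, so there are 256 possible groups

# Hypothetical mapping of table groups to database servers (4 servers, round-robin).
GROUP_TO_SERVER = {group: f"db{group % 4 + 1}" for group in range(GROUPS)}


def make_id(sequence: int, table_group: int) -> int:
    """Pack a per-group sequence number and a table group into one 64-bit identifier."""
    if table_group not in range(GROUPS):
        raise ValueError("table group must fit in one byte")
    if sequence not in range(2 ** 56):
        raise ValueError("sequence must fit in the upper 56 bits")
    # Same as shifting the sequence left by 8 bits and OR-ing in the group.
    return sequence * GROUPS + table_group


def split_id(row_id: int) -> tuple[int, int]:
    """Recover (sequence, table_group) from an identifier."""
    return divmod(row_id, GROUPS)


def server_for(row_id: int) -> str:
    """Find the server holding a row using nothing but its ID."""
    _, group = split_id(row_id)
    return GROUP_TO_SERVER[group]


row_id = make_id(1000, 7)
print(row_id, split_id(row_id), server_for(row_id))  # 256007 (1000, 7) db4
```
</pre>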
]]></content:encoded>
	</item>
	<item>
		<title>By: Jacob</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-106309</link>
		<dc:creator>Jacob</dc:creator>
		<pubDate>Tue, 10 Apr 2007 21:14:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-106309</guid>
		<description>Kevin,

Do you have any good links or more information on how to go about "client-side sharding of data"? This sounds like what I need to do. I am creating an app that will need quick storage and retrieval of 8-10 billion rows of data. Splitting the data across several servers, with a deterministic way of finding it again, is what I need.

Any help would be greatly appreciated.</description>
		<content:encoded><![CDATA[<p>Kevin,</p>
<p>Do you have any good links or more information on how to go about &#8220;client-side sharding of data&#8221;? This sounds like what I need to do. I am creating an app that will need quick storage and retrieval of 8-10 billion rows of data. Splitting the data across several servers, with a deterministic way of finding it again, is what I need.</p>
<p>Any help would be greatly appreciated.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sachman Bhatti</title>
		<link>http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-95934</link>
		<dc:creator>Sachman Bhatti</dc:creator>
		<pubDate>Tue, 27 Mar 2007 01:15:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2007/03/13/to-uuid-or-not-to-uuid/#comment-95934</guid>
		<description>OK, I misinterpreted that; someone else clarified it for me. I thought there was a way to do O(1) updates, not the mapping :)</description>
		<content:encoded><![CDATA[<p>OK, I misinterpreted that; someone else clarified it for me. I thought there was a way to do O(1) updates, not the mapping <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
</channel>
</rss>
