<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>MySQL Performance Blog &#187; ideas</title>
	<atom:link href="http://www.mysqlperformanceblog.com/category/ideas/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mysqlperformanceblog.com</link>
	<description>Everything about MySQL Performance</description>
	<lastBuildDate>Sat, 21 Nov 2009 03:11:18 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Few more ideas for InnoDB features</title>
		<link>http://www.mysqlperformanceblog.com/2009/06/29/few-more-ideas-for-innodb-features/</link>
		<comments>http://www.mysqlperformanceblog.com/2009/06/29/few-more-ideas-for-innodb-features/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 03:21:22 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[patch]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=708</guid>
		<description><![CDATA[As you can see, MySQL is doing great with InnoDB performance improvements, so we have decided to concentrate more on additional InnoDB features that will make a difference.
Besides the ideas I listed before at http://www.mysqlperformanceblog.com/2009/03/30/my-hot-list-for-next-innodb-features/ (one of them, moving InnoDB tables between servers, is currently under development), we have a few more:
- Stick some InnoDB tables / indexes in [...]]]></description>
			<content:encoded><![CDATA[<p>As you can see, MySQL is doing great with InnoDB performance improvements, so we have decided to concentrate more on additional InnoDB features that will make a difference.</p>
<p>Besides the ideas I listed before at <a href="http://www.mysqlperformanceblog.com/2009/03/30/my-hot-list-for-next-innodb-features/">http://www.mysqlperformanceblog.com/2009/03/30/my-hot-list-for-next-innodb-features/</a> (one of them, moving InnoDB tables between servers, is currently under development), we have a few more:</p>
<p>- Stick some InnoDB tables / indexes in the buffer pool, or set priorities for InnoDB tables. That means tables with a higher priority will have a better chance of staying in the buffer pool than tables with a lower priority. Link to blueprint: <a href="https://blueprints.launchpad.net/percona-patches/+spec/lru-priority-patch">https://blueprints.launchpad.net/percona-patches/+spec/lru-priority-patch</a></p>
<p>- Separate the LRU list into several lists. This would allow us to emulate several buffer pools, with the ability to keep different tables in different buffer pools, and would also decrease contention on the buffer pool. Link: <a href="https://blueprints.launchpad.net/percona-patches/+spec/multiple-lru-patch">https://blueprints.launchpad.net/percona-patches/+spec/multiple-lru-patch</a></p>
<p>- We are looking to include <a href="https://launchpad.net/wafflegrid">Waffle Grid</a> in XtraDB releases, with some additional features such as caching the buffer pool on SSD.</p>
<p>If these ideas interest you and you want to support them, <a href="http://www.percona.com/contacts.html">contact us</a>.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2009/06/29/few-more-ideas-for-innodb-features/#comments">7 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2009/06/29/few-more-ideas-for-innodb-features/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Adjusting Innodb for Memory resident workload</title>
		<link>http://www.mysqlperformanceblog.com/2009/03/25/adjusting-innodb-for-memory-resident-workload/</link>
		<comments>http://www.mysqlperformanceblog.com/2009/03/25/adjusting-innodb-for-memory-resident-workload/#comments</comments>
		<pubDate>Thu, 26 Mar 2009 03:06:40 +0000</pubDate>
		<dc:creator>peter</dc:creator>
				<category><![CDATA[ideas]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=649</guid>
		<description><![CDATA[As larger and larger amounts of memory become common (512GB is something you can fit into a relatively commodity server these days), many customers choose to build their applications so that all or most of their database (frequently InnoDB) fits into memory.
If all tables fit in the InnoDB buffer pool the performance for reads will [...]]]></description>
			<content:encoded><![CDATA[<p>As larger and larger amounts of memory become common (512GB is something you can fit into a relatively commodity server these days), many customers choose to build their applications so that all or most of their database (frequently InnoDB) fits into memory.</p>
<p>If all tables fit in the InnoDB buffer pool, read performance will be quite good. Writes, however, will still suffer, because InnoDB does a lot of random I/O during fuzzy checkpointing, which often becomes the bottleneck. This problem leads some customers who are not concerned with persistence to run InnoDB off a RAM drive.</p>
<p>In fact, with relatively simple changes InnoDB could be made to perform much better for memory-resident workloads, and we should consider fixing these issues in XtraDB.</p>
<p><strong>Preload</strong> It is possible to preload all InnoDB tables (ibdata, .ibd files) at system start. This would avoid the warmup problem and also make crash recovery fast even with a very large log file, since random I/O is what usually limits recovery speed. Because the files can simply be read sequentially, the read speed can be hundreds of megabytes per second even on commodity RAID arrays.</p>
<p><strong>Sequential Checkpointing</strong> Currently, fuzzy checkpointing flushes the pages that have gone unflushed for the longest time, which causes random I/O. In resident checkpoint mode we should just flush everything (yes, even clean pages) sequentially. This should give us sequential writes at 100MB+ per second, which means a 256GB buffer pool can be flushed about once per 30 minutes. It should be possible to just size the InnoDB logs so they are not cycled through faster than the flush cycle.</p>
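<p>To sanity-check that estimate, here is a minimal Python sketch of the arithmetic. It assumes 1GB = 1024MB and a perfectly sustained sequential write rate; the function names are made up for illustration and are not part of InnoDB or XtraDB.</p>

```python
def flush_minutes(pool_gb, write_mb_per_sec):
    """Minutes needed to write pool_gb gigabytes sequentially at the given rate."""
    pool_mb = pool_gb * 1024  # assumption: 1 GB = 1024 MB
    return pool_mb / write_mb_per_sec / 60


def required_mb_per_sec(pool_gb, minutes):
    """Sustained write rate needed to flush pool_gb gigabytes within `minutes`."""
    return pool_gb * 1024 / (minutes * 60)


# Flush cycle for a 256GB buffer pool at a few assumed write rates.
for rate in (100, 150, 200):
    print(f"{rate} MB/sec: {flush_minutes(256, rate):.1f} minutes")

# Write rate needed to hit a 30-minute cycle.
print(f"30-minute cycle: {required_mb_per_sec(256, 30):.0f} MB/sec")
```

<p>Under these assumptions a flat 100MB/sec gives a cycle of roughly 44 minutes, and a 30-minute cycle needs about 146MB/sec sustained, which is in line with the "100MB+" figure. Whatever the real cycle turns out to be, that is the interval the logs would need to cover.</p>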
<p>This may not be the optimal solution if you were designing the system from scratch, but it is something that can be done without significantly changing InnoDB's core logic, and without changing the on-disk storage format at all.</p>
<p>This is just an idea at this point, which I&#8217;ve discussed with some customers, and we&#8217;re not working on it yet. If you think it would help with your performance challenges, though, we would be happy to implement it for you.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by peter |
      <a href="http://www.mysqlperformanceblog.com/2009/03/25/adjusting-innodb-for-memory-resident-workload/#comments">9 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2009/03/25/adjusting-innodb-for-memory-resident-workload/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>High-Performance Click Analysis with MySQL</title>
		<link>http://www.mysqlperformanceblog.com/2008/12/22/high-performance-click-analysis-with-mysql/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/12/22/high-performance-click-analysis-with-mysql/#comments</comments>
		<pubDate>Tue, 23 Dec 2008 03:48:17 +0000</pubDate>
		<dc:creator>Baron Schwartz</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[optimizer]]></category>
		<category><![CDATA[replication]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[xtradb]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=562</guid>
		<description><![CDATA[We have a lot of customers who do click analysis, site analytics, search engine marketing, online advertising, user behavior analysis, and many similar types of work.  The first thing these have in common is that they're generally some kind of loggable event.
The next characteristic of a lot of these systems (real or planned) is the [...]]]></description>
			<content:encoded><![CDATA[<p>We have a lot of customers who do click analysis, site analytics, search engine marketing, online advertising, user behavior analysis, and many similar types of work.  The first thing these have in common is that they're generally some kind of loggable event.</p>
<p>The next characteristic of a lot of these systems (real or planned) is the desire for "real-time" analysis.  Our customers often want their systems to provide the freshest data to their own clients, with no delays.</p>
<p>Finally, the analysis is usually multi-dimensional.  The typical user wants to be able to generate summaries and reports in many different ways on demand, often to support the functionality of the application as well as to provide reports to their clients.  Clicks by day, by customer, top ads by clicks, top ads by click-through ratio, and so on for dozens of different types of slicing and dicing.</p>
<p>And as a result, one of the most common questions we hear is how to build high-performance systems to do this work.  Let's see some ways you can build the functionality you need and get the performance you need.  Because I've built two such systems to manage online ads through Google Adwords, Yahoo, MSN and others, it's easy and familiar for me to use the example of search engine marketing.  I'll do that throughout this article.</p>
<p><strong>Requirements</strong></p>
<p>The words "need" and "want" are different.  Do you really need atomic-level data?  Do you really need real-time reporting?  If you do, the problem is much more expensive to solve.</p>
<p>Start with the granularity of your data.  What data do you need to make your business run?  If you can't get access to the time of day of every click on every ad, will it hamper your ability to measure the ad's value?  Is it enough to know how many times the ad was clicked each day?  If so, you can roll all those events up into a per-day table.</p>
<p>Next, let's look at "real-time."  None of the big three (Google, Yahoo, MSN) provided real-time reporting the last time I was involved with them (and I suspect this is still true).  It's too expensive.  Consider your users' expectations.  For most applications I've been involved with, having day-old data is adequate, and users don't expect realtime.  The catch here is that when you start out, realtime is possible because your data is small.  "Hey, we do realtime reporting.  Google doesn't even do that!  We're better!" Then you get popular <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   And if you've promoted your better-ness in the meantime, you might have to do some awkward backpedaling with customers, who now expect realtime data.  The database giveth, and the database taketh away.</p>
<p>Finally, you should think a lot about how you need to query the data.  It is a hard question to answer, and sometimes I've seen it evolve over time, especially as the growing data size forces it to.  This goes back to what data you really need to make your business run.  Anything else is gravy.  If there are nice-to-haves, consider not building them in.  <a href="http://blip.tv/file/1356502">Listen to some talks by 37Signals</a> if you need inspiration to toss things out.  Define the types of queries you absolutely have to have, if possible, and note the ways and types of aggregation (by-ad by-day, for example).</p>
<p>Sometimes I ask a customer "what kinds of queries do you have to run?" and they say "we can't decide, so we want to just store everything." If you can't decide yet, then don't store everything in the database.  Instead, store the source data in some fashion that you can reload later, such as flat files, and build support in the database for one or two capabilities you absolutely need now; then add the rest later, reloading the data if needed.</p>
<p><strong>Aggregate</strong></p>
<p>Aggregation is absolutely key for most people.  There are special cases, and there are ways to do general-purpose work without aggregating (see the section below on technologies), but if you're doing this with vanilla MySQL, you will need to aggregate your data.</p>
<p>What you want to do is aggregate in ways that optimize the most expensive things you'll do.  And then, you might super-aggregate too.  For example, if you aggregate by day and then you do a lot of queries over 365-day ranges for year-over-year analysis, aggregate again by month.  Then write your queries to use the most aggregated data possible to save work.</p>
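<p>As a rough illustration of super-aggregation, the sketch below rolls per-day rows up into per-month rows in Python.  The row layout (day, ad, clicks, impressions) and the function name are assumptions made for this example; in practice you might well do the same thing with a GROUP BY into a monthly table.</p>

```python
from collections import defaultdict
from datetime import date


def rollup_by_month(daily_rows):
    """Sum (day, ad, clicks, impressions) rows into one row per (month, ad).

    The month is represented by its first day, so the result can be keyed
    the same way as the daily data.
    """
    monthly = defaultdict(lambda: [0, 0])
    for day, ad, clicks, impressions in daily_rows:
        totals = monthly[(day.replace(day=1), ad)]
        totals[0] += clicks
        totals[1] += impressions
    return {key: tuple(totals) for key, totals in monthly.items()}


# Made-up sample data: three daily rows for ad 1, spanning two months.
daily = [
    (date(2008, 12, 1), 1, 2, 10),
    (date(2008, 12, 31), 1, 3, 20),
    (date(2009, 1, 1), 1, 1, 5),
]
for key, totals in sorted(rollup_by_month(daily).items()):
    print(key, totals)
```

<p>A year-over-year query can then read a dozen monthly rows per ad instead of 365 daily ones, and fall back to the daily table only for partial months.</p>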
<p>Avoid operations that update huge chunks of aggregated data at once.  Among other things, you'll make replication lag badly.  More about this later.</p>
<p>Another way to say "aggregate" is to say "pre-compute."  If you have time-critical queries for your app to do its work, can you do the work ahead of time so it's ready to get when needed?  This might or might not be aggregation.</p>
<p><strong>Denormalize</strong></p>
<p>Pre-computing and careful denormalization need to go together.  Figure out what other types of data you'll need in those aggregate tables, and include columns to support these queries.  But beware of denormalizing with character data; try to make your rows fixed-length.</p>
<p>One reason denormalization is important is that nested-loop joins on large data sets are very expensive.  If MySQL supported sort-merge or hash joins, you'd have other possibilities, but it doesn't, so you want to build your aggregate tables to avoid joins.</p>
<p><strong>Watch Data Types</strong></p>
<p>Does your ad ID look like "8a4dabde-1c82-102c-ab13-0019b984eacd" and is it stored in a VARCHAR(36)?  When tables get big, every byte matters a lot.  Use the smallest data types you can, the simplest character sets you can, and watch out for NULLable columns.  Use smallint unsigned or tinyint unsigned if you can.  You can save very large amounts of space.  Choose primary keys very carefully, especially with InnoDB tables -- don't use GUIDs.  Which brings me to my next point:</p>
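<p>To put rough numbers on "every byte matters," here is a small Python sketch comparing the width of that GUID stored as text, as 16 raw bytes, and as a 4-byte integer.  The BINARY(16) option, the row count, the index count and the integer value are all assumptions for illustration, and the estimate deliberately ignores row and page overhead.</p>

```python
import struct
import uuid

ad_guid = uuid.UUID("8a4dabde-1c82-102c-ab13-0019b984eacd")

text_width = len(str(ad_guid).encode("ascii"))  # characters held in a VARCHAR(36)
binary_width = len(ad_guid.bytes)               # the same value packed into 16 raw bytes
int_width = len(struct.pack(">I", 123456))      # a 4-byte INT UNSIGNED surrogate (made-up value)


def key_bytes(rows, key_width, secondary_indexes):
    """Rough lower bound on bytes spent on the key alone.

    In InnoDB the primary key is stored once in the clustered index and
    again in every secondary index entry; all other overhead is ignored.
    """
    return rows * key_width * (1 + secondary_indexes)


# Assumed table: 100 million rows with two secondary indexes.
for label, width in (
    ("VARCHAR(36)", text_width),
    ("BINARY(16)", binary_width),
    ("INT UNSIGNED", int_width),
):
    print(f"{label}: {key_bytes(100_000_000, width, 2) / 1e9:.1f} GB")
```

<p>The exact figures will differ on a real server, but the ratio is the point: the key's width is multiplied by the number of rows and again by the number of indexes.</p>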
<p><strong>Use InnoDB</strong></p>
<p>Assuming that you will use the stock MySQL server, InnoDB is usually your best bet.  (Actually, <a href="https://launchpad.net/percona-xtradb">XtraDB</a> might be very interesting for you, but I digress).  Due to the cost of repairing huge MyISAM tables and taking downtime, I would not use MyISAM for anything but read-only tables when things get big.  And even if it's read-only, there's still another reason to use InnoDB/XtraDB tables...</p>
<p><strong>Optimize For I/O</strong></p>
<p>It is pretty much inevitable: if you do this kind of data processing in MySQL, you're going to end up heavily I/O bound.  Listen to any of the talks at past MySQL conferences from people who have built systems like yours, and there's a fair chance they will talk about how hard they have to work on I/O capacity.</p>
<p>What does this have to do with InnoDB?  Data clustering. InnoDB's primary keys define the physical order rows are stored in.  That lets you choose which rows are stored close to each other, which is very beneficial in many cases.  Especially on huge tables, it lets you scan portions of a table instead of the whole table if you a) choose your aggregation to match the order of your common queries and b) choose your primary key correctly.</p>
<p>Let's go back to the ad-by-day table.  If you query date ranges most of the time, you should define the primary key as (day, ad).  Don't use an auto-increment primary key, and don't put ad first.  If you put ad first, then you're going to scan the whole table to query for information about yesterday.  If you put day first, then yesterday will all be stored physically together (within the page -- the pages themselves may be widely separated, but that's another matter).</p>
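<p>The effect of key order is easy to see with a toy model.  The Python sketch below treats a sorted list of keys as a stand-in for the clustered index; the dates, ad IDs and function names are made up, and real InnoDB pages are of course more complicated than a list.</p>

```python
from bisect import bisect_left
from datetime import date, timedelta


def day_slice(keys_day_first, day):
    """With keys sorted as (day, ad), one day's rows form one contiguous slice."""
    start = bisect_left(keys_day_first, (day, 0))
    stop = bisect_left(keys_day_first, (day + timedelta(days=1), 0))
    return start, stop


def day_positions(keys_ad_first, day):
    """With keys sorted as (ad, day), one day's rows are scattered: one per ad."""
    return [i for i, (_ad, d) in enumerate(keys_ad_first) if d == day]


days = [date(2008, 12, 20) + timedelta(days=n) for n in range(3)]
ads = [1, 2, 3, 4]  # made-up ad IDs

day_first = sorted((d, ad) for d in days for ad in ads)
ad_first = sorted((ad, d) for d in days for ad in ads)

print(day_slice(day_first, days[1]))     # a single range to scan
print(day_positions(ad_first, days[1]))  # one position inside every ad's stretch
```

<p>With (day, ad) the middle day occupies one slice of the list; with (ad, day) its rows are spread across the whole list, which is the toy equivalent of scanning the whole table.</p>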
<p><strong>Don't Store Non-Aggregated Data</strong></p>
<p>I've been talking a lot about aggregated data.  What do you do with the non-aggregated data?  My answer is usually simple: just don't store it in the database.  Instead, pre-aggregate.  Suppose your data is coming from some Apache log or similar source.  Write a script to rip through the file and parse it 10k lines at a time, aggregating as it goes.  When each chunk is done, make it write out a CSV file and import that with LOAD DATA INFILE.  Keep those big fat log files out of the database.  The database is usually the most expensive and hardest-to-scale component in your system -- don't waste resources.</p>
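<p>Here is a minimal Python sketch of such a script.  The log format is an assumption (one tab-separated line per event: timestamp, ad ID, and either "click" or "impression"), as are the function names; adapt the parsing to whatever your log really looks like.</p>

```python
import csv
from itertools import islice

CHUNK_LINES = 10_000


def aggregate_chunk(lines):
    """Collapse raw event lines into {(day, ad): [clicks, impressions]}."""
    totals = {}
    for line in lines:
        timestamp, ad, event = line.rstrip("\n").split("\t")
        row = totals.setdefault((timestamp[:10], int(ad)), [0, 0])
        row[0 if event == "click" else 1] += 1
    return totals


def write_chunks(log_lines, path_for_chunk, chunk_lines=CHUNK_LINES):
    """Read the log chunk by chunk and write one CSV of aggregates per chunk.

    path_for_chunk is a callable that maps a chunk number to an output path.
    Returns the list of paths written.
    """
    lines = iter(log_lines)
    paths = []
    while True:
        chunk = list(islice(lines, chunk_lines))
        if not chunk:
            break
        path = path_for_chunk(len(paths))
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            for (day, ad), (clicks, impressions) in sorted(aggregate_chunk(chunk).items()):
                writer.writerow([day, ad, clicks, impressions])
        paths.append(path)
    return paths
```

<p>Something like <code>write_chunks(open("access.log"), lambda n: "chunk-%05d.csv" % n)</code> would drive it (both file names are hypothetical).  Note that the same (day, ad) pair can show up in more than one chunk file, so the load step has to add to existing rows rather than overwrite them; one option is to LOAD DATA INFILE into a staging table and fold it in with INSERT ... SELECT ... ON DUPLICATE KEY UPDATE.</p>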
<p>Another benefit of this is the chance to parallelize.  As you know, MySQL doesn't do intra-query parallelization, so ETL jobs written to rely on SQL tend to get really bogged down.  In contrast, moving the processing outside the database lets you parallelize trivially.</p>
<p>If you need to analyze the non-aggregated data, you can store it on the filesystem and write custom scripts to do special-purpose tasks on it.  Storing a little meta-data about each file can help a lot.  Store the ranges of values for various attributes, for example; or the presence or absence of values.  You can put these into the database in a little meta-table.  Then your script can figure out which files it can ignore.  What we're doing here starts to look like a hillbilly version of Infobright, which I'll talk about later.</p>
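<p>A sketch of the meta-data idea in Python, assuming the only attribute you track per file is the range of days it covers.  The file names, the FileMeta structure and the function name are hypothetical; in practice the list would come from the little meta-table.</p>

```python
from typing import NamedTuple


class FileMeta(NamedTuple):
    path: str
    min_day: str  # ISO dates, so plain string comparison orders them correctly
    max_day: str


def files_to_scan(metas, first_day, last_day):
    """Return only the files whose day range overlaps [first_day, last_day]."""
    return [
        meta.path
        for meta in metas
        if meta.max_day >= first_day and last_day >= meta.min_day
    ]


# Made-up catalogue of three raw log files.
catalogue = [
    FileMeta("clicks-0001.log", "2008-12-01", "2008-12-07"),
    FileMeta("clicks-0002.log", "2008-12-08", "2008-12-14"),
    FileMeta("clicks-0003.log", "2008-12-15", "2008-12-21"),
]
print(files_to_scan(catalogue, "2008-12-10", "2008-12-12"))
```

<p>The script never opens the files it can rule out from the meta-data alone, which is where the savings come from.</p>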
<p>Alternately, you can store the atomic data as CSV files and use the CSV engine so you have an SQL interface to it (the meta-tables are still a valid approach here!).  This is an easy way to bypass the hard-to-scale database server for the initial insertion, because you can write CSV files with any programming language.  Naturally, CSV files don't store as compactly on disk as [Compressed] MyISAM or Archive.</p>
<p>These are just some ideas I'm throwing around -- the point is to think outside the box, even to think of things that seem "less advanced" than using a database.</p>
<p><strong>Sharding and Partitioning</strong></p>
<p>Sharding is inevitable if your write workload exceeds the capacity of a single server (or, if you're using replication, the capacity of a single slave).  Sharding can also help you avoid massive tables that are too big to maintain.  If you know you'll get there, planning for it in advance can change the lifecycle of your application.</p>
<p>What about partitioning in MySQL 5.1?  I know there are some cases when it can help a lot, and we've proven that with our customers.  But you still have to think about how to avoid enormous tables that are hard to maintain, back up, and restore.  And the partitioning functionality is not done yet and not fully integrated into the server, so I expect to find a lot more bugs and annoyances.  There are already inconvenient limitations on some key parts of partitioning, such as maintenance and repair commands, that essentially negate the benefits of partitioning for those operations.  And finally, it doesn't save you from the downtime caused by ALTER TABLE -- a typical reason to think about master-master with failover and failback for maintenance.  As with anything, it's a cost-benefit equation.  What are your priorities?  Choose the solution that meets them.</p>
<p><strong>Be Careful With Data Integrity</strong></p>
<p>When you're storing several levels of aggregation, and there's denormalization, you need to be scrupulous about data cleanliness, because it's really hard to fix things up later.  If your data is coming from a partner site, and you upload bad data there, you'll be getting bad data back for a long time.  And every time you have some incremental job to update the aggregates, you're exposed to that bad data again.</p>
<p>Any inconsistencies in the atomic data tend to get magnified as it gets aggregated, because you suddenly have a single row created from many rows, and if the many rows don't match completely, the single one doesn't know what data should live in it.  And this only gets harder to resolve as you add more levels of aggregation.</p>
<p><strong>Watch Out For The Long Tail</strong></p>
<p>People talk about the long tail and how you can focus on optimizing the short head.  It's the classic 80-20 rule.  Maybe 80% of your ad impressions are on 20% of your ads!  Hooray!  But don't forget that if you're aggregating per-day, an ad that gets a million impressions takes one row, and an ad that gets one impression takes exactly the same: one row.  An impression per day becomes a fixed overhead of storage size.  So, you actually have as many rows as you have unique ads per day.  Viewed this way, suddenly you start to hate the ads that occasionally get an impression.  They're so wasteful!</p>
<p>It's easy to flip back and forth between viewpoints on this and get distracted into making a mistake.  Watch out when you do your capacity planning.  Don't get fooled into calculating the wrong thing.</p>
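<p>A tiny Python example makes the two viewpoints concrete.  The numbers are invented: one very popular ad, plus 999 ads that each got a single impression that day.</p>

```python
def rows_per_day(impressions_by_ad):
    """One aggregate row per ad that was shown at least once that day."""
    return sum(1 for count in impressions_by_ad.values() if count > 0)


def head_share(impressions_by_ad, head_ads):
    """Fraction of all impressions that went to the given 'head' ads."""
    total = sum(impressions_by_ad.values())
    return sum(impressions_by_ad[ad] for ad in head_ads) / total


# Made-up day: ad 1 is the short head, ads 2..1000 are the long tail.
day = {1: 1_000_000}
day.update({ad: 1 for ad in range(2, 1001)})

print("share of impressions in the head:", round(head_share(day, [1]), 3))
print("rows added to the per-day table:", rows_per_day(day))
```

<p>Nearly all of the impressions come from one ad, yet nearly all of the rows come from the tail.  For capacity planning on an aggregated table, the row count is the number that matters.</p>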
<p><strong>Be Creative With Table Structures</strong></p>
<p>Suppose you have some yes/no fact about an ad impression, such as whether it was a blue ad (whatever that means.)  You start out with this:</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-3">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> ads_by_day_by_blueness <span style="color:#006600; font-weight:bold;">&#40;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; day date <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; ad int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; is_blue tinyint <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; clicks int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; impressions int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">....</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color:#006600; font-weight:bold;">&#40;</span>day, ad, is_blue<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#41;</span>; </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>What can we improve here?  Especially assuming that there are indexes other than the primary key, we can shrink the primary key's width:</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-4">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> ads_by_day_by_blueness <span style="color:#006600; font-weight:bold;">&#40;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; day date <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; ad int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; clicks int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; impressions int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; blue_clicks int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; blue_impressions int <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span>,</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">....</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color:#006600; font-weight:bold;">&#40;</span>day, ad<span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#41;</span>; </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>There are a couple of ways to handle this now.  You can have the clicks column record the total, and the blue_clicks column record only blue clicks; to find out non-blue clicks you subtract one from the other.  Or you can have the blue clicks and non-blue clicks stored, and to get the totals you add them.</p>
<p>Did this gain us anything?  We dropped one column, and we just moved those other values around to store them "next to, in the same row" instead of "below, in the next row."  So we're storing all the same data, right?</p>
<p>Logically, yes; physically, no.  Those values that we pivoted up beside their neighbors will share a set of primary key columns.  And not only will every index be a little narrower, the table will now contain only half as many rows.  That will make the indexes less than half the size.  In real life this technique often makes the table+index much less than half the size.  You have to write slightly more complex queries, but that's often justified by a large reduction in table size.</p>
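<p>To make the "slightly more complex queries" concrete, here is a minimal sketch of the first layout (total plus blue subset). It uses Python's built-in sqlite3 as a stand-in for MySQL; the table name <em>ad_stats</em> and the sample numbers are assumptions made up for illustration, only the column names come from the schema above.</p>

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "ad_stats" is a hypothetical table name
# and the inserted numbers are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ad_stats (
        day TEXT NOT NULL,
        ad INTEGER NOT NULL,
        clicks INTEGER NOT NULL,            -- all clicks, blue or not
        impressions INTEGER NOT NULL,       -- all impressions
        blue_clicks INTEGER NOT NULL,       -- the blue subset only
        blue_impressions INTEGER NOT NULL,
        PRIMARY KEY (day, ad)
    )
    """
)
conn.execute("INSERT INTO ad_stats VALUES ('2008-12-22', 1, 100, 5000, 30, 1200)")

# Non-blue figures are derived by subtraction instead of being stored.
row = conn.execute(
    """
    SELECT clicks - blue_clicks AS non_blue_clicks,
           impressions - blue_impressions AS non_blue_impressions
    FROM ad_stats
    WHERE day = '2008-12-22' AND ad = 1
    """
).fetchone()
print(row)  # (70, 3800)
```

<p>The other layout (blue and non-blue stored separately) would simply swap the subtraction for an addition when you need totals.</p>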
<p>I sort of stumbled upon this idea one day. I have no idea what this technique might be called, so I call it dog-earing the table (somehow the image of putting columns next to each other makes me think of putting cards next to each other and shoving).</p>
<p><strong>Archive</strong></p>
<p>If you don't need data anymore, move it away or get rid of it.  I wrote a <a href="http://www.xaprb.com/blog/2007/06/13/archive-strategies-for-oltp-servers-part-1/">three-part article on data archiving</a> on my own blog a while back.  The benefits of purging and archiving data can be dramatic.</p>
<p><strong>Take It Easy On Replication</strong></p>
<p>Building aggregated tables is hard work for the database server.  If you do it on the master with INSERT..SELECT queries, it will propagate to the slaves and it'll be hard work there too, assuming you use statement-based replication.</p>
<p>You can save that work either by using MySQL 5.1's row-based replication or, in MySQL 5.0 and earlier, by doing the work on a slave and then piping the results back up to the master with LOAD DATA INFILE, which roughly emulates row-based replication.</p>
<p>When you're updating big aggregate tables, don't work with giant chunks of them at once.  If there's any possible way, do it in manageable bits.  A day at a time, for example.</p>
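<p>One way to picture "a day at a time" is a small driver that emits one bounded statement per day. This is only a sketch: the table names <em>clicks_raw</em> and <em>clicks_daily</em> and the <em>ts</em> column are hypothetical, and the statements are printed rather than executed.</p>

```python
from datetime import date, timedelta


def daily_ranges(start, end):
    """Yield (day, next_day) pairs covering the half-open range [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def rollup_statements(start, end):
    # clicks_raw, clicks_daily and ts are hypothetical names for illustration.
    template = (
        "INSERT INTO clicks_daily (day, ad, clicks) "
        "SELECT DATE(ts), ad, COUNT(*) FROM clicks_raw "
        "WHERE ts >= '{lo}' AND ts < '{hi}' GROUP BY DATE(ts), ad"
    )
    return [
        template.format(lo=lo.isoformat(), hi=hi.isoformat())
        for lo, hi in daily_ranges(start, end)
    ]


statements = rollup_statements(date(2008, 12, 1), date(2008, 12, 4))
for statement in statements:
    print(statement)
```

<p>Each statement touches one day's worth of rows, so it holds locks briefly and a failure only costs you one small chunk to redo.</p>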
<p>There are a lot of other ways you can make replication faster.  I wrote a lot about this in our book, which is linked from the sidebar above.</p>
<p><strong>Don't Assume Traditional Methods Will Save You</strong></p>
<p>What you're really doing here is building a data warehouse.  So you may think you should use traditional DW methods, like star schemas.  The problem is that MySQL doesn't tend to perform well on a data warehousing workload.  The nested-loop joins are not all that fast on big joins; the query optimizer can sometimes pick bad plans when you have a lot of joins between fact and dimension tables, and so on.  With careful tweaking, many of these things can be overcome, but how much time do you have?  And the gains are simply limited by some of MySQL's weaknesses in some cases.</p>
<p>Not only that, but star schemas are not intended to be fast.  The star schema is essentially "I admit defeat and accept table scans as a fact of life."  <a href="http://www.mysqlperformanceblog.com/2008/04/28/the-mysql-optimizer-the-os-cache-and-sequential-versus-random-io/">Table scans can be better than the alternative</a>, if the alternatives are limited, but they're still not what you need unless you're okay with long queries that read a lot of rows -- MySQL can't handle too many of those at once.</p>
<p>Aside from star schemas, another tactic I see people try a lot is to build "flexible schemas" with tables that contain name-value pairs or something similar.  The thought is that you can make the application believe it has a custom table, which is really constructed behind the scenes from the name-value tables in a complex query with many joins.  I have never seen this approach scale well.</p>
<p><strong>Use The Best Technologies You Can</strong></p>
<p>MySQL is not the end-all and be-all.  If you're familiar with it and it can serve you reasonably well, it's fine to use it for things that it's not 100% optimal for.  But if the costs of doing that are going to outweigh the costs of using another solution, then look at other solutions.</p>
<p>One that holds promise is <a href="http://www.infobright.org/">Infobright</a>.  While I have not evaluated their technology in depth, I think it merits a good look.  I had the chance at <a href="http://www.opensqlcamp.org/">OpenSQL Camp</a> to talk to Alex Esterkin and see him present on it, and based on that exposure, I think they are doing a lot of things right.  When I know enough to have a real opinion (or when other Percona people get to it before I do!) you'll see results on this blog.</p>
<p>Another is <a href="http://www.kickfire.com/">Kickfire</a> -- also something I have not had a chance to properly evaluate.  And there are others, and there will continue to be more.  Finally, <a href="http://www.postgresql.org/">PostgreSQL</a> is clearly better for some workloads out-of-the-box than MySQL is, especially for more complex queries.  Percona is not tied to MySQL, although we're most famous for our knowledge about it.  When another tool is the right one, we use it.</p>
<p>Have you thought about using something besides a database?  You have your choice of buzzwords these days.  Hadoop is a big one.  But beware of falling into the trap of brute-forcing a solution that really needs to be solved with intelligent engineering, instead of massive resources.</p>
<p><strong>Conclusion</strong></p>
<p>This article has been an overview of some of the tactics I've used to successfully scale large click-processing and other types of event-analysis databases.  In some cases I've been able to avoid sharding for a long time and run on many fewer disk drives with much less memory, or even with 10-15x fewer servers.  Clever application design, and a holistic approach, are absolutely necessary.  You can't look to the database to solve everything -- you have to give it all the help you can.  Hopefully it's useful to you, too!</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Baron Schwartz |
      <a href="http://www.mysqlperformanceblog.com/2008/12/22/high-performance-click-analysis-with-mysql/#comments">9 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/12/22/high-performance-click-analysis-with-mysql/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>SHOW OPEN TABLES &#8211; what is in your table cache</title>
		<link>http://www.mysqlperformanceblog.com/2008/12/14/show-open-tables-what-is-in-your-table-cache/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/12/14/show-open-tables-what-is-in-your-table-cache/#comments</comments>
		<pubDate>Sun, 14 Dec 2008 23:54:16 +0000</pubDate>
		<dc:creator>peter</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[production]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=561</guid>
		<description><![CDATA[One command that few people realize exists is SHOW OPEN TABLES - it lets you examine which tables you have open right now:
mysql&#62; SHOW OPEN TABLES FROM test;
+----------+-------+--------+-------------+
&#124; DATABASE &#124; TABLE &#124; In_use &#124; Name_locked &#124;
+----------+-------+--------+-------------+
&#124; test&#160; &#160; &#160;&#124; a&#160; &#160; &#160;&#124;&#160; &#160; &#160; 3 &#124;&#160; &#160; &#160; &#160; &#160; &#160;0 [...]]]></description>
			<content:encoded><![CDATA[<p>One command that few people realize exists is <strong>SHOW OPEN TABLES</strong> - it lets you examine which tables you have open right now:</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-6">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; <span style="color: #993333; font-weight: bold;">SHOW</span> open <span style="color: #993333; font-weight: bold;">TABLES</span> <span style="color: #993333; font-weight: bold;">FROM</span> test;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">+<span style="color: #808080; font-style: italic;">----------+-------+--------+-------------+</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">| <span style="color: #993333; font-weight: bold;">DATABASE</span> | <span style="color: #993333; font-weight: bold;">TABLE</span> | In_use | Name_locked |</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">+<span style="color: #808080; font-style: italic;">----------+-------+--------+-------------+</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">| test&nbsp; &nbsp; &nbsp;| a&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp; <span style="color: #cc66cc;color:#800000;">3</span> |&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #cc66cc;color:#800000;">0</span> |</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">+<span style="color: #808080; font-style: italic;">----------+-------+--------+-------------+</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #cc66cc;color:#800000;">1</span> row <span style="color: #993333; font-weight: bold;">IN</span> <span style="color: #993333; font-weight: bold;">SET</span> <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color: #cc66cc;color:#800000;">0</span>.<span style="color: #cc66cc;color:#800000;">00</span> sec<span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>This command lists all non-temporary tables in the table cache, showing each of them only once (even if the table is opened more than once).</p>
<p><strong>In_use</strong> shows how many threads are currently using this table, meaning they either hold a lock on it or are waiting for its table lock.</p>
<p><strong>Name_locked</strong> shows whether the name of this table is locked.  It is used for DROP or RENAME TABLE, so you will very rarely see this field contain anything other than 0.</p>
<p>Besides showing which tables are in the table cache, this command is rather helpful for finding out whether there is any activity on a given table.  Just run "FLUSH TABLES mytable" and examine the open tables later - if you see this table in the table cache again, chances are it is being used.</p>
<p>Note, however, that if you start the MySQL command line client without the "-A" option, it opens all tables in the active database to allow tab completion, which can skew the results.</p>
<p>Another use for this command is a pre-flush implementation (as part of your backup routine) - instead of running FLUSH TABLES on ALL tables one by one, you can run SHOW OPEN TABLES and flush only the open tables, then run it again to see how many tables are open and in use, and whether FLUSH TABLES WITH READ LOCK can be run or not.</p>
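<p>The pre-flush idea can be sketched roughly as follows. The helpers take rows in the shape SHOW OPEN TABLES returns (database, table, In_use, Name_locked); the function names are made up for this example, and fetching the rows and executing the statements through your MySQL client of choice is left out.</p>

```python
def flush_statements(open_tables):
    """Build one FLUSH TABLES statement per table currently in the table cache."""
    return [
        "FLUSH TABLES `{}`.`{}`".format(db, table)
        for db, table, _in_use, _name_locked in open_tables
    ]


def tables_in_use(open_tables):
    """Return the (database, table) pairs that some thread is still using."""
    return [
        (db, table)
        for db, table, in_use, _name_locked in open_tables
        if in_use > 0
    ]


# Sample rows in SHOW OPEN TABLES order; the values are invented.
rows = [("test", "a", 3, 0), ("test", "b", 0, 0)]

for statement in flush_statements(rows):
    print(statement)

# After flushing, fetch SHOW OPEN TABLES again and inspect what is still busy
# before deciding whether to issue FLUSH TABLES WITH READ LOCK.
print(tables_in_use(rows))  # [('test', 'a')]
```

<p>What counts as "quiet enough" to go ahead with the global read lock is a judgment call for your workload; the sketch only reports which tables are still in use.</p>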
<p>Unfortunately this command does not really help to answer a very common question during table lock troubleshooting - who is holding the lock on this table?</p>
<p>I would much rather see all entries in the table cache, not grouped by table, with the thread_id of the thread using each entry (0 if it is not in use), the lock_type (READ/WRITE/READ_LOCAL etc.), and whether the thread is waiting for the lock right now.</p>
<p>It also deserves to be converted to an INFORMATION_SCHEMA table, so it would be easy to query with SQL.</p>
<p>Another thing that would be handy is the LRU position of a given table (so you can see which tables are candidates for replacement) and the timestamp when the table was locked (or the lock wait started) - MySQL initializes the timer anyway, so it would not be much overhead to store that time in the table cache.  This would make table locks much easier to understand.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by peter |
      <a href="http://www.mysqlperformanceblog.com/2008/12/14/show-open-tables-what-is-in-your-table-cache/#comments">2 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/12/14/show-open-tables-what-is-in-your-table-cache/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Thoughts on InnoDB Incremental Backups</title>
		<link>http://www.mysqlperformanceblog.com/2008/11/10/thoughs-on-innodb-incremental-backups/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/11/10/thoughs-on-innodb-incremental-backups/#comments</comments>
		<pubDate>Mon, 10 Nov 2008 19:22:48 +0000</pubDate>
		<dc:creator>peter</dc:creator>
				<category><![CDATA[ideas]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=528</guid>
		<description><![CDATA[For normal InnoDB "hot" backups we use LVM or other snapshot-based technologies with pretty good success. However, incremental backups remain a problem.
First, why do you need incremental backups at all? Why not just take full backups daily? The answer is space - if you want to [...]]]></description>
			<content:encoded><![CDATA[<p>For normal InnoDB "hot" backups we use LVM or other snapshot-based technologies with pretty good success. However, incremental backups remain a problem.</p>
<p>First, why do you need incremental backups at all? Why not just take full backups daily? The answer is space - if you want to keep several generations to restore from, keeping a large number of full copies of a large database is not efficient, especially if only a couple of percent of it changes per day.</p>
<p>The solution MySQL offers, using the binary log, works in theory, but it is not very useful in practice because catching up from the binary log may take far too long. Even if you have very light updates and can replay a full day of updates within an hour, it will take over 24 hours (about 30) to cover a month's worth of binary logs... and quite typically you would have much higher update traffic.</p>
<p>Another solution is <a href="http://www.nongnu.org/rdiff-backup/">rdiff</a>, which is a great general-purpose tool, though you can do much better for InnoDB in particular.</p>
<p>InnoDB pages carry a great deal of internal information that is helpful for incremental backup. There is basically a page version (the LSN of the last change to the page), which allows you to quickly check whether the page is newer than the previous backup. There is a page checksum, and finally the offset of the page (where it should be in the data file) is stored in the page itself.</p>
<p>Using this data it should be easy to implement a very efficient and yet simple incremental backup for InnoDB.</p>
<p>In a way similar to rdiff, the tool could both update the backup and store the rollback changes or, if dealing with a read-only compressed backup, create a roll-forward recovery log, which can also be compressed easily.</p>
<p>What the tool would need to do is go through the pages of each InnoDB file and simply write all the new pages to a separate file.  Because pages already have position information in them, there is no need for complex "diff" metadata.</p>
<p>For recovery we can simply read this file of new pages and put the pages back in their original places.</p>
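<p>The scan-and-restore idea might look roughly like the sketch below. It assumes the default 16KB page size and the standard page header layout (page number at byte 4, LSN at byte 16, both big-endian), and it works on in-memory bytes with synthetic pages. It ignores checksums, compressed pages and the need for a consistent snapshot, so treat it as an illustration of the idea rather than a backup tool.</p>

```python
import struct

PAGE_SIZE = 16 * 1024   # default InnoDB page size (assumed here)
FIL_PAGE_OFFSET = 4     # 4-byte page number within the tablespace
FIL_PAGE_LSN = 16       # 8-byte LSN of the last modification of the page


def page_header(page):
    """Return (page_no, lsn) read from a page's header."""
    (page_no,) = struct.unpack_from(">I", page, FIL_PAGE_OFFSET)
    (lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
    return page_no, lsn


def changed_pages(data, since_lsn):
    """Collect the pages of a data file image modified after since_lsn."""
    pages = []
    for pos in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        page = data[pos:pos + PAGE_SIZE]
        if page_header(page)[1] > since_lsn:
            pages.append(page)
    return pages


def apply_pages(base, delta_pages):
    """Put each saved page back at page_no * PAGE_SIZE in a copy of base."""
    restored = bytearray(base)
    for page in delta_pages:
        start = page_header(page)[0] * PAGE_SIZE
        if start + PAGE_SIZE > len(restored):
            # The file grew since the base backup: pad up to the new page.
            restored.extend(bytes(start + PAGE_SIZE - len(restored)))
        restored[start:start + PAGE_SIZE] = page
    return bytes(restored)


def make_page(page_no, lsn, fill):
    """Build a synthetic page for the demo (not a valid InnoDB page)."""
    page = bytearray([fill]) * PAGE_SIZE
    struct.pack_into(">I", page, FIL_PAGE_OFFSET, page_no)
    struct.pack_into(">Q", page, FIL_PAGE_LSN, lsn)
    return bytes(page)


old = b"".join(make_page(n, lsn, 0xAA) for n, lsn in [(0, 100), (1, 110), (2, 120)])
new = b"".join([make_page(0, 100, 0xAA), make_page(1, 200, 0xBB), make_page(2, 120, 0xAA)])

delta = changed_pages(new, since_lsn=120)   # only page 1 changed after LSN 120
restored = apply_pages(old, delta)
print(len(delta), restored == new)  # 1 True
```

<p>The delta here is just a list of whole pages; no offsets need to be stored alongside them because each page already says where it belongs.</p>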
<p>Of course this means .frm files, InnoDB logs and MyISAM system tables need to be copied in full, but they typically do not make up any considerable portion of an InnoDB database.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by peter |
      <a href="http://www.mysqlperformanceblog.com/2008/11/10/thoughs-on-innodb-incremental-backups/#comments">8 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/11/10/thoughs-on-innodb-incremental-backups/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Living with backups</title>
		<link>http://www.mysqlperformanceblog.com/2008/11/06/living-with-backups/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/11/06/living-with-backups/#comments</comments>
		<pubDate>Fri, 07 Nov 2008 00:47:38 +0000</pubDate>
		<dc:creator>Maciej Dobrzanski</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=479</guid>
		<description><![CDATA[Everyone does backups. Usually it’s some nightly batch job that just dumps all MySQL tables into a text file or simply copies the binary files from the data directory to a safe location. Obviously both ways involve much more complex operations than it would seem from my last sentence, but that is not important right [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone does backups. Usually it’s some nightly batch job that just dumps all MySQL tables into a text file or simply copies the binary files from the data directory to a safe location. Obviously both ways involve much more complex operations than it would seem from my last sentence, but that is not important right now. Either way the data is out and ready to save someone’s life (or job at least). Unfortunately, taking a backup does not come for free. On the contrary, it’s more like running very heavy queries against each table in the database when mysqldump is used, or reading a lot of data when copying physical files, so the price may actually be rather high. And the more fully the server's resources are utilized, the more that becomes a problem.</p>
<h4>What happens when you try to get all the data?</h4>
<p>The most obvious answer is that it needs to be read, through I/O requests, from the storage it resides on. The storage handles reads issued by the system, but only at a limited rate. So when a task reads a lot of data very quickly, just as the archiving process does when it runs, it pushes a huge number of requests down to the disks and saturates the I/O quite easily. Naturally, at the same time the database needs to perform all its regular tasks, like serving queries, and it competes for the very same disks to read or write whatever comes so that your favorite website can still show up in the browser. Moreover, reads sent by the backup process usually want many sequential blocks of data. Such an access pattern may be preferred by the I/O scheduler over the random I/O coming from MySQL, but such large I/O requests also take significant time to complete, and the way typical disks work prevents anything else from being executed in the meantime. And so the database often needs to wait much longer until its disk operations are scheduled and executed, which translates into slower query execution and significantly degraded performance.</p>
<h4>Anything else?</h4>
<p>All modern systems cache whatever is being read from storage. This reduces I/O to such devices for frequently accessed information. After a successful read, the block of data is placed in the cache and then served from memory should anything ask for the same block again. That happens for as long as the block does not get evicted. MySQL obviously takes advantage of this functionality just like any other application, and this is especially true for MyISAM tables, which have dedicated buffers only for indexes, while the actual data is always read through the operating system. The active portions of tables will likely be placed in memory by the system and kept there for a long time. Since memory access is far faster than any disk access, even with the fastest drives or RAID configurations, the performance gains are quite clear. Now, going back to evicting data from the cache: it happens by replacing old and unused blocks with newly read ones. And so the more new blocks come in, the more older ones need to go away. Just imagine what happens to all the cached data during a backup run, when the process reads several times more information than there is physical memory installed on the server - it's not difficult to have a database of such size. Everything is wiped out and replaced by random "garbage" for no good reason. Since the hit ratio becomes worse as the cache fills with random information, additional I/O occurs, slowing everything down even further.</p>
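<p>The effect is easy to reproduce with a toy model. The sketch below uses a plain LRU cache, which is a simplification (real operating system page caches are smarter than strict LRU), and the block counts are arbitrary numbers chosen for illustration.</p>

```python
from collections import OrderedDict


class LRUCache:
    """Toy block cache with strict least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def read(self, block):
        """Return True on a cache hit; on a miss, cache the block and return False."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict the least recently used block
        return False


def hit_ratio(cache, blocks):
    hits = sum(cache.read(block) for block in blocks)
    return hits / len(blocks)


cache = LRUCache(capacity=100)

# Regular workload: the same 50 "hot" blocks are read over and over.
warm = hit_ratio(cache, list(range(50)) * 20)

# Backup run: 500 blocks are read exactly once each, five times the cache size.
hit_ratio(cache, range(1000, 1500))

# The workload comes back and finds none of its hot blocks in the cache.
after = hit_ratio(cache, list(range(50)))

print(warm, after)  # 0.95 0.0
```

<p>Every one of those misses after the backup turns into a real disk read, on disks that the backup is still busy saturating.</p>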
<h4>What does it all mean?</h4>
<p>There is a clear conflict between the regular database activities, which need fast response times, and backups, which would gladly behave as if at an all-you-can-eat bar. On a busy database server, running a simple copy operation from the MySQL data directory may result in a total disaster when MySQL stops responding to incoming queries quickly enough.</p>
<h4>What can be done about it?</h4>
<p>Nothing really when you are using mysqldump. You can play some tricks, but it’s mostly beyond your control.</p>
<p align="center">&nbsp;</p>
<p>Otherwise, when copying physical data files, you can limit the rate at which data is read so as not to saturate the I/O. This is unfortunately not something you can do with standard Linux tools like <em>cp</em>, <em>scp</em> or <em>tar</em>. However, <em>rsync</em>, for example, can do that with the <strong>--bwlimit=KBPS</strong> option. The problem with <em>rsync</em> is that it needs to build a list of files to transfer before it can take any action, and this operation alone is often quite heavy on I/O and is not subject to any limits.</p>
<p>Some time ago we prepared <a target=_blank href="http://www.mysqlperformanceblog.com/files/patches/tar-1.19-io-throttle.patch">a patch for <em>tar</em></a> that implements <strong>--read-rate=BytesPerSecond</strong>. In this case the advantage of using <em>tar</em> over <em>rsync</em> is that you can create a compressed archive immediately, on the fly. For example:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-12">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">/root/backup-tools/tar --read-rate=<span style="color:#800000;color:#800000;">15000000</span> -C /mnt/snapshot -c -z -v -f - mysql | ssh backup@storage-host /root/backup-tools/write_backup.<span style="">sh</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>This will read the /mnt/snapshot/mysql directory at 15000000 bytes/s, creating a TAR/GZ archive out of it and printing it to the standard output, which is then redirected through the pipe over SSH to a remote host, where a script reads the standard input and writes the archive to a proper location (where would we be without one-liners?).</p>
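<p>For readers curious what read-rate limiting amounts to, here is a minimal sketch of the general technique: read a chunk, then sleep until the running total is back under the allowed rate. This is not the code from the tar patch; the function name, its parameters and the chunk size are assumptions for illustration.</p>

```python
import io
import time


def throttled_copy(src, dst, bytes_per_second, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, pausing so the average read rate stays at or below the limit."""
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which this many bytes may have been read.
        due = start + copied / bytes_per_second
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return copied


# Tiny demo on in-memory streams; real use would pass open file objects.
source = io.BytesIO(b"x" * 200000)
target = io.BytesIO()
total = throttled_copy(source, target, bytes_per_second=15000000)
print(total)  # 200000
```

<p>The clock and sleep functions are parameters only so the pacing can be checked without actually waiting; in normal use the defaults are fine.</p>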
<p align="center">&nbsp;</p>
<p>On Linux there is also a utility called <i>ionice</i>. It lets you affect how the I/O scheduler deals with I/O requests coming from a certain process. Giving the backup application a low class or priority keeps it from getting in the way of the database work so much.</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-13">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">ionice -c3 /root/backup-tools/tar ...</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="">ionice</span> -c2 -n7 /root/backup-tools/tar ... </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>Please consult the <i>ionice</i> man page for usage details; it's really very simple to grasp and use. The restriction is that the system must be using the CFQ elevator algorithm; <i>ionice</i> does not work with others. But that is not really a problem since most modern systems run on CFQ by default, and even if not, you can change it at runtime anyway. To check the current setting you need to query your block devices. In the case of the SCSI sub-system (devices named sda, sdb, sdc, etc.) that can be done with:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-14">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"># cat /sys/block/sd?/queue/scheduler</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory <span style="color:#006600; font-weight:bold;">&#91;</span>deadline<span style="color:#006600; font-weight:bold;">&#93;</span> cfq </div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory <span style="color:#006600; font-weight:bold;">&#91;</span>deadline<span style="color:#006600; font-weight:bold;">&#93;</span> cfq </div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory <span style="color:#006600; font-weight:bold;">&#91;</span>deadline<span style="color:#006600; font-weight:bold;">&#93;</span> cfq </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
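<p>The same check can be scripted. What follows is only a sketch, assuming the sysfs layout shown above; the helper names <i>active_scheduler</i> and <i>schedulers_by_device</i> are made up for this illustration. It picks out the bracketed entry, which marks the scheduler currently in use:</p>

```python
import glob
import os
import re


def active_scheduler(line):
    """Return the scheduler named in [brackets], or None if there is none."""
    match = re.search(r"\[([^\]]+)\]", line)
    return match.group(1) if match else None


def schedulers_by_device(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active scheduler."""
    result = {}
    for path in sorted(glob.glob(pattern)):
        # /sys/block/sda/queue/scheduler: the device is two levels up
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            result[device] = active_scheduler(handle.read())
    return result
```

<p>Calling <i>schedulers_by_device()</i> returns a dictionary keyed by device name, so a backup wrapper could refuse to rely on <i>ionice</i> when a device is not on CFQ.</p>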
<p>To change it, simply write the name of the new scheduler to the <i>scheduler</i> files:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-15">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"># for device in `ls /sys/block` ; do if <span style="color:#006600; font-weight:bold;">&#91;</span> -f /sys/block/$device/queue/scheduler <span style="color:#006600; font-weight:bold;">&#93;</span> ; then \</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;echo <span style="color:#CC0000;">"cfq"</span>&gt; /sys/block/$device/queue/scheduler ; fi ; done </div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"># cat /sys/block/sd?/queue/scheduler</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory deadline <span style="color:#006600; font-weight:bold;">&#91;</span>cfq<span style="color:#006600; font-weight:bold;">&#93;</span> </div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory deadline <span style="color:#006600; font-weight:bold;">&#91;</span>cfq<span style="color:#006600; font-weight:bold;">&#93;</span> </div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">noop anticipatory deadline <span style="color:#006600; font-weight:bold;">&#91;</span>cfq<span style="color:#006600; font-weight:bold;">&#93;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>That's it, now you can enjoy experimenting with <i>ionice</i>.</p>
<p align="center">&nbsp;</p>
<p>It may also be possible to make a backup application that does not interfere with the system cache. By specifying the O_DIRECT flag when opening a file, an application tells the system to bypass the cache for reads from that file. So far this is only an idea, since I know of no tools that support it well. The problem is that reads from a file opened with O_DIRECT have to be aligned to the file system block size, which usually means the file size has to be divisible by 4096 for it to be read correctly. That is always the case for InnoDB tablespaces, but other MySQL data files do not comply with this requirement. A possible trick is to read the file with O_DIRECT up to the last full block, then perform a regular cached read of the last few bytes and append them to the target file.</p>
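<p>The trick could look roughly like this. It is only a sketch, assuming Linux, Python 3.7 or later and a 4096-byte block size; the names <i>split_at_last_full_block</i> and <i>copy_bypassing_cache</i> are made up for illustration and are not taken from any existing backup tool:</p>

```python
import mmap
import os

BLOCK_SIZE = 4096  # assumed file system block size; check your own


def split_at_last_full_block(file_size, block_size=BLOCK_SIZE):
    """Return (aligned_bytes, tail_bytes) for a file of file_size bytes."""
    tail = file_size % block_size
    return file_size - tail, tail


def copy_bypassing_cache(src, dst, block_size=BLOCK_SIZE, blocks_per_read=256):
    """Copy src to dst: full blocks with O_DIRECT, the tail with a cached read."""
    aligned, tail = split_at_last_full_block(os.path.getsize(src), block_size)
    chunk = block_size * blocks_per_read
    # An anonymous mmap is page-aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, chunk)
    with open(dst, "wb") as out:
        fd = os.open(src, os.O_RDONLY | os.O_DIRECT)  # Linux only
        try:
            copied = 0
            while copied != aligned:
                want = min(chunk, aligned - copied)
                got = os.preadv(fd, [memoryview(buf)[:want]], copied)
                if got == 0:
                    break  # the file shrank while we were reading
                out.write(buf[:got])
                copied += got
        finally:
            os.close(fd)
        if tail:
            # The last partial block goes through the normal page cache.
            with open(src, "rb") as cached:
                cached.seek(aligned)
                out.write(cached.read(tail))
```

<p>Only the source is read with O_DIRECT here; the target is written normally. Some file systems reject O_DIRECT altogether, in which case <i>os.open</i> raises an error and a real tool would need to fall back to cached reads.</p>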
<p align="center">&nbsp;</p>
<p>But even with all those precautions taken, performance problems can still occur on the working MySQL instance. The danger may come, for example, from an unexpected spike in load or traffic, or even from entirely expected spikes that you simply can't do anything about. So the next step I thought of was to monitor the database status constantly and, if any problems are noticed, have the monitor simply pause the copying. I wrote a simple Perl script to do just that. It works by sending signals that stop or resume the application copying the data:</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-16">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">use POSIX <span style="color:#CC0000;">":sys_wait_h"</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">use DBI;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">$pid= fork<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">if <span style="color:#006600; font-weight:bold;">&#40;</span>$pid == <span style="color:#800000;color:#800000;">0</span><span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; exec<span style="color:#006600; font-weight:bold;">&#40;</span>'/root/backup-tools/tar --read-rate=<span style="color:#800000;color:#800000;">15000000</span> …'<span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">$was_running= <span style="color:#800000;color:#800000;">1</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">$is_running= <span style="color:#800000;color:#800000;">1</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">while<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#123;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; if <span style="color:#006600; font-weight:bold;">&#40;</span>$is_running == <span style="color:#800000;color:#800000;">0</span> &amp;&amp; $was_running == <span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> kill <span style="color:#CC0000;">'STOP'</span>, $pid; $was_running= <span style="color:#800000;color:#800000;">0</span>; <span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; if <span style="color:#006600; font-weight:bold;">&#40;</span>$is_running == <span style="color:#800000;color:#800000;">1</span> &amp;&amp; $was_running == <span style="color:#800000;color:#800000;">0</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> kill <span style="color:#CC0000;">'CONT'</span>, $pid; $was_running= <span style="color:#800000;color:#800000;">1</span>; <span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; …</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; my $sth= $dbh-&gt;<span style="">prepare</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#CC0000;">'SHOW GLOBAL STATUS LIKE &quot;Threads_connected&quot;'</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; …</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; if <span style="color:#006600; font-weight:bold;">&#40;</span>$$row<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span> &lt;<span style="color:#800000;color:#800000;">10</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> $is_running= <span style="color:#800000;color:#800000;">1</span>; <span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; if <span style="color:#006600; font-weight:bold;">&#40;</span>$$row<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>&gt; <span style="color:#800000;color:#800000;">50</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> $is_running= <span style="color:#800000;color:#800000;">0</span>; <span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; …</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; $kid = waitpid<span style="color:#006600; font-weight:bold;">&#40;</span>$pid, WNOHANG<span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; if <span style="color:#006600; font-weight:bold;">&#40;</span>$kid&gt; <span style="color:#800000;color:#800000;">0</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#006600; font-weight:bold;">&#123;</span> last; <span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; sleep<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#800000;color:#800000;">1</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>The script checks MySQL status every second, and when the number of connected threads goes above 50 it sends the STOP signal to the archiving process. Whenever the number of connected threads drops back to 9 or fewer, the script sends the CONT signal, which resumes archiving. The thresholds are of course different for every MySQL instance; these are just examples. The checks can also be more sophisticated and include things like processlist information, CPU load averages, I/O load, etc.</p>
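<p>The pause and resume rule can be looked at separately from the database polling. Here is a minimal sketch in Python using the example thresholds from above; the names <i>next_state</i> and <i>signal_for</i> are made up for illustration:</p>

```python
import signal

PAUSE_ABOVE = 50   # example threshold; tune for your own instance
RESUME_BELOW = 10  # example threshold; tune for your own instance


def next_state(threads_connected, is_running):
    """Return whether the copy should be running after this sample."""
    if threads_connected > PAUSE_ABOVE:
        return False
    if RESUME_BELOW > threads_connected:
        return True
    return is_running  # between the thresholds: keep the current state


def signal_for(was_running, is_running):
    """Return the signal to send on a state change, or None."""
    if was_running and not is_running:
        return signal.SIGSTOP
    if is_running and not was_running:
        return signal.SIGCONT
    return None
```

<p>The gap between the two thresholds is what keeps the copy from being stopped and resumed on every small fluctuation around a single limit.</p>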
<p>In this case <i>tar</i> is launched directly by the monitoring script, but that’s not really necessary. The script simply needs to know the PID of the process to manage and have a way to know when it ends.</p>
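<p>If the copy was started elsewhere, the monitor cannot wait on it as a child process. One common approach, sketched here in Python with a hypothetical helper name, is to send signal 0, which checks that the process exists without affecting it:</p>

```python
import os


def process_exists(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists, but belongs to another user
    return True
```

<p>Keep in mind that PIDs get reused, and that a finished process still counts as existing until its parent has collected its exit status, so this is a rough check rather than a guarantee.</p>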
<p>This is of course just a concept, one I'm using successfully in some difficult environments, and you can try building a mechanism suited to your own needs based on it.</p>
<p>Maciek</p>
<p>P.S. If you know someone who does not care about backups, please let him know <a target=_blank href="http://www.percona.com/services/data-recovery-services-mysql.html">this URL</a> for our data recovery services.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Maciej Dobrzanski |
      <a href="http://www.mysqlperformanceblog.com/2008/11/06/living-with-backups/#comments">23 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/11/06/living-with-backups/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Development plans</title>
		<link>http://www.mysqlperformanceblog.com/2008/09/08/development-plans/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/09/08/development-plans/#comments</comments>
		<pubDate>Mon, 08 Sep 2008 23:22:31 +0000</pubDate>
		<dc:creator>Vadim</dc:creator>
				<category><![CDATA[Innodb]]></category>
		<category><![CDATA[ideas]]></category>
		<category><![CDATA[percona]]></category>
		<category><![CDATA[release]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=491</guid>
		<description><![CDATA[We have gathered our ideas for MySQL improvements on this page: http://www.percona.com/percona-lab/dev-plan.html
and we are going to implement some of them.
My favorite one is making InnoDB .ibd files (the ones created with --innodb-file-per-table=1) movable from one server to another, although that is rather challenging.
The next patch we integrate will probably be Google's smp-fix [...]]]></description>
			<content:encoded><![CDATA[<p>We have gathered our ideas for MySQL improvements on this page: <a href="http://www.percona.com/percona-lab/dev-plan.html">http://www.percona.com/percona-lab/dev-plan.html</a><br />
and we are going to implement some of them.<br />
My favorite one is making InnoDB .ibd files (the ones created with --innodb-file-per-table=1) movable from one server to another, although that is rather challenging.<br />
The next patch we integrate will probably be Google's <a href="http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches">smp-fix</a> or Yasufumi's <a href="http://bugs.mysql.com/bug.php?id=26442">rw-locks</a> (we are going to test both first).</p>
<p>On this page <a href="http://www.percona.com/percona-lab.html">http://www.percona.com/percona-lab.html</a> you can find links to our current binaries and patches.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Vadim |
      <a href="http://www.mysqlperformanceblog.com/2008/09/08/development-plans/#comments">5 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/09/08/development-plans/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Missing Data &#8211; rows used to generate result set</title>
		<link>http://www.mysqlperformanceblog.com/2008/07/20/missing-data-rows-used-to-generate-result-set/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/07/20/missing-data-rows-used-to-generate-result-set/#comments</comments>
		<pubDate>Sun, 20 Jul 2008 17:01:06 +0000</pubDate>
		<dc:creator>peter</dc:creator>
				<category><![CDATA[ideas]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=439</guid>
		<description><![CDATA[As Baron writes, it is not the number of rows returned by a query but the number of rows accessed by it that will most likely define query performance. Of course, not all row accesses are created equal (row accesses in a full table scan may be much faster than [...]]]></description>
			<content:encoded><![CDATA[<p>As <a href="http://www.xaprb.com/blog/2008/06/28/mysql-challenge-limit-rows-accessed-not-rows-returned/">Baron writes</a>, it is not the number of rows returned by a query but the number of rows accessed by it that will most likely define query performance. Of course, not all row accesses are created equal (row accesses in a full table scan may be much faster than random index lookups in the same table), but this is still a very valuable data point for query optimization.</p>
<p>The question when optimizing the number of rows accessed is: what is the optimal number that indicates a query is well optimized? In a perfect world we would like to see <em>rows returned</em> = <em>rows analyzed</em>, though this is only reachable for a small fraction of queries.</p>
<p>If you're joining multiple tables or running a GROUP BY query, the number of rows needed to create the result set will be larger than the number of rows returned.</p>
<p>What I would like to see (for example, as another slow query log record) is the number of rows MySQL used to generate the result set. By comparing this number with the number of rows the query actually accessed, we can tell (and, importantly, tell automatically) whether there is potential for optimizing the query.</p>
<p>For example: </p>
<p><strong>SELECT GENDER, COUNT(*)  FROM PEOPLE GROUP BY GENDER</strong></p>
<p>This query will return only a couple of rows, but clearly all rows from the table were used to generate the result set, and the query cannot be optimized directly to access only a couple of rows (though this gives us another idea: keeping a cache table with a couple of rows in it).</p>
<p>Now suppose we have the same table with no indexes and this query:</p>
<p><strong>SELECT GENDER, COUNT(*)  FROM PEOPLE WHERE COUNTRY='USA' GROUP BY GENDER</strong></p>
<p>Even though a full table scan is performed, only rows with COUNTRY='USA' are used in the result set, which clearly marks the query as an optimization candidate.</p>
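<p>To make the comparison concrete, here is a small Python sketch. It assumes the "rows used" counter existed, which is exactly what is missing today; the function name, the cut-off ratio and the row counts are all made up for illustration:</p>

```python
def optimization_candidates(queries, max_ratio=2.0):
    """Return (sql, ratio) for queries accessing far more rows than they use."""
    flagged = []
    for sql, rows_accessed, rows_used in queries:
        ratio = rows_accessed / max(rows_used, 1)  # avoid dividing by zero
        if ratio > max_ratio:
            flagged.append((sql, ratio))
    return flagged


# Made-up numbers for illustration only: a 1,000,000 row PEOPLE table
# in which 50,000 rows have COUNTRY='USA' and there are no indexes.
sample = [
    ("SELECT GENDER, COUNT(*) FROM PEOPLE GROUP BY GENDER",
     1000000, 1000000),
    ("SELECT GENDER, COUNT(*) FROM PEOPLE WHERE COUNTRY='USA' GROUP BY GENDER",
     1000000, 50000),
]

for sql, ratio in optimization_candidates(sample):
    print(f"{ratio:.0f}x more rows accessed than used: {sql}")
```

<p>With these numbers only the second query is flagged, because the first one genuinely needs every row it reads.</p>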
<p>It is not always possible to optimize a query so that the number of rows accessed equals the number of rows used to generate the result set. For example, any filter that can't use indexes will make these numbers differ, though such a filter is suboptimal and you may want to think about how to fix the situation.</p>
<p>For example, if you have a clause like <em>TITLE LIKE "%MYSQL%"</em> you may use Full Text Search indexes instead. If you have <em>WHERE ID%100=0</em> you can add an extra column, divisible_by_hundred, and keep it indexed. Of course, there is extra cost involved in all these cases and you should weigh whether it makes sense to optimize such queries. I'm just describing the possibility.</p>
<p>Sounds nice as described, right? Unfortunately, it is not that easy to implement in the general case, because you can't always track the fate of an individual row. Queries with temporary result sets are especially complicated, for example:</p>
<p><strong>SELECT * FROM (SELECT COUNTRY,COUNT(*) FROM PEOPLE GROUP BY COUNTRY) C WHERE COUNTRY='USA'</strong></p>
<p>As of MySQL 5.0, MySQL will fully materialize the subquery in the FROM clause and so "use" all rows in its result set, while in reality only a fraction of them is needed for the end result set, because most of the groups are filtered out. There are many similar cases in which the decision whether a row is used for the result set is made long after it has stopped existing as an individual row that was just accessed.</p>
<p>At the same time, I think starting with something that covers basic "single level" queries, taking into account JOINs, GROUP BY and LIMIT, would already be helpful in many cases.</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by peter |
      <a href="http://www.mysqlperformanceblog.com/2008/07/20/missing-data-rows-used-to-generate-result-set/#comments">15 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/07/20/missing-data-rows-used-to-generate-result-set/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Neat tricks for the MySQL command-line pager</title>
		<link>http://www.mysqlperformanceblog.com/2008/06/23/neat-tricks-for-the-mysql-command-line-pager/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/06/23/neat-tricks-for-the-mysql-command-line-pager/#comments</comments>
		<pubDate>Tue, 24 Jun 2008 02:57:39 +0000</pubDate>
		<dc:creator>Baron Schwartz</dc:creator>
				<category><![CDATA[ideas]]></category>
		<category><![CDATA[tips]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/?p=420</guid>
		<description><![CDATA[How many of you use the mysql command-line client?  And did you know about the pager command you can give it?  It's pretty useful.  It tells mysql to pipe the output of your commands through the specified program before displaying it to you.
Here's the most basic thing I can think of to do with it: [...]]]></description>
			<content:encoded><![CDATA[<p>How many of you use the mysql command-line client?  And did you know about the pager command you can give it?  It's pretty useful.  It tells mysql to pipe the output of your commands through the specified program before displaying it to you.</p>
<p>Here's the most basic thing I can think of to do with it: use it as a pager.  (It's scary how predictable I am sometimes, isn't it?)</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-21">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; pager less</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; <span style="color: #993333; font-weight: bold;">SHOW</span> innodb <span style="color: #993333; font-weight: bold;">STATUS</span>\G </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>For big result sets, it's a pretty handy way to be able to search and scroll through.  No mouse required, of course.</p>
<p>But it doesn't have to be this simple!  You can specify anything you want as a pager.  Hmm, you know what that means?  It means you can write your own script and push the output through it.  <del datetime="2008-06-24T11:14:32+00:00">You can't specify arguments to the script, but since you can write your own, that's not really a limitation.</del>(Edit: I'm wrong!  You can.  See Giuseppe's comment below.)  For example, here's a super-simple script that will show the lock waits in the output of SHOW INNODB STATUS.  Save this file as /tmp/lock_waits and make it executable.</p>
<div class="syntax_hilite"><span class="langName">CODE:</span>
<div id="code-22">
<div class="code">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">#!/bin/sh</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">grep -A <span style="color:#800000;color:#800000;">1</span> <span style="color:#CC0000;">'TRX HAS BEEN WAITING'</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>Now in your mysql session, set /tmp/lock_waits as your pager and let's see if there are any lock waits:</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-23">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; pager /tmp/lock_waits</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">PAGER set to <span style="color: #ff0000;">'/tmp/lock_waits'</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; <span style="color: #993333; font-weight: bold;">SHOW</span> innodb <span style="color: #993333; font-weight: bold;">STATUS</span>\G</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #808080; font-style: italic;">------- TRX HAS BEEN WAITING 50 SEC FOR THIS LOCK TO BE GRANTED:</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">RECORD LOCKS space id <span style="color: #cc66cc;color:#800000;">0</span> page no <span style="color: #cc66cc;color:#800000;">52</span> n bits <span style="color: #cc66cc;color:#800000;">72</span> <span style="color: #993333; font-weight: bold;">INDEX</span> <span style="color: #ff0000;">`GEN_CLUST_INDEX`</span> of <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">`test/t`</span> trx id <span style="color: #cc66cc;color:#800000;">0</span> <span style="color: #cc66cc;color:#800000;">14615</span> lock_mode X waiting</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #cc66cc;color:#800000;">1</span> row in set, <span style="color: #cc66cc;color:#800000;">1</span> warning <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color: #cc66cc;color:#800000;">0</span>.<span style="color: #cc66cc;color:#800000;">00</span> sec<span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
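<p>If the filter needs more logic than a one-line grep, the pager can just as well be a script in another language.  Here is a minimal Python sketch of the same idea, assuming Python 3 is installed; the file name /tmp/lock_waits.py and the function name lock_waits are made up for this example.  It prints each line containing TRX HAS BEEN WAITING plus the line after it, much like grep -A 1 (minus grep's group separators).</p>

```python
#!/usr/bin/env python3
"""Pager filter: print each lock-wait header line plus the line after it."""
import sys

MARKER = "TRX HAS BEEN WAITING"


def lock_waits(lines):
    """Yield every line containing MARKER and the single line that follows it."""
    show_next = False
    for line in lines:
        if MARKER in line:
            yield line
            show_next = True   # also emit the line right after a match
        elif show_next:
            yield line
            show_next = False


if __name__ == "__main__":
    # mysql pipes the query output to the pager's standard input.
    sys.stdout.writelines(lock_waits(sys.stdin))
```

<p>Make the file executable and set it with "pager /tmp/lock_waits.py", exactly as with the shell version.</p>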
<p>Pretty useful, isn't it?  But we can do even more.  For example, the <a href="http://www.maatkit.org/">Maatkit</a> tools are specifically designed to be useful at the command line in the traditional Unix pipe-and-filter manner.  What sort of goodies can we think of here?</p>
<div class="syntax_hilite"><span class="langName">SQL:</span>
<div id="sql-24">
<div class="sql">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; pager mk-visual-explain</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">PAGER set to <span style="color: #ff0000;">'mk-visual-explain'</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">mysql&gt; <span style="color: #993333; font-weight: bold;">EXPLAIN</span> <span style="color: #993333; font-weight: bold;">SELECT</span> * <span style="color: #993333; font-weight: bold;">FROM</span> sakila.film <span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> sakila.film_actor <span style="color: #993333; font-weight: bold;">USING</span><span style="color:#006600; font-weight:bold;">&#40;</span>film_id<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color: #993333; font-weight: bold;">INNER</span> <span style="color: #993333; font-weight: bold;">JOIN</span> sakila.actor <span style="color: #993333; font-weight: bold;">USING</span><span style="color:#006600; font-weight:bold;">&#40;</span>actor_id<span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #993333; font-weight: bold;">JOIN</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">+- Bookmark lookup</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; +- Table</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; |&nbsp; table&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; actor</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; |&nbsp; possible_keys&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; +- Unique index lookup</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; &nbsp; &nbsp;key&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; actor-&gt;PRIMARY</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; &nbsp; &nbsp;possible_keys&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; &nbsp; &nbsp;key_len&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #cc66cc;color:#800000;">2</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; &nbsp; &nbsp;ref&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sakila.film_actor.actor_id</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">|&nbsp; &nbsp; &nbsp;rows&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #cc66cc;color:#800000;">1</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">+- <span style="color: #993333; font-weight: bold;">JOIN</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;+- Bookmark lookup</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; +- Table</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; |&nbsp; table&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; film_actor</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; |&nbsp; possible_keys&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span>,idx_fk_film_id</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; +- Index lookup</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;key&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; film_actor-&gt;idx_fk_film_id</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;possible_keys&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span>,idx_fk_film_id</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;key_len&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #cc66cc;color:#800000;">2</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;ref&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sakila.film.film_id</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;rows&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #cc66cc;color:#800000;">2</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp;+- Table scan</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; rows&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #cc66cc;color:#800000;">1022</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; +- Table</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;table&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; film</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;possible_keys&nbsp; <span style="color: #993333; font-weight: bold;">PRIMARY</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color: #cc66cc;color:#800000;">3</span> rows in set <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color: #cc66cc;color:#800000;">0</span>.<span style="color: #cc66cc;color:#800000;">00</span> sec<span style="color:#006600; font-weight:bold;">&#41;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p></p>
<p>Now, that's handy.</p>
<p>What are your favorite ideas?</p>
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by Baron Schwartz |
      <a href="http://www.mysqlperformanceblog.com/2008/06/23/neat-tricks-for-the-mysql-command-line-pager/#comments">9 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/06/23/neat-tricks-for-the-mysql-command-line-pager/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Idea: Couple of more string types</title>
		<link>http://www.mysqlperformanceblog.com/2008/04/18/idea-couple-of-more-string-types/</link>
		<comments>http://www.mysqlperformanceblog.com/2008/04/18/idea-couple-of-more-string-types/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 23:43:47 +0000</pubDate>
		<dc:creator>peter</dc:creator>
				<category><![CDATA[ideas]]></category>

		<guid isPermaLink="false">http://www.mysqlperformanceblog.com/2008/04/18/idea-couple-of-more-string-types/</guid>
		<description><![CDATA[MySQL has a lot of string data types - CHAR, VARCHAR, BLOB, TEXT, ENUM and a bunch of variants such as VARBINARY - but I think that is not enough  
I would also like to see a HEXCHAR type that could efficiently store hex strings, such as those returned by MD5() and SHA1().  [...]]]></description>
			<content:encoded><![CDATA[<p>MySQL has a lot of string data types - <strong>CHAR</strong>, <strong>VARCHAR</strong>, <strong>BLOB</strong>, <strong>TEXT</strong>, <strong>ENUM</strong> and a bunch of variants such as <strong>VARBINARY</strong> - but I think that is not enough <img src='http://www.mysqlperformanceblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I would also like to see a <strong>HEXCHAR</strong> type that could efficiently store hex strings, such as those returned by <strong>MD5()</strong> and <strong>SHA1()</strong>.  With a little modification it could work for <strong>UUID()</strong> as well (which adds some dashes).  Currently it is quite inconvenient to deal with strings like that in MySQL: either you store them as strings and waste space, or you store them as binary and deal with the inconvenience of having unreadable strings in the table or of adding <strong>UNHEX()</strong> everywhere, which also adds overhead.</p>
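<p>To put numbers on the wasted space, here is a small Python sketch.  Python only stands in for the client side here; HEXCHAR itself is a wish and does not exist in MySQL, and the helper name storage_sizes is made up.  It compares the length of the hex text with the packed bytes that such a type could store internally.</p>

```python
import hashlib
import uuid


def storage_sizes(hex_string):
    """Return (length of the hex text, length of the packed binary form)."""
    packed = bytes.fromhex(hex_string)  # same idea as UNHEX() in SQL
    return len(hex_string), len(packed)


md5_hex = hashlib.md5(b"example").hexdigest()
sha1_hex = hashlib.sha1(b"example").hexdigest()

# UUIDs carry dashes, so strip them before packing.
uuid_text = str(uuid.uuid4())
uuid_hex = uuid_text.replace("-", "")

print("MD5 ", storage_sizes(md5_hex))    # 32 hex characters, 16 bytes packed
print("SHA1", storage_sizes(sha1_hex))   # 40 hex characters, 20 bytes packed
print("UUID", len(uuid_text), len(bytes.fromhex(uuid_hex)))  # 36 characters, 16 bytes packed
```

<p>In each case the readable form takes at least twice the space of the packed one, which is the trade-off described above.</p>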
<p>Another one I would like to see is <strong>zBLOB</strong> or <strong>zTEXT</strong> (or call them BLOB COMPRESSED / TEXT COMPRESSED), which would transparently compress blobs when they are inserted and decompress them when they are retrieved from the database.  This would avoid having <strong>COMPRESS()</strong>/<strong>UNCOMPRESS()</strong> everywhere, which clutters things, or compressing/uncompressing on the client.</p>
<p>It would be best if the last one were optimized so that, if the BLOB is not used in any WHERE clause (or HAVING, GROUP BY, etc.), it could be transparently decompressed on the client and compressed back on the way in.  This is likely to require more significant changes in MySQL, though, so I would not expect it to happen quickly.  The basic support should not be that hard.</p>
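<p>Until something like zBLOB exists, the client-side variant looks roughly like this Python sketch.  It uses the standard zlib module; the helper names pack_blob and unpack_blob are invented for this example, and the bytes it produces are plain zlib streams that should not be assumed interchangeable with COMPRESS() output.</p>

```python
import zlib


def pack_blob(text):
    """Compress text before it is sent to the server as a BLOB value."""
    return zlib.compress(text.encode("utf-8"))


def unpack_blob(blob):
    """Decompress a BLOB value read back from the server."""
    return zlib.decompress(blob).decode("utf-8")


document = "MySQL Performance Blog " * 200
stored = pack_blob(document)

print(len(document.encode("utf-8")), "bytes raw,", len(stored), "bytes compressed")

# The round trip must give back exactly what was stored.
assert unpack_blob(stored) == document
```

<p>Every application that reads the table has to call the same pair of helpers, which is exactly the inconvenience a transparent type would remove.</p>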
    <hr noshade style="margin:0;height:1px" />
    <p>Entry posted by peter |
      <a href="http://www.mysqlperformanceblog.com/2008/04/18/idea-couple-of-more-string-types/#comments">12 comments</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.mysqlperformanceblog.com/2008/04/18/idea-couple-of-more-string-types/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
	</channel>
</rss>
