MySQL Performance Blog
http://www.mysqlperformanceblog.com
Everything about MySQL Performance

Air traffic queries in MyISAM and Tokutek (TokuDB)
http://www.mysqlperformanceblog.com/2009/11/05/air-traffic-queries-in-myisam-and-tokutek-tokudb/
Fri, 06 Nov 2009 06:21:03 +0000, by Vadim

This is the next post in the series:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
Air traffic queries in InfiniDB: early alpha

Let me explain the reason for choosing these engines. After the initial three posts I was often asked, "What is the baseline? Can we compare the results with standard MySQL engines?" So here comes MyISAM as the base point, to see how much better the column-oriented analytic engines are here.

However, take into account that for MyISAM we need to choose proper indexes to execute queries effectively, and indexes bring pain: - loading data gets slower; - designing proper indexes takes additional research, especially when the MySQL optimizer is not smart about picking the best one.

The really nice thing about MonetDB, InfoBright and InfiniDB is that they do not need indexes, so you do not have to worry about maintaining them and picking the best one. I am not sure about LucidDB; I was told indexes are needed, but creating a new index was really fast even on the full database, so I guess these are not B-Tree indexes. This reflection on indexes turned me in the direction of TokuDB.

What is so special about TokuDB? There are two things. First, its indexes have a special structure and are "cheap"; by "cheap" I mean that the maintenance cost stays nearly constant as the data size grows. With regular B-Tree indexes the cost grows with the data size, and steeply once the index no longer fits in memory (Bradley Kuszmaul from Tokutek will correct me if I am wrong in this statement). Second, TokuDB uses compression, so I expect a smaller size of loaded data and fewer IO operations during query execution.

So what indexes do we need for the queries? To recall the details, the schema is available in this post
http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, and
the queries are posted on the "Queries" sheet of my summary spreadsheet.

With Bradley's help we chose the following indexes:

CODE:
  KEY `Year` (`Year`,`Month`),
  KEY `Year_2` (`Year`,`DayOfWeek`),
  KEY `DayOfWeek` (`DayOfWeek`,`Year`,`DepDelay`),
  KEY `DestCityName` (`DestCityName`,`OriginCityName`,`Year`),
  KEY `Year_3` (`Year`,`DestCityName`,`OriginCityName`),
  KEY `Year_4` (`Year`,`Carrier`,`DepDelay`),
  KEY `Origin` (`Origin`,`Year`,`DepDelay`)

Then I measured the load time for both MyISAM and TokuDB into an empty table with the indexes already created.

Load time for MyISAM: 16608 sec
For TokuDB: 19131 sec

Datasize (including indexes)

MyISAM: 36.7GB
TokuDB: 6.7GB

I am a bit surprised that TokuDB is slower at loading data, but my guess is that this is related to compression, and I expect that with a bigger amount of data TokuDB will be faster than MyISAM.

Now to the queries. Bradley pointed out to me that query Q5

SELECT t.carrier, c, c2, c*1000/c2 as c3 FROM (SELECT carrier,
count(*) AS c FROM ontime WHERE DepDelay>10 AND Year=2007 GROUP BY
carrier) t JOIN (SELECT carrier, count(*) AS c2 FROM ontime WHERE
Year=2007 GROUP BY carrier) t2 ON (t.Carrier=t2.Carrier) ORDER BY c3

can be rewritten as

SELECT carrier,totalflights,ndelayed,ndelayed*1000/totalflights as c3 FROM (SELECT carrier,count(*) as totalflights,sum(if(depdelay>10,1,0)) as ndelayed from ontime where year=2007 group by carrier) t order by c3 desc;

(I call it query Q5i.)
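The rewrite replaces two scans joined on carrier with a single scan that counts delayed flights conditionally. A minimal sketch of that equivalence follows, under loud assumptions: it uses Python's built-in sqlite3 instead of MySQL, a made-up three-column toy table instead of the real ontime schema, CASE WHEN in place of MySQL's IF(), and the Q5 columns reordered so the two result sets line up.

```python
import sqlite3

# Toy stand-in for the ontime table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ontime (Year INT, Carrier TEXT, DepDelay INT)")
conn.executemany(
    "INSERT INTO ontime VALUES (?, ?, ?)",
    [
        (2007, "AA", 25), (2007, "AA", 0), (2007, "AA", 12), (2007, "AA", 3),
        (2007, "UA", 45), (2007, "UA", 5), (2007, "UA", 7),
        (2007, "WN", 11), (2007, "WN", 30), (2007, "WN", 2),
        (2006, "AA", 90),  # different year: both queries must ignore it
    ],
)

# Q5 shape: two aggregations over the same table, joined on carrier.
Q5 = """
SELECT t.Carrier, c2, c, c*1000/c2 AS c3
FROM (SELECT Carrier, count(*) AS c FROM ontime
      WHERE DepDelay > 10 AND Year = 2007 GROUP BY Carrier) t
JOIN (SELECT Carrier, count(*) AS c2 FROM ontime
      WHERE Year = 2007 GROUP BY Carrier) t2 ON (t.Carrier = t2.Carrier)
ORDER BY c3 DESC
"""

# Q5i shape: one aggregation with a conditional sum.
Q5I = """
SELECT Carrier, totalflights, ndelayed, ndelayed*1000/totalflights AS c3
FROM (SELECT Carrier, count(*) AS totalflights,
             sum(CASE WHEN DepDelay > 10 THEN 1 ELSE 0 END) AS ndelayed
      FROM ontime WHERE Year = 2007 GROUP BY Carrier) t
ORDER BY c3 DESC
"""

q5_rows = conn.execute(Q5).fetchall()
q5i_rows = conn.execute(Q5I).fetchall()
print(q5_rows == q5i_rows)
```

One caveat on the design: the join form drops a carrier that has no delayed flights at all, while the single-scan form keeps it with ndelayed = 0, so the two are interchangeable only when every carrier has at least one delayed flight. SQLite also does integer division here, which MySQL's `/` does not.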

The summary table with query execution times (in sec, less is better):

Query MyISAM TokuDB
Q0 72.84 50.25
Q1 61.03 55.01
Q2 98.12 58.36
Q3 123.04 66.87
Q4 6.92 6.91
Q5 13.61 11.86
Q5i 7.68 6.96
Q6 123.84 69.03
Q7 187.22 159.62
Q8 (1y) 8.75 7.59
Q8 (2y) 102.17 64.95
Q8 (3y) 104.7 69.76
Q8 (4y) 107.05 70.46
Q8 (10y) 119.54 84.64
Q9 69.05 47.67
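
To read the table above as ratios rather than raw seconds, the timings can be converted into a speedup factor (MyISAM time divided by TokuDB time). This is only a sketch that re-uses the numbers from the table; the dictionary layout and variable names are illustrative.

```python
# (MyISAM seconds, TokuDB seconds), copied from the table above
timings = {
    "Q0": (72.84, 50.25), "Q1": (61.03, 55.01), "Q2": (98.12, 58.36),
    "Q3": (123.04, 66.87), "Q4": (6.92, 6.91), "Q5": (13.61, 11.86),
    "Q5i": (7.68, 6.96), "Q6": (123.84, 69.03), "Q7": (187.22, 159.62),
    "Q8 (1y)": (8.75, 7.59), "Q8 (2y)": (102.17, 64.95),
    "Q8 (3y)": (104.7, 69.76), "Q8 (4y)": (107.05, 70.46),
    "Q8 (10y)": (119.54, 84.64), "Q9": (69.05, 47.67),
}

# Speedup > 1 means TokuDB finished faster than MyISAM on that query.
speedup = {q: myisam / tokudb for q, (myisam, tokudb) in timings.items()}
best = max(speedup, key=speedup.get)
worst = min(speedup, key=speedup.get)

for q, s in speedup.items():
    print(f"{q:9s} {s:5.2f}x")
print("largest gain:", best, "smallest gain:", worst)
```

On these numbers the ratio ranges from roughly 1.0x (Q4) to roughly 1.8x (Q3).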

For reference, I used 5.1.36-Tokutek-2.1.0 for both the MyISAM and the TokuDB tests.

And if you are interested in comparing MyISAM with the previous engines:

Query MyISAM MonetDB InfoBright LucidDB InfiniDB
Q0 72.84 29.9 4.19 103.21 NA
Q1 61.03 7.9 12.13 49.17 6.79
Q2 98.12 0.9 6.73 27.13 4.59
Q3 123.04 1.7 7.29 27.66 4.96
Q4 6.92 0.27 0.99 2.34 0.75
Q5 13.61 0.5 2.92 7.35 NA
Q6 123.84 12.5 21.83 78.42 NA
Q7 187.22 27.9 8.59 106.37 NA
Q8 (1y) 8.75 0.55 1.74 6.76 8.13
Q8 (2y) 102.17 1.1 3.68 28.82 16.54
Q8 (3y) 104.7 1.69 5.44 35.37 24.46
Q8 (4y) 107.05 2.12 7.22 41.66 32.49
Q8 (10y) 119.54 29.14 17.42 72.67 70.35
Q9 69.05 6.3 0.31 76.12 9.54

All the results are available in the summary spreadsheet.

I deliberately did not put TokuDB in the same table as the analytics-oriented databases, to highlight that TokuDB is a general-purpose OLTP engine.
As you can see, it does better than MyISAM in all queries.


Entry posted by Vadim | 18 comments

New developers training course is almost ready
http://www.mysqlperformanceblog.com/2009/11/05/new-developers-training-course-is-almost-ready/
Fri, 06 Nov 2009 01:56:25 +0000, by Morgan Tocker

We've been busy expanding our training curriculum to include training for developers building applications with MySQL.  We have reached the point where we're ready for a pilot teach - and it brings me great pleasure to announce that we're opening it up for blog readers to attend, free of charge.

The details:
San Francisco
4th December
9:30AM - 5PM

Spaces are limited, so to give everyone a fair chance we're delaying registration to open at noon tomorrow (Friday) Pacific Time. It's strictly first come, first served, so be quick!  The registration link is here.


Entry posted by Morgan Tocker | One comment

InnoDB: look after fragmentation
http://www.mysqlperformanceblog.com/2009/11/05/innodb-look-after-fragmentation/
Thu, 05 Nov 2009 19:01:54 +0000, by Vadim

One problem puzzled me for a couple of hours, but it was really interesting to figure out what was going on.

So let me introduce the problem first. The table is

CODE:
CREATE TABLE `c` (
  `tracker_id` int(10) unsigned NOT NULL,
  `username` char(20) character set latin1 collate latin1_bin NOT NULL,
  `time_id` date NOT NULL,
  `block_id` int(10) unsigned default NULL,
  PRIMARY KEY  (`tracker_id`,`username`,`time_id`),
  KEY `block_id` (`block_id`)
) ENGINE=InnoDB

The table has 11,864,696 rows and takes 698,351,616 bytes on disk (Data_length).

The problem is that after restoring the table from mysqldump, a query that scans data by primary key was slow. How slow? Let me show.

The query in question is (Q1):

SELECT count(distinct username) FROM tracker where TIME_ID >= '2009-07-20 00:00:00' AND TIME_ID <= '2009-10-21 00:00:00' AND (tracker_id=437)

On cold buffer_pool, it took:

CODE:
+---------------------------+
| count(distinct username) |
+---------------------------+
|                   5856156 |
+---------------------------+
1 row in set (4 min 13.61 sec)

However, this query (again on a cold buffer_pool) (Q2)

SELECT count(distinct username) FROM tracker where TIME_ID >= '2009-07-20 00:00:00' AND TIME_ID <= '2009-10-21 00:00:00'

CODE:
+---------------------------+
| count(distinct username) |
+---------------------------+
|                   5903053 |
+---------------------------+
1 row in set (18.81 sec)

The difference is impressive: 4 min 13.61 sec vs 18.81 sec.

If you want the EXPLAIN plan, here it is:

For Q1:

CODE:
+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table   | type | possible_keys | key     | key_len | ref   | rows    | Extra                    |
+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
|  1 | SIMPLE      | tracker | ref  | PRIMARY       | PRIMARY | 4       | const | 6880241 | Using where; Using index |
+----+-------------+---------+------+---------------+---------+---------+-------+---------+--------------------------+
1 row in set (0.02 sec)

For Q2:

CODE:
+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows     | Extra                    |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+
|  1 | SIMPLE      | tracker | index | NULL          | block_id | 5       | NULL | 13760483 | Using where; Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+--------------------------+

Query Q1 is executed using the primary key, and query Q2 uses the block_id key.

To get more details I ran both queries with our extended stats in slow.log (available in 5.0-percona releases).

So for query Q1:

CODE:
# Query_time: 253.643162  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 11569733  Rows_affected: 0  Rows_read: 11569733
#   InnoDB_IO_r_ops: 73916  InnoDB_IO_r_bytes: 1211039744  InnoDB_IO_r_wait: 236.149003
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 54838

And for query Q2:

CODE:
# Query_time: 18.846855  Lock_time: 0.000123  Rows_sent: 1  Rows_examined: 11864696  Rows_affected: 0  Rows_read: 11864696
#   InnoDB_IO_r_ops: 27510  InnoDB_IO_r_bytes: 450723840  InnoDB_IO_r_wait: 0.165124
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 24687

As you see, for Q1 the IO reads took 236.149003 sec vs 0.165124 sec for Q2. But Q1 is a scan by primary key, which is supposed to be sequential!
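
The slow-log counters already hint at the access pattern. Dividing bytes by read operations gives the size of one read, and dividing the wait time by read operations gives the average wait per read. The sketch below does only that arithmetic on the numbers quoted above; the helper name is illustrative.

```python
# Counters copied from the extended slow-log output above
q1 = {"ops": 73916, "bytes": 1211039744, "wait_s": 236.149003}
q2 = {"ops": 27510, "bytes": 450723840, "wait_s": 0.165124}

def per_read(stats):
    """Return (bytes per read, average wait per read in milliseconds)."""
    bytes_per_read = stats["bytes"] / stats["ops"]
    wait_ms = stats["wait_s"] / stats["ops"] * 1000.0
    return bytes_per_read, wait_ms

q1_bytes, q1_wait_ms = per_read(q1)
q2_bytes, q2_wait_ms = per_read(q2)

print(f"Q1: {q1_bytes:.0f} bytes/read, {q1_wait_ms:.3f} ms/read")
print(f"Q2: {q2_bytes:.0f} bytes/read, {q2_wait_ms:.3f} ms/read")
```

Both queries read exactly 16384 bytes per operation, which is the default InnoDB page size, so every read is a single page. Q1 waits about 3.2 ms per page, which is in the range one would expect when each read needs a disk seek, while Q2 waits only a few microseconds per page.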

Let's look at another statistic, which is available with the innodb_check_fragmentation patch:

for Q1:

CODE:
mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 88    |
| Innodb_scan_pages_jumpy      | 73789 |
+------------------------------+-------+
2 rows in set (0.00 sec)

and for Q2:

CODE:
mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 26959 |
| Innodb_scan_pages_jumpy      | 442   |
+------------------------------+-------+
2 rows in set (0.00 sec)

So you see that for Q1 it was not a sequential scan, even though it goes by the primary key, but it is sequential for Q2.
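
The two counters are easier to compare as a single fraction of "jumpy" page reads. A small sketch using the values shown above (the function name is illustrative, not part of the patch):

```python
def jumpy_fraction(contiguous, jumpy):
    """Share of scanned pages that were not physically adjacent to the previous one."""
    return jumpy / (contiguous + jumpy)

# Innodb_scan_pages_contiguous / Innodb_scan_pages_jumpy from the output above
q1_fraction = jumpy_fraction(88, 73789)
q2_fraction = jumpy_fraction(26959, 442)

print(f"Q1: {q1_fraction:.1%} jumpy")
print(f"Q2: {q2_fraction:.1%} jumpy")
```

For Q1 practically every page read is a jump, while for Q2 fewer than 2% are.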

So what's the answer? It's fragmentation of the primary key (and of the whole data table, as InnoDB data == primary key). But how could it happen with the
primary key after mysqldump? The answer is here, if we look at

EXPLAIN SELECT * FROM tracker;

CODE:
+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
| id | select_type | table   | type  | possible_keys | key      | key_len | ref  | rows     | Extra       |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
|  1 | SIMPLE      | tracker | index | NULL          | block_id | 5       | NULL | 13760483 | Using index |
+----+-------------+---------+-------+---------------+----------+---------+------+----------+-------------+
1 row in set (0.00 sec)

We see that the dump is taken in key "block_id" order, not in primary key order. Later, when we load this table, INSERTs into the primary key happen in random order, and that gives us the fragmentation we see here.
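
The effect of insert order can be reproduced with a toy model. The sketch below is an assumption-heavy simplification and not InnoDB's real page allocator: leaf pages hold a fixed number of keys, an overflowing page splits in half, and the new half always receives the next physical page id. A scan in key order then counts how often the next leaf is the physically adjacent page.

```python
import bisect
import random

PAGE_CAPACITY = 64  # keys per leaf page; arbitrary value for this toy model

def load(keys):
    """Insert keys in the given order; return physical page ids in key order."""
    pages = [[]]             # leaf pages in logical (key) order
    page_ids = [0]           # physical id of each leaf, assigned when allocated
    lows = [float("-inf")]   # lower key bound of each leaf
    next_id = 1
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1   # leaf responsible for this key
        bisect.insort(pages[i], key)
        if len(pages[i]) > PAGE_CAPACITY:        # overflow: split the leaf in half
            half = len(pages[i]) // 2
            right = pages[i][half:]
            del pages[i][half:]
            pages.insert(i + 1, right)
            page_ids.insert(i + 1, next_id)      # new page goes to the file's end
            lows.insert(i + 1, right[0])
            next_id += 1
    return page_ids

def scan_stats(page_ids):
    """Count (contiguous, jumpy) transitions for a scan in key order."""
    contiguous = sum(1 for a, b in zip(page_ids, page_ids[1:]) if b == a + 1)
    return contiguous, len(page_ids) - 1 - contiguous

keys = list(range(20000))
seq_contiguous, seq_jumpy = scan_stats(load(keys))     # load in key order

random.Random(42).shuffle(keys)
rnd_contiguous, rnd_jumpy = scan_stats(load(keys))     # load in random order

print("in key order:", seq_contiguous, "contiguous,", seq_jumpy, "jumpy")
print("random order:", rnd_contiguous, "contiguous,", rnd_jumpy, "jumpy")
```

In this model a load in key order leaves every leaf physically adjacent to its logical neighbour, while a random-order load scatters them, which is the same pattern the contiguous and jumpy counters show for the real table.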

How to fix it in our case? It's easy: ALTER TABLE tracker ENGINE=InnoDB will force InnoDB to rebuild the table in primary key order.

After that Q1:

CODE:
+---------------------------+
| count(distinct username) |
+---------------------------+
|                   5856156 |
+---------------------------+
1 row in set (17.72 sec)

mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 37864 |
| Innodb_scan_pages_jumpy      | 574   |
+------------------------------+-------+
2 rows in set (0.00 sec)

and extended stats:
# Query_time: 17.765369  Lock_time: 0.000137  Rows_sent: 1  Rows_examined: 11569733  Rows_affected: 0  Rows_read: 11569733
#   InnoDB_IO_r_ops: 38530  InnoDB_IO_r_bytes: 631275520  InnoDB_IO_r_wait: 0.204893
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 35584

You see that the time went back to a reasonable 17.72 sec.

You may ask what happens now with Q2. Yes, it is slow now, because the rebuild inserted into the key "block_id" out of order.

CODE:
+---------------------------+
| count(distinct username) |
+---------------------------+
|                   5903053 |
+---------------------------+
1 row in set (2 min 8.92 sec)
mysql> SHOW STATUS LIKE 'Innodb_scan_pages%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| Innodb_scan_pages_contiguous | 45    |
| Innodb_scan_pages_jumpy      | 35904 |
+------------------------------+-------+
2 rows in set (0.00 sec)

As for mysqldump, you may use the --order-by-primary option to force the dump to be taken in primary key order.

So, notes to highlight:

  • InnoDB fragmentation may hurt your queries significantly, especially when the data is not in the buffer_pool and execution has to read from disk.
  • Fragmentation of a secondary key is much more likely than of the primary key, and you cannot really control it (though it is possible in XtraDB / InnoDB-plugin with fast index creation), so be careful with queries that scan many records by secondary key.
  • To check whether your query is affected by fragmentation, you can use the Innodb_scan_pages_contiguous and Innodb_scan_pages_jumpy statistics in 5.0-percona releases (coming to 5.1-XtraDB soon).

Entry posted by Vadim | 7 comments

Air traffic queries in InfiniDB: early alpha
http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/
Mon, 02 Nov 2009 21:29:28 +0000, by Vadim

As Calpont announced the availability of InfiniDB, I surely couldn't miss the chance to compare it with the previously tested databases in the same environment.
See my previous posts on this topic:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB

I could not run all queries against InfiniDB and I hit some hiccups during my experiment, so it was a less smooth experience than with the other databases.

So let's go through the same steps:

Load data

InfiniDB supports MySQL's LOAD DATA statement and its own colxml / cpimport utilities. As LOAD DATA is more familiar to me, I started with that; however, soon after I issued LOAD DATA on a 180MB file (for the 1st month of 1989) it caused extensive swapping (my box has 4GB of RAM) and the statement failed with
ERROR 1 (HY000) at line 1: CAL0001: Insert Failed: St9bad_alloc

Alright, colxml / cpimport was more successful; however, its syntax is less flexible than LOAD DATA, so I had to transform the input files into a format that cpimport could understand.

Total load time was 9747 sec or 2.7h (not counting the time spent on transforming the files).

I put summary data on load time, datasize and query time into a Google Spreadsheet so you can easily compare with the previous results. There are different sheets for queries, datasize and load time.

Datasize

The size of the database after loading is another confusing point. The InfiniDB data directory has a complex structure like

CODE:
./000.dir/000.dir/003.dir/233.dir
./000.dir/000.dir/003.dir/233.dir/000.dir
./000.dir/000.dir/003.dir/233.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/241.dir
./000.dir/000.dir/003.dir/241.dir/000.dir
./000.dir/000.dir/003.dir/241.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/238.dir
./000.dir/000.dir/003.dir/238.dir/000.dir
./000.dir/000.dir/003.dir/238.dir/000.dir/FILE000.cdf
./000.dir/000.dir/003.dir/235.dir
./000.dir/000.dir/003.dir/235.dir/000.dir
./000.dir/000.dir/003.dir/235.dir/000.dir/FILE000.cdf

so it's hard to say which files are related to the table. But after the load, the size of 000.dir is 114G, which is twice as big as the original data files. SHOW TABLE STATUS does not really help here; it shows

CODE:
           Name: ontime
         Engine: InfiniDB
        Version: 10
     Row_format: Dynamic
           Rows: 2000
 Avg_row_length: 0
    Data_length: 0
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: NULL
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:

with totally misleading information.

So I put 114GB as the size of data after load, until someone points out to me how to get the real size, and also explains what takes so much space.
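
Since SHOW TABLE STATUS reports zeros, the on-disk figure has to come from the filesystem. A minimal sketch of that measurement is below: it walks a directory tree and sums file sizes. The demo tree only mimics the layout shown above with made-up file sizes, and the function reports apparent file sizes rather than allocated disk blocks, so it may differ from what `du` prints.

```python
import os
import tempfile

def directory_size_bytes(root):
    """Sum the apparent sizes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):      # do not follow or count symlinks
                total += os.path.getsize(path)
    return total

# Demo on a throwaway tree that imitates the InfiniDB layout (sizes are invented).
with tempfile.TemporaryDirectory() as root:
    for column_dir, size in (("233.dir", 1000), ("241.dir", 2500)):
        leaf = os.path.join(root, "000.dir", "000.dir", "003.dir", column_dir, "000.dir")
        os.makedirs(leaf)
        with open(os.path.join(leaf, "FILE000.cdf"), "wb") as f:
            f.write(b"\0" * size)
    demo_total = directory_size_bytes(root)

print(demo_total)
```

Pointing `directory_size_bytes` at the real data directory gives one total for everything under it; it still cannot tell which files belong to which table.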

Queries

First, the "count star" query SELECT count(*) FROM ontime took 2.67 sec, which shows that InfiniDB does not store a counter of records, but calculates it pretty fast.

Q0:
select avg(c1) from (select year,month,count(*) as c1 from ontime group by YEAR,month) t;

Another bump: on this query InfiniDB complains

ERROR 138 (HY000):
The query includes syntax that is not supported by InfiniDB. Use 'show warnings;' to get more information. Review the Calpont InfiniDB Syntax guide for additional information on supported distributed syntax or consider changing the InfiniDB Operating Mode (infinidb_vtable_mode).
mysql> show warnings;
+-------+------+------------------------------------------------------------+
| Level | Code | Message |
+-------+------+------------------------------------------------------------+
| Error | 9999 | Subselect in From clause is not supported in this release. |
+-------+------+------------------------------------------------------------+

OK, so InfiniDB does not support derived tables, which is a big limitation from my point of view.
As a workaround I tried to create a temporary table, but got another error:

CODE:
mysql> create temporary table tq2 as (select Year,Month,count(*) as c1 from ontime group by Year, Month);
ERROR 122 (HY000): Cannot open table handle for ontime.

As the warning suggests, I set infinidb_vtable_mode = 2, which is:

CODE:
2) auto-switch mode: InfiniDB will attempt to process the query internally, if it
cannot, it will automatically switch the query to run in row-by-row mode.

but the query took 667 sec, so I skipped queries Q5, Q6 and Q7, which are also based on derived tables, as not supported by InfiniDB.

Other queries (again, see the comparison with other engines in the Google Spreadsheet or in the summary table at the bottom):

Query Q1:
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;
7 rows in set (6.79 sec)

Query Q2:
mysql> SELECT DayOfWeek, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY DayOfWeek ORDER BY c DESC;

7 rows in set (4.59 sec)

Query Q3:
SELECT Origin, count(*) AS c FROM ontime WHERE DepDelay>10 AND Year BETWEEN 2000 AND 2008 GROUP BY Origin ORDER BY c DESC LIMIT 10;

4.96 sec

Query Q4:
mysql> SELECT Carrier, count(*) FROM ontime WHERE DepDelay > 10 AND YearD=2007 GROUP BY Carrier ORDER BY 2 DESC;

I had another surprise with this query: after 15 min it had not returned results. I checked the system and it was totally idle, but the query was stuck. I killed the query and restarted mysqld, but could not connect to mysqld anymore. In the process list I saw that InfiniDB had started a couple of external processes: ExeMgr, DDLProc, PrimProc, controllernode fg, workernode DBRM_Worker1 fg, which cooperate with each other using IPC shared memory and semaphores. To clean up the system I rebooted the server, and only after that was mysqld able to start.

After that query Q4 took 0.75 sec

Queries Q5-Q7 skipped.

Query Q8:

SELECT DestCityName, COUNT( DISTINCT OriginCityName) FROM ontime WHERE YearD BETWEEN 2008 and 2008 GROUP BY DestCityName ORDER BY 2 DESC LIMIT 10;

And times for InfiniDB:

1y: 8.13 sec
2y: 16.54 sec
3y: 24.46 sec
4y: 32.49 sec
10y: 1 min 10.35 sec
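
A quick way to read these Q8 timings is to divide each by the number of years scanned. The sketch below does only that, using the numbers listed above (10y converted to 70.35 sec); the variable names are illustrative.

```python
# Q8 time in seconds, keyed by the number of years in the BETWEEN range
q8_seconds = {1: 8.13, 2: 16.54, 3: 24.46, 4: 32.49, 10: 70.35}

# Seconds spent per year of data scanned
per_year = {years: seconds / years for years, seconds in q8_seconds.items()}

for years, cost in per_year.items():
    print(f"{years:2d}y: {cost:.2f} sec per year")
```

The cost stays between roughly 7 and 8.3 sec per year, so on these runs InfiniDB's Q8 time grows close to linearly with the range scanned.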

Query Q9:

select Year ,count(*) as c1 from ontime group by Year;

Time: 9.54 sec

OK, so here is the summary table with query times (in sec, less is better):

Query MonetDB InfoBright LucidDB InfiniDB
Q0 29.9 4.19 103.21 NA
Q1 7.9 12.13 49.17 6.79
Q2 0.9 6.73 27.13 4.59
Q3 1.7 7.29 27.66 4.96
Q4 0.27 0.99 2.34 0.75
Q5 0.5 2.92 7.35 NA
Q6 12.5 21.83 78.42 NA
Q7 27.9 8.59 106.37 NA
Q8 (1y) 0.55 1.74 6.76 8.13
Q8 (2y) 1.1 3.68 28.82 16.54
Q8 (3y) 1.69 5.44 35.37 24.46
Q8 (4y) 2.12 7.22 41.66 32.49
Q8 (10y) 29.14 17.42 72.67 70.35
Q9 6.3 0.31 76.12 9.54

Conclusions

  • The InfiniDB server version shows Server version: 5.1.39-community InfiniDB Community Edition 0.9.4.0-5-alpha (GPL), so I consider it an alpha release, and it is doing OK for an alpha. I will wait for a more stable release for further tests, as it took a good amount of time to deal with the various glitches.
  • InfiniDB shows really good times for the queries it can handle, quite often better than InfoBright.
  • The inability to handle derived tables is a significant drawback for me; I hope it will be fixed.

Entry posted by Vadim | 17 comments

Speaking at the LA MySQL Meetup – 18th November
http://www.mysqlperformanceblog.com/2009/11/01/speaking-at-the-la-mysql-meetup-18th-november/
Sun, 01 Nov 2009 17:35:44 +0000, by Morgan Tocker

[Photo: Morgan speaking at Highload.ru]
A recent photo from Highload.ru

I said in my last post, that we're interested in speaking at MySQL meetups, and I'm happy to say that the Los Angeles MySQL Meetup has taken us up on the offer.

On November 18th, I'll be giving an introductory talk on InnoDB/XtraDB Performance Optimization.  I will be the second speaker, with Carl Gelbart first speaking on Infobright.

What brings me to LA?  On the same day (18th Nov) I'll be teaching a one day class on Performance Optimization for MySQL with InnoDB and XtraDB.  If you haven't signed up yet - spaces are still available.


Entry posted by Morgan Tocker | No comment

New MariaDB release is out
http://www.mysqlperformanceblog.com/2009/10/30/new-mariadb-release-is-out/
Sat, 31 Oct 2009 00:06:35 +0000, by peter

The MariaDB project kept development going in the repository only, without providing any binary releases since April, so a release was well overdue, and it is here now.

If you're wondering how this release of MariaDB is different from MySQL, you should read this FAQ.


Entry posted by peter | No comment

Giving a talk in Palo Alto, November 3rd
http://www.mysqlperformanceblog.com/2009/10/30/giving-a-talk-in-palo-alto-november-3rd/
Fri, 30 Oct 2009 17:23:29 +0000, by peter

I'm going to give a talk on Goal Driven Performance Optimization next Tuesday. This is one of my favorite talks, as it goes beyond MySQL to principles you can apply to the performance optimization of complex systems, especially when you have to do a lot with limited time or budget and so can't just fix everything that can be fixed.

Please RSVP if you're planning to attend as space is limited.

Thanks to Sam Ghods and Box.Net for organizing the event.


Entry posted by peter | No comment

State of the art: Galera – synchronous replication for InnoDB
http://www.mysqlperformanceblog.com/2009/10/27/state-of-the-art-galera-synchronous-replication-for-innodb/
Tue, 27 Oct 2009 15:08:58 +0000, by Vadim

The first time I heard about Galera was at the Percona Performance Conference 2009, where Seppo Jaakola was presenting "Galera: Multi-Master Synchronous MySQL Replication Clusters". I was impressed, as I personally always wanted this for InnoDB, but we had it in our plans at the bottom of the list, as it is very hard to implement properly.
The idea by itself is not new; I remember synchronous replication was announced for SolidDB at MySQL UC 2007, but later the product was killed by IBM.

A long time after PPC 2009, version mysql-galera-0.6 became available, which had a serious flaw: to set up a new node you had to take down the whole cluster. All this time Codership (the company that develops Galera) was working on the 0.7 release, which introduces node propagation while keeping the cluster online. You can play with the 0.7pre release yourself: MySQL/Galera Release 0.7pre.

In the current version propagation is done by mysqldump from one of the nodes (the "donor"). In the next release Codership is going to support LVM snapshots and xtrabackup, which will make the setup of a new node even easier. The current annoyance I see is that if you shut down one node for a short period of time for quick maintenance, then after start the node has to load the whole mysqldump, as if it were a new empty node. I hope the Codership guys will address this as well.
Another thing I miss for now is support for the InnoDB-plugin, which as we know performs much better than standard InnoDB ®.

So what is so interesting about Galera? A couple of things:

- High Availability. Any of the N standby nodes is available immediately when the main node fails. Galera is a serious contender for inclusion in the list Yves posted recently: http://www.mysqlperformanceblog.com/2009/10/16/finding-your-mysql-high-availability-solution-%e2%80%93-the-questions/. I am not sure how many nines it will provide :) , but the effort for test setup and deployment should be comparable with an MMM setup.

- Scale Writes. Galera allows you to write to any of the N nodes and automatically propagates the changes to the other nodes. It sounds too ideal, and there is a drawback: as the number of nodes you write to increases, your transaction rollback rate may increase, especially if you are working on the same dataset. You can find some results on Codership's page, and I am going to run my own benchmarks as well. From the benchmark you can also see that the communication overhead may be significant for short writes.

- Scale Reads. This can be done with regular replication, but with synchronous replication your "slave" nodes are all in the same state; there is no slave lag. When you read from any slave, you read the current data. It also has a serious drawback, though: the cluster is only as fast as the "weakest" node in the chain. So if one node gets overloaded and its performance degrades, the same happens to the whole cluster.

- Heterogeneous database replication. It is not here yet, and I do not know what is on Codership's roadmap, but the group manager protocol in Galera is database independent, so it is only a matter of database drivers. For InnoDB it is currently a set of patches, and I see it as quite possible to do the same for Postgres. So a MySQL-Postgres cluster setup is not so far off :)
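
The "weakest node" drawback from the Scale Reads point can be pictured with a deliberately crude model. This is an assumption made for illustration only and not a description of Galera's actual protocol: if a commit returns only after every node has acknowledged it, its latency cannot be lower than that of the slowest node. The function name and latency values are hypothetical.

```python
def sync_commit_latency_ms(node_latencies_ms):
    """Toy model: a synchronous commit waits for the slowest node to acknowledge."""
    return max(node_latencies_ms)

# Three nodes with similar per-commit latency (made-up numbers, in milliseconds)
healthy = sync_commit_latency_ms([2.0, 2.1, 1.9])

# The same cluster with one overloaded node
one_overloaded = sync_commit_latency_ms([2.0, 2.1, 15.0])

print(healthy, one_overloaded)
```

In this model a single overloaded node raises the commit latency seen by every client, whichever node it talks to.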

On its "Company page" Codership says its goal is "to promote and exploit the latest developments in computer science to produce fast and scalable synchronous replication solution that "just works" for databases and similar applications", which I think they are succeeding at. Implementing a fast, scalable and working group communication and transaction manager is an art.

For now I would not put the 0.7 release into production yet, but you may seriously consider playing with it in a test environment and reporting bugs to the Codership team; they are very responsive.
I am waiting for the next releases and looking forward to integrating it with XtraDB.


Entry posted by Vadim | 6 comments

Air traffic queries in LucidDB
http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/
Mon, 26 Oct 2009 17:10:31 +0000, by Vadim

After my first post, Analyzing air traffic performance with InfoBright and MonetDB, where I was not able to finish the task with LucidDB, John Sichi contacted me with help on the setup. You can see instructions on how to load the data on the LucidDB Wiki page.

You can find the description of the benchmark in the original post; here I will show the numbers I have for LucidDB vs the previous systems.

Load time
Loading the data into LucidDB in a single thread took me 15273 sec or 4.24h. Unlike the other systems, LucidDB supports multi-threaded load; with concurrency 2 (as I have only 2 cores on that box), the load time is 9955 sec or 2.76h. For comparison,
for InfoBright the load time is 2.45h and for MonetDB it is 2.6h.

DataSize
Another interesting metric is the datasize after load. In LucidDB the db file takes 9.3GB after load.
UPDATE 27-Oct-2009 From metadata table the actual size of data is 4.5GB, the 9.3GB is size of physical file db.dat, which probably was not truncated after several loads of data.

For InfoBright it is 1.6GB, and for MonetDB it is 65GB. Obviously LucidDB uses some compression, but it is not as aggressive as in the InfoBright case. As the original dataset is 55GB, the compression rate for LucidDB is roughly 1:12 (55GB against the 4.5GB of actual data).
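
The ratios behind these statements are plain divisions of the original dataset size by the size after load. A short sketch using only the sizes quoted above (the labels and variable names are illustrative):

```python
ORIGINAL_GB = 55.0  # size of the original dataset

# Size after load, in GB, as quoted in the text
sizes_gb = {
    "LucidDB (actual data)": 4.5,
    "LucidDB (db.dat file)": 9.3,
    "InfoBright": 1.6,
    "MonetDB": 65.0,
}

# Ratio > 1 means the loaded data is smaller than the source files.
ratio = {name: ORIGINAL_GB / size for name, size in sizes_gb.items()}

for name, r in ratio.items():
    print(f"{name:24s} 1:{r:.1f}")
```

This gives about 1:12 for LucidDB's actual data, about 1:6 for its physical file, about 1:34 for InfoBright, and a ratio below 1 for MonetDB, whose loaded data is larger than the source.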

Queries time

Let me list the queries and times for all systems.

- Lame query "count star"
LucidDB:
SELECT count(*) FROM otp."ontime";
1 row selected (55.165 seconds)

Both InfoBright and MonetDB returned the result immediately.
It seems LucidDB has to scan the whole table to get the result.

- Q0:
select avg(c1) from (select "Year","Month",count(*) as c1 from otp."ontime" group by "Year","Month") t;
LucidDB: 103.205 seconds
InfoBright: 4.19 sec
MonetDB: 29.9 sec

- Q1:
SELECT "DayOfWeek", count(*) AS c FROM OTP."ontime" WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC;
LucidDB: 49.17 seconds
InfoBright: 12.13 sec
MonetDB: 7.9 sec

- Q2:
SELECT "DayOfWeek", count(*) AS c FROM otp."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "DayOfWeek" ORDER BY c DESC;
LucidDB: 27.131 seconds
InfoBright: 6.37 sec
MonetDB: 0.9 sec

- Q3:
!set rowlimit 10
SELECT "Origin", count(*) AS c FROM otp."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "Origin" ORDER BY c DESC;
LucidDB: 27.664 seconds
InfoBright: 7.29 sec
MonetDB: 1.7 sec

- Q4:
SELECT "Carrier", count(*) FROM otp."ontime" WHERE "DepDelay">10 AND "Year"=2007 GROUP BY "Carrier" ORDER BY 2 DESC;
LucidDB: 2.338 seconds
InfoBright: 0.99 sec
MonetDB: 0.27 sec

- Q5:
SELECT t."Carrier", c, c2, c*1000/c2 as c3 FROM (SELECT "Carrier", count(*) AS c FROM OTP."ontime" WHERE "DepDelay">10 AND "Year"=2007 GROUP BY "Carrier") t JOIN (SELECT "Carrier", count(*) AS c2 FROM OTP."ontime" WHERE "Year"=2007 GROUP BY "Carrier") t2 ON (t."Carrier"=t2."Carrier") ORDER BY c3 DESC;
LucidDB: 7.351 seconds
InfoBright: 2.92 sec
MonetDB: 0.5 sec

- Q6:
SELECT t."Carrier", c, c2, c*1000/c2 as c3 FROM (SELECT "Carrier", count(*) AS c FROM OTP."ontime" WHERE "DepDelay">10 AND "Year" BETWEEN 2000 AND 2008 GROUP BY "Carrier") t JOIN (SELECT "Carrier", count(*) AS c2 FROM OTP."ontime" WHERE "Year" BETWEEN 2000 AND 2008 GROUP BY "Carrier") t2 ON (t."Carrier"=t2."Carrier") ORDER BY c3 DESC;
LucidDB: 78.423 seconds
InfoBright: 21.83 sec
MonetDB: 12.5 sec

- Q7:
SELECT t."Year", c1/c2 FROM (select "Year", count(*)*1000 as c1 from OTP."ontime" WHERE "DepDelay">10 GROUP BY "Year") t JOIN (select "Year", count(*) as c2 from OTP."ontime" GROUP BY "Year") t2 ON (t."Year"=t2."Year");
LucidDB: 106.374 seconds
InfoBright: 8.59 sec
MonetDB: 27.9 sec

- Q8:
SELECT "DestCityName", COUNT( DISTINCT "OriginCityName") FROM "ontime" WHERE "Year" BETWEEN 2008 and 2008 GROUP BY "DestCityName" ORDER BY 2 DESC;

Years, LucidDB, InfoBright, MonetDB
1y, 6.76s, 1.74s, 0.55s
2y, 28.82s, 3.68s, 1.10s
3y, 35.37s, 5.44s, 1.69s
4y, 41.66s, 7.22s, 2.12s
10y, 72.67s, 17.42s, 29.14s
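
To see how each engine scales on Q8 as the year range widens, one can divide the 10-year time by the 1-year time. The sketch below only reuses the numbers from the table above; `growth` is a hypothetical helper name.

```python
ENGINES = ("LucidDB", "InfoBright", "MonetDB")

# Q8 timings in seconds, keyed by the number of years scanned.
# Each tuple follows the order of ENGINES.
Q8_SECONDS = {
    1: (6.76, 1.74, 0.55),
    2: (28.82, 3.68, 1.10),
    3: (35.37, 5.44, 1.69),
    4: (41.66, 7.22, 2.12),
    10: (72.67, 17.42, 29.14),
}


def growth(engine, start_years=1, end_years=10):
    """Return how many times slower Q8 gets between two year ranges."""
    idx = ENGINES.index(engine)
    return Q8_SECONDS[end_years][idx] / Q8_SECONDS[start_years][idx]


for engine in ENGINES:
    print("%-10s 1y -> 10y: %.1fx" % (engine, growth(engine)))
```

LucidDB and InfoBright grow by about 10x for 10x more years, while MonetDB grows by about 53x, with almost all of that jump happening between the 4-year and 10-year runs.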

- Q9:
select "Year" ,count(*) as c1 from "ontime" group by "Year";
LucidDB: 76.121 seconds
InfoBright: 0.31 sec
MonetDB: 6.3 sec
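
The single-number timings above (Q0 to Q7 and Q9) can be condensed into "which engine was fastest" and "how far behind LucidDB was". This is a sketch over the figures listed in this post only; the function names are my own.

```python
ENGINES = ("LucidDB", "InfoBright", "MonetDB")

# Timings in seconds from the list above, in the order of ENGINES.
TIMINGS = {
    "Q0": (103.205, 4.19, 29.9),
    "Q1": (49.17, 12.13, 7.9),
    "Q2": (27.131, 6.37, 0.9),
    "Q3": (27.664, 7.29, 1.7),
    "Q4": (2.338, 0.99, 0.27),
    "Q5": (7.351, 2.92, 0.5),
    "Q6": (78.423, 21.83, 12.5),
    "Q7": (106.374, 8.59, 27.9),
    "Q9": (76.121, 0.31, 6.3),
}


def fastest_engine(query):
    """Return the name of the engine with the lowest time for a query."""
    times = dict(zip(ENGINES, TIMINGS[query]))
    return min(times, key=times.get)


def lucid_slowdown(query):
    """Return LucidDB's time divided by the best time of the other two."""
    lucid, infobright, monetdb = TIMINGS[query]
    return lucid / min(infobright, monetdb)


for query in sorted(TIMINGS):
    print("%s: fastest=%s, LucidDB is %.1fx slower"
          % (query, fastest_engine(query), lucid_slowdown(query)))
```

With these numbers InfoBright is fastest on Q0, Q7 and Q9, and MonetDB is fastest on the rest.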

As you can see, LucidDB does not show the best results. On the good side, however, LucidDB is very feature-rich, with full support for DML statements. The ETL features are also very impressive: you can extract, filter and transform external data (there is even access to MySQL via a JDBC driver) directly in SQL queries (compare with the single LOAD DATA statement in InfoBright ICE edition). Also, I am not much of a Java person, but as I understand it LucidDB can be easily integrated with Java applications, which is important if your development is Java-based.

It is worth mentioning that in LucidDB a single query execution takes 100% of user time on a single CPU, which may signal that there is some low-hanging fruit for optimization. OProfile could show clear places to fix.


Entry posted by Vadim | 10 comments


http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/feed/ 10
XtraDB Amazon Image http://www.mysqlperformanceblog.com/2009/10/25/xtradb-amazon-image/ http://www.mysqlperformanceblog.com/2009/10/25/xtradb-amazon-image/#comments Mon, 26 Oct 2009 04:36:49 +0000 Aleksandr Kuzminsky http://www.mysqlperformanceblog.com/?p=1547

For those who use the Amazon EC2 service and were anxious to have XtraDB ready to launch there, there is good news.

We created a public AMI (Amazon Machine Image) with XtraDB release 8 installed on CentOS 5.3.

How to use it.

First make sure it is available.

CODE:
  $ ec2-describe-images ami-4701e22e
  IMAGE   ami-4701e22e    xtradb/centos-5.3-x86_64.fs.manifest.xml        834362721059    available       public          x86_64  machine
  $

Run it. It is built for the x86_64 platform, so the allowed instance types are m1.large, m1.xlarge and c1.xlarge.

CODE:
  $ ec2-run-instances ami-4701e22e -t m1.large
  RESERVATION     r-46b3432e      834362721059    default
  INSTANCE        i-ecc74084      ami-4701e22e                    pending         0               m1.large        2009-10-25T18:31:06+0000        us-east-1c

Wait until the instance starts.

CODE:
  $ ec2-describe-instances i-ecc74084
  RESERVATION     r-46b3432e      834362721059    default
  INSTANCE        i-ecc74084      ami-4701e22e    ec2-75-101-203-143.compute-1.amazonaws.com      domU-12-31-39-0A-26-22.compute-1.internal       running      0               m1.large        2009-10-25T18:31:06+0000        us-east-1c

Now it is up and ready.
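
If you want to script the "wait until it starts" step, the state can be pulled out of the ec2-describe-instances output shown above. The sketch below is only an illustration: `instance_state` is a hypothetical helper, the sample text is copied from the listing above, and actually running the command (for example in a polling loop) is left out. Because the empty columns of a pending instance collapse when the output is split on whitespace, the helper looks for a known state word instead of relying on a fixed column position; that choice is my assumption, not something the EC2 tools document.

```python
# States an EC2 instance can report.
KNOWN_STATES = {
    "pending", "running", "shutting-down",
    "terminated", "stopping", "stopped",
}


def instance_state(describe_output, instance_id):
    """Return the state reported for instance_id, or None if not found."""
    for line in describe_output.splitlines():
        fields = line.split()
        # Only INSTANCE rows for the requested id are of interest.
        if len(fields) >= 2 and fields[0] == "INSTANCE" and fields[1] == instance_id:
            for field in fields[2:]:
                if field in KNOWN_STATES:
                    return field
    return None


# Sample output copied from the listing above.
SAMPLE = (
    "RESERVATION     r-46b3432e      834362721059    default\n"
    "INSTANCE        i-ecc74084      ami-4701e22e    "
    "ec2-75-101-203-143.compute-1.amazonaws.com      "
    "domU-12-31-39-0A-26-22.compute-1.internal       running      0"
    "               m1.large        2009-10-25T18:31:06+0000        us-east-1c\n"
)

print(instance_state(SAMPLE, "i-ecc74084"))  # prints "running"
```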


Entry posted by Aleksandr Kuzminsky | 5 comments


http://www.mysqlperformanceblog.com/2009/10/25/xtradb-amazon-image/feed/ 5