Arjen posted a good note about MyISAM's concurrent-insert feature, though I should mention that concurrent insert can cause
scalability and performance problems on SMP boxes, especially for queries processing range scans. The root of the problem is POSIX read-write locks, which are used to protect the key_buffer from concurrent changes and are taken for each processed row. You can get more detail from my UC2006 talk; in short, on the main platforms (Linux, Solaris, Windows) rw-locks have a bad implementation, and too many calls waste CPU in user space (Solaris) or kernel space (Linux).
Some results for a MyISAM table, using the query

SELECT id FROM sbtest WHERE id BETWEEN N AND N+20000

where id is the primary key.
Boxes: Sun V40z, Solaris 10, 4 x Dual-Core Opteron @ 2.2GHz (8 logical CPUs), 16GB of RAM, StorEdge 3310;
and a quad-Xeon box, RedHat AS 3, kernel 2.4.21-15.ELsmp, 4 x Intel XEON MP CPU 2.00GHz, 4GB of RAM, SATA RAID 10.
The results in queries per second (more is better):

threads | Quad-Xeon, concurrent insert enabled | Quad-Xeon, --skip-concurrent-insert | Sun V40z, concurrent insert enabled | Sun V40z, --skip-concurrent-insert
1 | 44.08 | 64.82 | 61.06 | 129.13
2 | 32.63 | 123.33 | 52.63 | 244.03
4 | 24.95 | 176.62 | 20.03 | 463.62
8 | 19.92 | 206.81 | 12.34 | 483.47
16 | 19.73 | 208.66 | 12.30 | 428.35
32 | 19.77 | 212.83 | 12.25 | 445.66
So if you use range queries and do INSERTs infrequently, disabling concurrent insert with --skip-concurrent-insert can improve MyISAM performance.
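If you cannot restart the server, concurrent insert can also be switched off at runtime through the concurrent_insert system variable (a sketch; setting it to 0 corresponds to starting mysqld with --skip-concurrent-insert):

```sql
-- Check the current setting (1 = enabled, the default)
SHOW GLOBAL VARIABLES LIKE 'concurrent_insert';

-- Disable concurrent inserts for the running server;
-- equivalent to the --skip-concurrent-insert startup option
SET GLOBAL concurrent_insert = 0;
```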
As I said, the reason for the bad scalability is rw-locks. MySQL developers are currently working on a CPU-dependent rw-lock implementation; this should be available in MySQL 5.1 or 5.2.
Hi Vadim!
Great info. Could you by any chance give us a heads up on what the “tipping point” is where you see scalability issues with concurrent insert? Is there a specific % read/write where MyISAM starts to have trouble, or does the situation normally occur when there is a huge range condition like the one above (20000 rows)?
Cheers!
Jay
Jay,
It happens even with only SELECT queries.
The problem is that pthread_rwlock_rdlock is called for each scanned row, so the more rows we process, the worse the problem
gets.
I don't have an exact number of rows, but I saw the same problem with 10,000 rows.
Checking when the problem pops up is a good idea for a benchmark; I will try it soon.
How does the “concurrent-insert = 2” feature described by Arjen affect the outcome of the above benchmark?
I did not test concurrent_insert = 2, but looking at the source code, it does not change the SELECT code path at all, so the results should be the same.
Which software was used in this benchmark?
Citrin,
I usually use sysbench (http://sysbench.sf.net).
What is the negative impact of turning concurrent_insert off? Will it make inserts wait much longer? Are there any bugs or issues related to turning off this default feature?
Will turning off the feature help queries that have no insert running on the same table at the same time?
My system's update:insert:select ratio is 3:1.5:1, so the DB does more writes than reads.
Thanks.
Jerry,
Well…
There are several points:
1. Possible negative effect: your INSERTs will take a bit longer. I don't know of any bugs with concurrent insert disabled.
2. If you use only point selects, or selects that retrieve just a few rows, I don't think concurrent insert = off will help you much; it may even be the opposite.
3. If your selects retrieve many rows, you definitely should use concurrent insert = off.
4. If your database does more writes than reads, then again I don't think concurrent insert is important for you; most of the time
the tables are under a WRITE lock.
5. But the better way is to test it: for example, you can take MySQL with the patched query log and analyze query execution times with concurrent insert off and on.
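As a minimal sketch of point 5, assuming a MySQL build where query profiling is available (SHOW PROFILES appeared in 5.0.37), you could time the same range scan under both settings:

```sql
-- Run once with concurrent insert disabled, once enabled, and compare timings
SET GLOBAL concurrent_insert = 0;   -- repeat the runs with = 1 afterwards
SET profiling = 1;                  -- session-level profiling, MySQL 5.0.37+
SELECT id FROM sbtest WHERE id BETWEEN 1 AND 20001;
SHOW PROFILES;                      -- lists execution time per statement
```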
Hi man,
I have concurrent_insert = 2 set. How would that positively impact performance? I run a torrent tracker (legal content, of course) with heavy PHP and MySQL traffic daily, and some spikes.
Hi,
Does anyone know how to run MySQL from Visual Studio 2008? I tried building the MySQL source code and it worked fine, but the problem is how to run it. If anyone can provide a step-by-step procedure, that would be great.
OK, I've never mucked much with this setting (or ever, actually), but the stock site I manage is about 170GB on disk and daily does millions of inserts, updates, and selects, plus a few thousand deletes. Some of the larger tables are well over 100,000,000 rows and are part of the daily changes. I've noticed that as more and more data is being written and changed, I'm getting slow-query issues on both reads and writes. Do you think disabling concurrent insert would help or worsen the SELECT timing issues? The tables have a primary key on (ID, Date) and an index on Date, and the SELECT statements are always:
Select from table where ID = ###### Order by Date
with sometimes
AND Date >= 20100101 and Date <= 20110101 (for ranges of between specific dates)
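For reference, the access pattern described above written out as one statement (the table name, select list, and literal values are placeholders, not from the original):

```sql
SELECT *                        -- placeholder select list
FROM quotes                     -- hypothetical table name
WHERE ID = 1234                 -- placeholder for the ###### above
  AND Date >= 20100101 AND Date <= 20110101
ORDER BY Date;
```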