July 28, 2014

MySQL Partitioning – can save you or kill you

I have wanted for a while to write about using MySQL Partitioning for Performance Optimization, and I just got a relevant customer case to illustrate it. First you need to understand how partitions work internally. At a low level, partitions are separate tables. This means that when you do a lookup by the partitioning key you will look at one (or a few) partitions; lookups by other keys, however, need to be performed in all partitions and hence can be a lot slower. The gain for updates typically comes from having a smaller BTREE on the active partition(s), which allows a much better fit in memory. Having potentially fewer levels in the BTREE is not that significant an issue.

So let's look at an example:

The access pattern for this table is lookups by "uu", which holds UUID values, plus a number of deletes by "id" and a bunch of inserts. The deletes are mainly clustered around the most recent id values.
The table (and index) is much larger than buffer pool size.

The first problem was replication lag, mainly due to modifying the uu index. This is because UUID() spreads value prefixes very well, effectively giving almost uniform access to the whole BTREE. To solve this problem partitioning was a good choice – PARTITION BY HASH (id div 10000000) PARTITIONS 32 – this splits the data into 32 partitions, placing sequential ranges of 10M id values in the same partition – very handy if you have very active access to values which have been added to the table recently.
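A rough sketch of what this looks like (the original table definition is not shown in the post; the column types here are assumptions based on the description of "id" and "uu"):

```sql
-- Hypothetical shape of the table described above; column types are guesses.
-- HASH(id DIV 10000000) maps each sequential range of 10M ids to one of
-- 32 partitions, so recently inserted rows cluster in a couple of "hot"
-- partitions that can fit in the buffer pool.
CREATE TABLE t (
  id BIGINT UNSIGNED NOT NULL,
  uu CHAR(36) NOT NULL,
  -- ... payload columns ...
  PRIMARY KEY (id),
  KEY uu (uu)
) ENGINE=InnoDB
PARTITION BY HASH (id DIV 10000000) PARTITIONS 32;
```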

Using this trick, replication could be sped up about 10 times, as the couple of partitions which were actively used could fit in the buffer pool completely, so replication became CPU bound (single thread) instead of IO bound.

You could celebrate, but hey… you need to check the impact on the master too. The master, in its turn, was getting a lot of lookups by the uu value, which is not part of the partitioning key, and hence we're looking at 32 logical lookups, one per partition. True, only one of the partitions would contain the value, but many of them will require physical IO and going down to the leaf level to verify the value does not exist there, which reduced performance for random selects by UUID from 400 to 20 per second (from a single thread).

Decreasing the number of partitions made replication less efficient, but the number of selects the table could deliver was increasing, and there seemed to be a reasonable number which would allow replication to perform better than before, while selects were still performed at the rate the system needs.

What is the takeaway? When you're creating partitions, think clearly about what you're trying to achieve. Partitioning is not some magic feature which just makes everything a lot faster. I've seen some people apply partitioning to basically all of their tables without much thought, and believe me, the results were not pretty.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Peter Boros says:

    Hi,

    In a case like this, I usually leave the master unpartitioned and partition only the slaves. You get the performance you want on the master and the slaves, the tradeoff is that slave cloning or rebuilding is a more time-consuming task. Is this a good practice? Why did you decide to lower the number of partitions on the master and the slaves too?

    Peter

  2. Peter,

    Right. This is an option. In the given case the purpose of the slave was basically failover and backups, so it does not help to have a different schema there.

  3. Ken says:

    Peter as I read your writing I get the impression that built-in partitioning is for specific, niche type applications or cases, if at all.

    Sharding at the application level seems much wiser for getting high performance. You can basically get the same functionality, it seems…

  4. Ken,

    Partitioning and sharding are technologies with a different focus. Partitioning is a database-side technology which in some cases allows query execution to be more efficient. It is transparent to the application, which is great. Sharding is an application-level technology where you change how the application does queries to aggregate data from different tables and servers. If you use sharding, in most cases you do not use partitioning, but sometimes it is still helpful – you may shard on "user" and then use partitioning by date/time to separate new data from old.
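    As an illustration of combining the two (a sketch; the table and column names are made up), each shard's table could itself be partitioned by date so old data can be dropped cheaply:

    ```sql
    -- One shard's table; the application routes each user to a shard.
    -- Within the shard, RANGE partitions by month keep new data separate
    -- from old, and dropping a month is a cheap metadata operation.
    CREATE TABLE events_shard42 (
      user_id BIGINT UNSIGNED NOT NULL,
      created DATE NOT NULL,
      payload VARCHAR(255),
      KEY (user_id, created)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(created)) (
      PARTITION p201406 VALUES LESS THAN (TO_DAYS('2014-07-01')),
      PARTITION p201407 VALUES LESS THAN (TO_DAYS('2014-08-01')),
      PARTITION pmax    VALUES LESS THAN MAXVALUE
    );
    ```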

  5. Andy says:

    So when the table is partitioned by its primary key (id), the secondary index BTREE (uu) is also broken into 32 pieces?

    I was under the impression that when a table is partitioned by PK, secondary index BTREE would not be broken up. I guess that was a misconception?

  6. Andy,

    You touch on a very interesting topic. In MySQL, partitioning is kind of lame because it was not really designed from scratch but rather added on top, simply being a more advanced form of "merge table" with multiple tables underneath. This means MySQL can only deal with local indexes (each partition has its own index) – there is no support for global indexes; you also can't partition indexes differently than you partition data (which can make sense in some cases). To have these, however, you need an interface to indexes as separate objects, which is not the case inside MySQL – indexes are something which belongs to the table and is managed as part of the table by the storage engine.

    There is some good news though: for some workloads you can get an effect similar to global indexes by implementing them as a separate index-organized table.

  7. Andy says:

    Peter,

    “for some workloads you can get an effect similar to global indexes by doing them as a separate index organized table.”

    Could you give an example of that? I’m interested in learning more details of it.

  8. Well, look at the table in the post. It is partitioned by PRIMARY KEY. Let's say you are also planning to do some lookups by timestamp, and you do not want that timestamp lookup to hit all partitions. Then you can create a table timestamp_index with primary key (ts, id) and the same columns, and when you need to do the lookup by timestamp you look it up in that table and join it to the original table.

    Does it make sense?
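    Concretely, the sketch might look like this (table and column names are illustrative):

    ```sql
    -- A separate index-organized table acting as a "global index" on ts.
    CREATE TABLE timestamp_index (
      ts TIMESTAMP NOT NULL,
      id BIGINT UNSIGNED NOT NULL,
      PRIMARY KEY (ts, id)
    ) ENGINE=InnoDB;

    -- The timestamp lookup hits only this small table; the join by id
    -- then touches only the partitions that actually hold the rows.
    SELECT t.*
    FROM timestamp_index ti
    JOIN t ON t.id = ti.id
    WHERE ti.ts BETWEEN '2014-07-01 00:00:00' AND '2014-07-02 00:00:00';
    ```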

  9. Rick James says:

    A UUID field should be
    BINARY(36)
    for the following reasons…
    * Not VAR because the length is constant
    * Not (255) because some (which?) queries will unnecessarily use 255 in temp tables.
    * BINARY, not CHAR because utf8 (you are Internationalized, aren’t you?!) takes 3 bytes. Alternatively, you could use CHAR(36) CHARACTER SET ASCII, if you really need case folding on the hex characters.

  10. Rick,

    You're right, of course, for UUIDs in general. In this case the data type is designed to store a "unique name", where this given installation stores a UUID. In general BINARY(36) is good if you do not mix case for your UUIDs, or CHAR(36) latin1 if you do. If you can store it unhexed, BINARY(16) would be best.

  11. Stephen Brooks says:

    Hi, I'm currently looking into an issue not too dissimilar to this, but have not found specific answers to my question, and I'm hoping this thread is a good place to put it.

    I have a database with one table which is MyISAM and has 30 partitions. The table has 3 columns, all varbinary of lengths (8, 20, 65500); the first 2, (A, B), make up the primary key and are used as the partitioning key as KEY(A, B). The table contains 60M rows and is expected to reach 100M+; the key buffer is 8G and 56% used.

    You mention in this article that reducing the number of partitions improved select speed (replication is of no concern in this case), but would you expect the second column in the partitioning key to increase performance? I believe this is why it was added, as opposed to just ensuring uniqueness.

    Not all queries use both A and B, all use A.

  12. Todd McDaniel says:

    To whom it may concern,

    I have been a software developer and database administrator/developer for decades, focusing mostly on VB and SQL Server. I have worked with many companies, both large and small, but my experience has been mostly with low transaction volumes.

    I have a client that has a need for a low data volume (transaction size less than 2k) yet very high transaction volume (300k per second; although if broken down by location this could be as low as 10k transaction per second) application and they want to go cheap.

    By “going cheap” I mean they want to start by going with a hosting service using commodity hardware and MySQL with the client side PHP.

    My client has a great business concept (one of those “wish I would have thought of that!” things) and plan to start small but once “live” they will grow exponentially very quickly.

    I know sql server. I need to learn MySQL, both administration as well as development. What I need to know from you is some direction in what training I need to take. Please be detailed in what you think the training I need is. Any advice would be appreciated.

    Thanks,

    Todd

  13. ankit says:

    Hi, I have a 10 GB MyISAM table which is growing rapidly. Can anyone assist me with the below?
    I am trying to implement partitioning by range (based on the primary key). I tried some benchmarking, but the results were more or less the same.
    I am unable to tell whether this is because SELECTs on the primary key are already very fast, so partitioning won't be of much help, or whether I have missed something while benchmarking.
    I will be really thankful if somebody can suggest whether to go with partitioning or not.
    My table has a high number of selects compared to inserts/updates, and replication is in place and working fine.

  14. Peter Colclough says:

    Hi Peter,
    I have a couple of tables on a high-transaction site (4000 queries per second) that are in excess of 50M rows. I am trying to put a suitable partitioning scheme together.
    The idea was/is to partition on date (TO_DAYS(date_field)), so we can access the latest data quickly and easily archive off earlier records by copying and dropping a partition. Unfortunately the tables have a PK of char, which doesn't fall in line with the date syntax. Adding the date field to the primary key would negate the usefulness of the current PK column.

    Although with 5.5 we can use multiple columns, I don't think that helps.

    Any ideas greatly appreciated. I saw your idea of a lookup table, which could work, but it would require more space and changes in a lot of code.

  15. Rachel says:

    Hi Peter & everyone,
    I'm working on a multi-tenant SaaS project using MySQL as the DBMS, but it's really hard to decide on the database structure. At first, I thought a shared database with multiple schemas (or views in MySQL) would be a great solution, because it requires minimal handling in code without commingling tenants' data.
    For those who might be interested, this article explains the approach with example:
    http://blog.empowercampaigns.com/post/1044240481/multi-tenant-data-and-mysql

    However, after reading Peter's article on the performance issues of MySQL views (http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker), I ran a little test to compare query performance between multi-schema (querying from a view that shows only one tenant's data) and single schema (using the same base table from which the view is generated, but adding WHERE tenant_id = XXX to retrieve the tenant's data), and my finding is that single-schema is approximately 2x faster than multi-schema. Is it because MySQL has to generate the view every time before processing a query?

    But still, I prefer not to handle tenant data separation in code if possible, although some say that with careful design, this is still pretty safe. So now I’m wondering, how does MySQL partitioning differ from view? And is MySQL partition pruning a decent implementation of multi-schema approach?

  16. Hi Rachel,

    Firstly, 'Partitioning' and 'Views' are completely different terms, and not related. A 'View' is simply a stored piece of SQL that produces the same columns from a table or group of tables. 'Partitioning' is a way in which the DB (MySQL in this case) splits its actual data into separate tables, which are treated from the SQL point of view as a single table.

    It would be relatively simple to partition your tables on 'tenant', so that MySQL would only ever have to look at a given partition. This would be quicker on queries, as it wouldn't have to trawl through all the other tenants' records. You would still need to store the 'tenant' id in your code, or as a MySQL variable.
    If you went this route, you could still use views to simplify your code, and ensure that the tenant id is used by the view and stored for each session.

    The difference in time between querying through a view and querying directly is that the view will have its own execution plan, which then executes the same single plan that you ran directly (if you see what I mean). Basically a view generates two queries: one for itself, and one for the query it contains.

    Hope this helps and doesn’t confuse

    Peter C

  17. Ashish Behl says:

    Hi Peter,
    Nice article.

    I am having a similar problem wherein I have around 1 billion rows in a MyISAM table, with a primary key consisting of 3 columns (ObjInstance (10000 types), objectClass (100 types) and timestamp of addition). All 3 columns are also indexed (mostly separately or in combinations of 2), as they are used for querying.

    Inserts are @ 800 inserts per second. Deletes are for records older than 60 days..
    We tried both
    a) partitions on objectClass –> deletes were really slow, to the extent of 50 deletes per sec in a batch (NOT good if it does not match the insert rate)
    b) MERGE tables based on dates (sharding the main table) –> queries and inserts became really slow
    I understand that this might be because all the tables below the merge table are queried. Also, there might be locking happening when long-running queries and inserts occur at the same time..

    I know MySQL should have no problem handling this amount of data, but I am not sure what I am doing wrong..
    So, I am wondering

  18. Rick James says:

    @Ashish — Based on what you said, this is the best thing to do…
    ObjInstance SMALLINT UNSIGNED
    objectClass TINYINT UNSIGNED
    Normalize, if necessary, to shrink the data sizes.

    RANGE partition on dates — 1 PARTITION per day. Then do a nightly DROP PARTITION and REORGANIZE PARTITION to add a new day. This will replace the DELETEs with “instantaneous” PARTITION operations.
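    The nightly rotation might look roughly like this (the partition names and the catch-all pmax partition are assumptions):

    ```sql
    -- Drop the oldest day (replaces millions of DELETEs with one
    -- near-instant metadata operation)...
    ALTER TABLE ObjectInstances_T DROP PARTITION p20140610;

    -- ...and split the catch-all partition to create tomorrow's.
    ALTER TABLE ObjectInstances_T
      REORGANIZE PARTITION pmax INTO (
        PARTITION p20140701 VALUES LESS THAN (TO_DAYS('2014-07-02')),
        PARTITION pmax      VALUES LESS THAN MAXVALUE
      );
    ```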

    Multiple indexes + 800 rows/sec == you are lucky to make it work. How much RAM? RAID? SSD? What is key_buffer_size?

    Let’s see the actual CREATE TABLE, SHOW TABLE STATUS, and the SELECTs.

    > (mostly seperately or in a combination of 2)
    Note that INDEX(a) and INDEX(a,b) are redundant; the former is virtually useless.
    Note that the optimizer will rarely use two indexes in a single SELECT. Even if it would, usually a “compound” index would be better.

  19. Rick,

    I agree with you about partitions. You may need something bigger in the way of ints, like medium or bigint, but essentially you are right.

    800 rows per second is not uncommon, especially if you bulk feed them on a prepared cursor. I have achieved 2 to 3 times that figure over a 6-month period on a development system. Our live system regularly processes 4000+ queries per second across 4 servers.

  20. Ashish Behl says:

    Thanks for your suggestions Rick..
    Shall try your ideas first thing in the morning… (But I have already started evaluating InnoDB for this…)

    I had tried partitioning based on ObjectClass but dropped the idea as deletes were awfully slow. Am testing the below-mentioned schema now. Still in the testing phase, and evaluating InnoDB as an alternative because deletes are awfully slow and my partitioned table with 800 million rows crashed..

    Our servers have 8GB RAM, 8 cores @ 3 GHz, and high-speed RAID level 1 disks.

    my.cnf parameters –>
    key_buffer_size=512M
    table_open_cache = 512
    sort_buffer_size = 32M
    read_buffer_size = 32M
    read_rnd_buffer_size = 64M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 20
    query_cache_size =0

    Here is my current schema..

    CREATE TABLE ObjectInstances_MRG_T (
    ObjInstance VARCHAR(20) NOT NULL,
    UTCTime TIMESTAMP NOT NULL ,
    Version SMALLINT(2) NOT NULL,
    TType SMALLINT(2) NOT NULL,
    Configuration INT(4) NOT NULL,
    Detail1 INT(4) NOT NULL,
    Detail2 INT(4) NOT NULL,
    Detail3 INT(4) NOT NULL,
    Detail4 TINYINT(1) NOT NULL,
    Detail5 TINYINT(1) NOT NULL,
    ObjectClass SMALLINT(2) NOT NULL,
    INDEX OOU (ObjectClass, ObjInstance, UTCTime),
    INDEX UTCTime (UTCTime),
    INDEX MSISDN (ObjInstance),
    INDEX GroupId (ObjectClass)
    )
    ENGINE=MERGE UNION (ObjectInstances_Week1_T,
    ObjectInstances_Week2_T,
    ObjectInstances_Week3_T,
    ObjectInstances_Week4_T,
    ObjectInstances_Week5_T,
    ObjectInstances_Week6_T,
    ObjectInstances_Week7_T,
    ObjectInstances_Week8_T)
    INSERT_METHOD=FIRST;

    (SEVEN more tables like this..)
    CREATE TABLE ObjectInstances_Week1_T (
    ObjInstance VARCHAR(20) NOT NULL,
    UTCTime TIMESTAMP NOT NULL ,
    Version SMALLINT(2) NOT NULL,
    TType SMALLINT(2) NOT NULL,
    Configuration INT(4) NOT NULL,
    Detail1 INT(4) NOT NULL,
    Detail2 INT(4) NOT NULL,
    Detail3 INT(4) NOT NULL,
    Detail4 TINYINT(1) NOT NULL,
    Detail5 TINYINT(1) NOT NULL,
    ObjectClass SMALLINT(2) NOT NULL,
    PRIMARY KEY (ObjectClass, ObjInstance, UTCTime),
    INDEX UTCTime (UTCTime),
    INDEX MSISDN (ObjInstance),
    INDEX GroupId (ObjectClass)
    )
    ENGINE=MyISAM;

    3 main SELECTS are ( replaced actual fields with “*” ):
    ##————————————————–
    SELECT o1.* FROM ObjectInstances_T o1
    ,(SELECT ObjInstance, MAX(UTCTime) AS UTCTime FROM ObjectInstances_T where ObjectClass in (…) GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance and o1.utctime=o2.utctime);

    SELECT o1.* FROM ObjectInstances_T o1
    ,(SELECT ObjInstance, MAX(UTCTime) AS UTCTime FROM ObjectInstances_T where ObjectClass in (…) and ObjInstance in (…) GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance and o1.utctime=o2.utctime);

    "SELECT * FROM ObjectInstances_T
    where ObjectClass in (…) and ObjInstance in(…) and
    UTCTime between '2010-06-10 03:13:27' and '2014-06-12 08:13:27' order by UTCTime desc limit 50000;"

    ##————————————————–

  21. Rick James says:

    (This discussion with Ashish should be on http://forums.mysql.com/list.php?24 )

    > 8GB RAM
    > key_buffer_size=512M
    I would recommend (if you stick with MyISAM) 1500M. See http://mysql.rjweb.org/doc.php/memory

    For InnoDB, innodb_buffer_pool_size = 5500M might be optimal. (Plus key_buffer_size = 20M)

    If you switch to InnoDB, keep in mind the issues raised in http://mysql.rjweb.org/doc.php/myisam2innodb , starting with
    "Disk space for InnoDB is likely to be 2-3 times as much as for MyISAM." (I highlight this because of your "billion" rows.)

    > INT(4) — the 4 says nothing. The INT will be 4 bytes, range of +/-2 billion. Possibly overkill.

    > ObjInstance VARCHAR(20) NOT NULL,
    Normalize!

    > INDEX OOU (ObjectClass, ObjInstance, UTCTime),
    > INDEX GroupId (ObjectClass)
    DROP the latter (as I mentioned earlier)

    MERGE is an antique version of PARTITION. Were you rebuilding the MERGE UNION every week? That should have been a _very_ cheap way to drop a whole _week_ at a time.

    (reformatting for readability):
    SELECT o1.*
    FROM ObjectInstances_T o1 ,

    ( SELECT ObjInstance, MAX(UTCTime) AS UTCTime
    FROM ObjectInstances_T
    where ObjectClass in (…)
    GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance
    and o1.utctime=o2.utctime);

    SELECT o1.*
    FROM ObjectInstances_T o1 ,

    ( SELECT ObjInstance, MAX(UTCTime) AS UTCTime
    FROM ObjectInstances_T
    where ObjectClass in (…)
    and ObjInstance in (…)
    GROUP BY ObjInstance) o2
    where (o1.ObjInstance = o2.ObjInstance
    and o1.utctime=o2.utctime);

    SELECT *
    FROM ObjectInstances_T
    where ObjectClass in (…)
    and ObjInstance in(…)
    and UTCTime between '2010-06-10 03:13:27' and '2014-06-12 08:13:27'
    order by UTCTime desc
    limit 50000;

    I don’t see any obvious optimizations to do. If you go with PARTITIONs, and partition on UTCTime, “partition pruning” will be a slight benefit with “and UTCTime between…”.
    I doubt if the first two SELECTs would benefit from _any_ form of PARTITIONing (or MERGE).

    Your first two SELECTs may benefit from designing “summary table(s)”, populating them as the data arrives, then SELECTing against them.

    The 3rd SELECT may work better in newer versions of MySQL; it seems to do a better job of IN in compound indexes. What version are you running? What does the EXPLAIN say?

    If those are the _only_ SELECTs you need, then DROP all your secondary keys; they are possibly useless. This will greatly speed up INSERTs.

    > PRIMARY KEY (ObjectClass, ObjInstance, UTCTime),
    Including a time in a PK is usually a mistake, unless the application _guarantees_ that it will not generate two rows in the same second!

    CHARACTER SET? I ask because you have a VARCHAR. Often people can get away with latin1, which has less overhead than utf8.

    Let me guess… SHOW TABLE STATUS says 50GB of data + 30GB of index (MyISAM) or 120GB + 100GB ? (The large PK will adversely impact InnoDB secondary keys.) The numbers will be roughly unchanged with MERGE or PARTITION.

  22. Rick James says:

    @Peter …
    Sure, 4K INSERTs/sec is possible in some applications. XtraDB shines over MyISAM and InnoDB. TokuDB may shine even with lots of indexes. INDEXes can be a killer of INSERT rates on huge tables. His table is much larger than his RAM, so we need to “count the disk hits”.

    Dissecting his case…
    Inserting row(s) into the data — appending to the .MYD file: essentially 0 disk hits per row inserted
    Inserting into 4 indexes — still cannot tell how “random” they are: 0-4 disk hits.
    "RAID 1" — Let's say 200 disk hits/second.
    Conclusion: The indexes have hot spots, and are not very “random” if he is getting 800 INSERTs/sec.
    Dropping the secondary indexes — can insert faster.

    If he switches to InnoDB, it will depend, again, on the randomness of the data.

    With either ENGINE, there is a potential benefit of PARTITION by day, even if the keys are random…
    IF all the data and indexes for one partition fits in RAM (key_buffer / buffer_pool), and IF data is inserted in (roughly) chronological order — THEN there will be a lot of activity in the cache, but no need to flush cache until the next day gets started. And, then, the flush will (mostly) be a background task. (XtraDB has a significant fix for smoothing out this operation; InnoDB periodically “stalls”.)

    In either engine, even without dropping the unused indexes, a 1-day partition will easily fit in his RAM.

    So, I stick with my suggestion of partitioning by day. But now it is a “win” for two reasons. (1) DELETE via DROP PARTITION, and (2) no I/O thrashing during the inserts. He may well be able to sustain your 4K/sec insert rate.

  23. Ashish Behl says:

    Thanks again for the analysis..

    Have created a test table with the partitions as described. Also modified the indexes –> 1 index (ObjectClass, Timestamp); PK –> (ObjectClass, ObjectInstance, Timestamp) – unique.
    Using the InnoDB engine as the client is very finicky about his records: can't lose them due to a crash. Though there are nightly backups of the disk, it's still better to use InnoDB (though disk requirements increase 2.5 times, but that's a separate issue)..

    Shall fill records into the same and then test..

    I missed mentioning a few main points…
    1. Had to use multiple indexes: when I tested with only the above-mentioned index and PK, MySQL did not select any index (saw using EXPLAIN) when my queries contained only one of the columns (strange…??)
    2. We are using MySQL NDB Cluster, version Distrib 5.1.56-ndb-7.1.19 (to maintain consistency across the other projects in the same group). This base MySQL version (5.1.56) supports partitions, but does not support partition pruning.
    3. The records in production (as you rightly mentioned) should be in chronological order of timestamp.
    4. The underlying disk partition is DRBD, with replication to a second standby server for failover. (This is not yet active in my setup but I think it might impact performance.)
    5. (Did not mention this before as this is not the right forum for the point.) Tried using concurrent_inserts=2, and inserts fired from the same machine still got blocked by long-running queries fired before them from the same machine. (Do not understand the issue… This was another reason to switch to InnoDB.)

    Lastly, it was decided today by the senior architects that the servers shall also run the application server along with the database, so RAM will be doubled and processor cores increased (new server)..

    Another point worth mentioning: this is my first ever encounter with a database, so I request you to please ignore any stupidity and guide me…

  24. @ashish A quick one on indexes. Don't always trust EXPLAIN. It may not always give you the right answer; in this case it did. The optimiser will rarely, if ever, choose a multi-column index. However, if you tell it to, then it will:

    Select * from mytable use index (mymulticolumnindex)….
    Or
    Select * from mytable force index (…)

    These are called hints that tell the optimiser what to do.

    With the throughput on your servers I find it strange they have decided to put the application on there too. That will only increase bandwidth and disk io. I assume there will not be many users involved

  25. Ashish Behl says:

    You are right..
    The maximum number of users is 50.

  26. Rick James says:

    INDEX(a,b) will probably not be used for a SELECT that does not mention a.

    I've seen many cases where INDEX(a,b) is picked by the optimizer. Sometimes it will use only a from the index; the only clue to this is Key_len in the EXPLAIN.

    EXPLAIN _usually_ reflects the query plan for the query, but it _may_ not.

    > This base mysql version (5.1.56) supports partitions, but does not support partition pruning.
    Eh? Is that an NDB deficiency? Pruning "always" works with InnoDB or MyISAM. (I quote "always" because there are cases where it is not as smart as you would like, but there are usually workarounds.)
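    One way to check whether pruning is actually happening is EXPLAIN PARTITIONS (available since MySQL 5.1; the table and predicate here are illustrative):

    ```sql
    -- The "partitions" column of the output should list only the
    -- partitions that can match the WHERE clause; if it lists all
    -- of them, no pruning occurred.
    EXPLAIN PARTITIONS
    SELECT * FROM ObjectInstances_T
    WHERE UTCTime BETWEEN '2014-06-10 00:00:00' AND '2014-06-12 00:00:00';
    ```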

    > This is my first ever encounter with a database, so, request you please ignore any stupidity and guide
    I pegged you as being relatively knowledgeable.

    USE/FORCE INDEX — Do these only as a last resort. As the data changes, and the query plan needs to change, these can bite you.

    DB and App on same machine — This _may_ not be a problem, especially if, say, the db is disk-intensive and the app is CPU-intensive. Do not allow swapping; that will kill MySQL performance.

    “50 users” — a useless metric without knowing what they are doing. 1000 light users might not stress the machine, or a single user might bring it to its knees.

    Is NDB involved or not? You have mentioned MyISAM and InnoDB; they have nothing to do with NDB.

  27. Ashish Behl says:

    @Rick
    So,
    I have to modify all my queries to have at least (time, class) or (time, instance, class) in them.
    Shall keep in mind your suggestions about explain and force index use.

    The MySQL 5.1 manual, section 18.4, says –>
    "This section discusses an optimization known as partition pruning, which was implemented for partitioned tables in MySQL 5.1.6."
    So, to test, I tried selects with the WHERE clause using the partitioning column and without it. The results were the same..

    Coming to the application, (I cannot describe the exact nature and app due to various reasons, which I am sure all of you will appreciate)
    The application in itself is much more complex than what I might have provided a picture of.
    It uses
    – 2 application servers, both running on 2 nodes (1 redundant). So, 4 app servers.
    – 2 Databases (Separate mysqld instances)
    a) Clustered NDB, on the two nodes mentioned above
    b) InnoDb/MyISAM on the replicated DRBD partition on these two nodes. (This is what we have been discussing — do not have this in my test setup though)

    The first application server does the job of only providing some static data from a bunch of saved files on disk.
    This does not interact with the DB at all.

    The second app server has to interact with both DBs, storing records @ 800 rps. Deletion from the NDB database has to happen very frequently. Deletion from MyISAM/InnoDB happens every day for records older than 60 days.

    50 users would simultaneously access the database, with the 3 queries that I mentioned before, almost all the time. So, I guess that there would be a fair amount of load on the database (1 query per user per 1 min – worst case).

    Have inserted a large number of records into the new InnoDB table as mentioned before, using LOAD DATA INFILE.
    Query results (during inserts at 100 rows per sec): Query 3 in 30 sec. Queries 1 and 2 don't return for a long time. Shall have a summary table for this.
    Query result (no inserts): 4 secs for Query 1.
    Total disk space used: 800 million records – 95 GB.

  28. Ashish Behl says:

    rephrase “do not have this in my test setup though”
    to “do not have DRBD in my test setup though”

  29. Ashish,
    When you say the queries 'were the same', are you convinced you weren't reading from the cache on the second query? I suspect you were.

    However, judging from your description of the application and database, and the fact you have been told to run this on one machine, I have to say that I think it is time to revisit the whole architecture and setup of what you are doing.
    Personally, with more than 30 years of experience, I can see no logical reason for using 3 different flavours of MySQL in the same application. I also suspect that your deletion methodology should be altered.
    Sorry if this sounds harsh, but if this is really your first move into database usage at this level, I would strongly suggest you get an experienced DBA on board to give you a hand.

    Good luck with your development

  30. Michael says:

    Hello,

    Just wanting to ask you: I have a MySQL table with about 1,200,000 rows. I want to partition it, but I don't know the best way. My table schema is: rowId (int PK), content (longtext), content_URL (text). I search the table by content_URL; as far as I understand I should set content_URL as a "key", but I don't understand what to partition by, because this is a text column. I know how to partition by an int (it's simple), but by a text field? I want to speed up my queries…

    Any ideas?

  31. Rick James says:

    Michael, why do you think you want to partition?

    Let’s see SHOW CREATE TABLE (without partitions), and the various queries (SELECTs, DELETEs) that you are afraid won’t perform well. If your only search is by content_URL, PARTITIONing will buy you nothing.

  32. Michael says:

    Here is the show create table:

    CREATE TABLE tags (
    id int(255) NOT NULL AUTO_INCREMENT,
    word text NOT NULL,
    url text NOT NULL,
    PRIMARY KEY (id)
    ) ENGINE=MyISAM AUTO_INCREMENT=1309205 DEFAULT CHARSET=latin1

    I have 1,263,009 total results

    Some queries (I only have selects now), plus some random ones:

    SELECT word FROM tags WHERE url = 'sample-url'
    SELECT word,url FROM tags WHERE id >= FLOOR(RAND()*( SELECT MAX(id) FROM tags))

  33. Michael says:

    Forgot to ask about the random tags. If I try this query:

    SELECT * FROM tags ORDER BY RAND() LIMIT 0,30

    The server dies for about 2-3 seconds

    SELECT * FROM tags WHERE id >= FLOOR(RAND()*( SELECT MAX(id) FROM tags)) LIMIT 0,30

    This works pretty fast but it’s not really random; I get 50% duplicates on each query.

    Any other ideas on what I can do about the random part?

    Will partitioning save me?

  34. Rick James says:

    Suggestions on fetching a RANDOM row:
    http://mysql.rjweb.org/doc.php/random
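    One common workaround (a sketch of a generic technique, not necessarily what the linked page recommends; table and column names follow Michael’s schema) is to seek to a single random id and repeat the one-row fetch application-side, which avoids the contiguous duplicate blocks that a single random offset with LIMIT 0,30 produces:

```sql
-- Hypothetical sketch: one random row per seek, assuming ids are
-- reasonably dense; run it 30 times app-side for 30 separate picks.
SELECT t.word, t.url
FROM tags AS t
JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM tags)) AS rid) AS r
  ON t.id >= r.rid
ORDER BY t.id
LIMIT 1;
```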

    SELECT … WHERE url =
    needs an index; else it will do a full table scan.
    Even INDEX(url(50)) would be better than nothing.

    PARTITION would help neither of your issues.

    INT(255) is meaningless — INT is a 32-bit number regardless of the value you put after it.
    TEXT is probably the wrong thing to use for word; consider VARCHAR(40).
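    Rick’s three suggestions, sketched as DDL (a hedged example; the VARCHAR(40) and the 50-character prefix are his ballpark figures, and the index name is invented):

```sql
-- Sketch: plain INT, VARCHAR instead of TEXT for short words,
-- and a prefix index so WHERE url = ... can seek instead of scan.
ALTER TABLE tags
  MODIFY id INT NOT NULL AUTO_INCREMENT,
  MODIFY word VARCHAR(40) NOT NULL,
  ADD INDEX idx_url (url(50));
```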

  35. Michael says:

    I don’t understand this part:

    SELECT … WHERE url =
    needs an index; else it will do a full table scan.
    Even INDEX(url(50)) would be better than nothing.

    What do you mean? An example query? Do I need to set url as an index?

  36. What Rick means is that without an index on url (the query you quoted was:
    SELECT word FROM tags WHERE url = 'sample-url')
    the db will start at row zero and read, row by row, down the table till it finds a match. If the match is the last row in the table, it will read all 1.2 million rows. This is called a ‘full table scan’… and is absolutely something you should never ever do, unless your table has 10 rows.

    If you do something like:
    create index xurl_indx on tags(url(50))
    this will create an index on the first 50 characters of your url, which would then be used in the query and would make life a lot easier for your DB.

    HTH

  37. Michael says:

    Thanks guys, because of you my query time is now 0.0002 sec, down from 0.2334 sec. I added this blog to my bookmarks, thanks!

  38. Raghu Sastry says:

    I had a similar situation, wherein lookups on the master countered the partitioning benefit for writes. I took an alternative approach: the partitioning on the master was different from the slave. The slave’s partitioning was designed to satisfy the read queries, so partition pruning kicked in. On the master, without partitioning, the rate of inserts fell as the size of the table increased; by making sure that the size of a single partition was less than the InnoDB buffer pool size, we got a constant throughput for inserts on the master. Yes, we invited maintenance overhead, since the partition structures were different and we had to be careful with upgrades and patches. Is this a good approach? Any suggestions?

    Thanks!
    Raghu

  39. Rick James says:

    Michael — Are you sure the Query cache did not kick in? Try SELECT SQL_NO_CACHE (etc).

    Raghu — I’m surprised you don’t have trouble on the Slaves. Remember that every INSERT on the Master must also be done on the Slave(s). Are you using SBR or RBR? Would you share your two CREATE TABLEs so we can get a better feel for what you are saying?

  40. John says:

    Hello Peter,

    If I have a table with 20,000,000 rows, around 50GB, and I make just simple queries like:

    select * from table where id = '1549090'

    will creating a partition by key(id) (30 partitions) increase my performance for selects and inserts?

    Thanks

  41. John,

    In theory you will see a significant increase in your SELECTs, as you will be querying a smaller dataset. You should have faster inserts as well, as you will be adding into a smaller dataset/index. Partitions are effectively a ‘chunked’ table, which requires the DB to find out which ‘chunk’ to work on, then insert or read your data.

    The only area you need to be careful of is if you are issuing queries against this table but NOT using the id column. This could cause a longer-running query, as the engine needs to read through all the ‘chunks’.

    HTH

    Peter C
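    A quick way to check whether a given query can prune down to one partition (a sketch; the table name is John’s placeholder) is EXPLAIN PARTITIONS:

```sql
-- The "partitions" column lists which partitions the query touches:
-- a lookup on the partitioned key id should show one partition,
-- a lookup on any other column should show all 30.
EXPLAIN PARTITIONS
SELECT * FROM `table` WHERE id = '1549090';
```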

  42. Rick James says:

    “select * from table where id = '1549090'” — needs only PRIMARY KEY(id). PARTITIONing would not speed up the SELECT. (Yes, I disagree with Peter C.)

  43. John, Rick… I feel a discussion coming on :-) .
    I actually gave the short answer; Rick has a point, but there are generally other considerations to take into account. He also didn’t answer the ‘Insert’ part.

    1. With a standard numeric id set as a primary key, it will probably be quicker, or give the same response, on a normal table as against a partitioned table.
    2. The insert should be quicker on a partitioned table, as it will be adding to a smaller index and a smaller ‘chunk’ of table. Depending on how often you write to the table, you may not see the difference.

    However….

    1. If you have uneven reads across the DB (most hitting one section… latest writes etc.), it will probably be quicker in the long term to read from a partitioned table, as, depending on the size of the partitions, a whole partition can be cached; with a single table, probably only recently read records will be cached, resulting in more disc reads.

    2. If you need to delete records from this table, those will be quicker, as they will come out of the partition, not the whole table.

    3. Insert on Duplicate Key (I don’t do REPLACE… ever) will be slower if it spans partitions, faster if it stays within the same partition.

    4. If you do queries on any other column, these could easily be quicker on a partitioned table, as MySQL can fire off one thread per partition (at least I have seen this happen in a MyISAM environment on similar-sized tables).

    What I am really saying is that it’s not only about the SELECTs and INSERTs; it’s also about what else you do with the table. Removing partitions is easier and quicker than deleting a few hundred thousand rows, for example, if you need to delete.

    However, thank you Rick for pointing out my error in the original post…. it was misleading, but wasn’t meant to be…

  44. Rick James says:

    Yeah, a discussion is brewing.

    You pointed out an important use case for PARTITIONing (“uneven reads”, Re: #1); let me rephrase the use case —
    * The entire table is too big to be cached, but
    * One partition is small enough to be cached, and
    * Your accesses are to that one partition.

    In one instance, I take advantage of that — I have 10 years of data, but usually queries hit the last week or two. I have only 20 partitions; they are PARTITION BY RANGE (to_days(…)). However, the ranges are uneven — older partitions cover larger timespans. The last few partitions are one week each. So, even if the user’s ad hoc query leads to scanning the entire last two partitions, performance is not bad. (The last 2-3 partitions tend to stay in the buffer_pool.)
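    Rick’s uneven-range layout might look roughly like this (a hedged sketch with invented table, column, and date values, not his actual DDL):

```sql
-- Older partitions span years, recent ones a week each, so the
-- hot (recent) partitions stay small enough to sit in the buffer pool.
CREATE TABLE events (
  dt DATE NOT NULL,
  payload VARCHAR(255),
  KEY (dt)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(dt)) (
  PARTITION p_old  VALUES LESS THAN (TO_DAYS('2012-01-01')),
  PARTITION p2012  VALUES LESS THAN (TO_DAYS('2013-01-01')),
  PARTITION pw01   VALUES LESS THAN (TO_DAYS('2013-07-08')),
  PARTITION pw02   VALUES LESS THAN (TO_DAYS('2013-07-15')),
  PARTITION pmax   VALUES LESS THAN MAXVALUE
);
```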

    If I recall correctly, any INSERT/DELETE/etc always opens all partitions, even before thinking about pruning. Seems like this is being fixed in 5.6; not sure about Percona or MariaDB.

    (Re #4) There are _no_ parallel queries (within a single connection) anywhere in any ‘free’ variant of MySQL. PARTITIONs are scanned one at a time. (Please provide specific references if you find otherwise.)

    I would argue that single-row queries (SELECT, INSERT, DELETE, UPDATE) are similar in speed between PARTITIONed or non-PARTITIONed. When PARTITIONed, first it has to find and open the required partition, then it drills down a BTree that might be one level shallower than the non-partition equivalent.

    Even single-row writes are likely to be slower in PARTITIONed tables unless you include something to facilitate pruning.

    A billion rows in a BTree is only (about) 5 levels deep. If you split that table into 100 partitions, the BTree in each partition would be about 4 levels. A “point query” in a billion-row (non-partitioned) table will, at worst, hit the disk only 5 times. (Usually 4 of the 5 are cached from previous queries.)
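    Rick’s depth figures follow from depth ≈ ceil(log_fanout(rows)). Assuming a fanout of roughly 100 keys per node (an illustrative ballpark, not a measured InnoDB figure), MySQL can do the arithmetic itself:

```sql
-- ceil(ln(rows)/ln(fanout)): about 5 levels for 10^9 rows, and
-- about 4 for the ~10^7 rows in each of 100 partitions.
SELECT CEIL(LOG(1000000000) / LOG(100)) AS depth_billion_rows,   -- 5
       CEIL(LOG(10000000)   / LOG(100)) AS depth_per_partition;  -- 4
```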

    (Re #2) I have rambled on about things as a preface to rebutting “delete records from this table, those will be quicker, as they will come out of the partition, not the whole table”:
    * If the DELETE is a single row based on a UNIQUE/PRIMARY key and it can prune, the delete involves (1) find partition (if partitioned), (2) drill down the BTree, (3) remove one record. PARTITIONing makes very little difference.
    * If it cannot prune, then partitioned would be slower, since it would attempt the DELETE in all partitions.
    * A ‘range’ delete would mostly follow above two points.
    * Secondary indexes are also BTrees, so they need updating (although delayed). Yes, their BTrees may be one level shallower when partitioned, hence slightly faster.
    Conclusion: Not much difference.

    Yes, DROPping an ‘old’ partition is very efficient, much more efficient than DELETEing the equivalent rows. This is, in my experience, the main use case for PARTITIONing. I go into more details (and code) here:
    http://mysql.rjweb.org/doc.php/partitionmaint
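    The DROP-versus-DELETE difference can be sketched like this (hypothetical table and partition names; the linked page has the full maintenance code):

```sql
-- Dropping an old partition just discards its data file:
ALTER TABLE events DROP PARTITION p_old;

-- The DELETE equivalent must find and remove every row,
-- updating every index along the way:
-- DELETE FROM events WHERE dt < '2012-01-01';
```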

    The third use case for PARTITIONing involves 2D-like indexing. I have not found a 4th use case.

  45. Hi

    Great article and had fun reading all the responses. After implementing what was suggested I have also managed to reduce my query time greatly.

    I will bookmark this blog for future reference.

    Best regards

    Ian

  46. Hi Peter,

    When you use partitions, are the columns of the partitioning key, if using the HASH|KEY method, stored redundantly in each partition? It seems to me that, since the values of the KEY would be uniform over the partition, this would be unnecessary, and these column values could be restored during queries based on the partition they come from. Not so for RANGE partitions, etc., of course.

    Thanks,
    Colin

  47. Rick James says:

    Think of a partition as being a table in its own right. This implies that it would have the HASH/KEY value in it. (The redundancy does not waste much space, anyway.)

  48. Thanks Rick, I knew they were implemented as basically separate tables; I thought perhaps it might still be optimized out. I am not sure why it would *not* waste much space, though: I have some tables with a 24-byte row length and a 12-byte PK. I could save about 6 bytes, or 25%… nothing to sneeze at! :)

  49. jayaram says:

    Hi Peter Zaitsev,

    I have a requirement to maintain daily transaction information of our application in tables for logging purposes, and we also have tools for analyzing the logs based on those tables.

    The problem is, we have one table, let us say ‘x-table’, whose one-day size is 2GB. We merge the data every day using ‘insert into x-total-table select * from yesterday.x-table’. The size of x-total-table becomes very huge, and the merging also takes a long time.

    Hence, I’m thinking about a solution to get rid of these two problems: 1) data size, 2) merging time. For that I would like to create day-wise tables, and based on those I would like to create a merge table.

    Could you please suggest the right way?

  50. Hi jayaram,

    I’m Percona’s community manager. Thanks very much for your question. Our discussion forums are a more appropriate place for questions like the one you have. The great thing about the forums is that you can get advice from the community in addition to just Percona experts. http://www.percona.com/forums/

  51. mathias says:

    Hi,
    jayaram’s question is interesting, and I would love to know what would be wise here.

    When I think about restoring data, I guess many tables become more handy: fixing an SQL dump (maybe because of upgrades) on the console means grepping a 50GB file. Goodbye, my RAM :)
