Following up on my Previous Post I decided to do little test to see how accurate stats we can get for for Index Stats created by ANALYZE TABLE for MyISAM and Innodb.

But before we go into that I wanted to highlight about using ANALYZE TABLE in production as some people seems to be thinking I advice to use it…. a lot. In fact I should say I see more systems which have ANALYZE abused – run too frequently without much need than systems which do not run ANALYZE frequently enough.

First it is worth to note MySQL only saves very basic cardinality information for index prefixes for index stats and these rarely change. There is no histograms or any other skew metrics etc. MySQL optimizer also uses number of rows in the table for many decisions but this is computed live (maintained for MyISAM and estimated during query execution for Innodb). This basic information means it does not change whole that quickly at extent to affect optimizer plans.

If you look at the stats accuracy along running ANALYZE TABLE after initial table population and when there are significant changes makes sense. For Innodb as index stats are computed first time table is accessed after restart this often means “never” because MySQL servers are restarted frequently enough. Even once per 3 months is often enough for many workloads. Add to this Innodb stats are less accurate by nature which means you can allow more data change while your
index stats remain as good as new.

Looking at stats accuracy is however a wrong way to look at the problem. Your index stats are a bit off, so what ? What really matters is not how accurate stats are but how good plans you’re getting for your queries. If you’re getting as good plans as with perfect stats why bother updating them ?
Also note many simple “queries” (using constants for index accesses) will not use index cardinality data at all but will estimate number of rows during query execution.

I typically look at ANALYZE TABLE and adding it to the table if I see having it run helps to get good plans. If query plans are good or bad independently of it being run there is need to bother – for bad plans use FORCE INDEX or change the query and report MySQL Optimizer Bug 🙂

But now lets see in the difference of behavior of ANALYZE TABLE for MyISAM vs Innodb.

I used the following simple table for tests:

I have populated it with data with following true cardinality:

Lets see how stats look for MYISAM:

Aha as you can see there is no cardinality stored with table as ANALYZE did not run yet.

As you can see after running ANALYZE we have exact cardinality for i and c columns, with cardinality for the pair (c,i) looks a bit off but is within 0.5% of the correct value so we can count on MyISAM values as almost exact.

As you see ANALYZE table tool a little bit of time to run (even for this very small table) this is because ANALYZE does index scans to find number of exact values in the table.

Now let us populate antest_innodb table which is same but uses Innodb format:

Very interesting result – after loading the data with INSERT in Innodb table we do not get NULL cardinality as with MyISAM but instead we get very wrong cardinality which shows us index prefix is unique (245900 is estimate for the row count in the table)

It is worth to note if you do ALTER TABLE Innodb, same as MyISAM will internally run analyze as soon as table is rebuilt and values will be more sensible:

Note however how much are these values off from reality. The “i” key cardinality is overestimated 3 times, “c” key prefix cardinality is underestimated 5 times and the combined (c,i) key cardinality is overestimated 2 times. So Innodb stats are are very inexact. Fortunately for most queries which use these stats accuracy at the order of magnitude is enough. Sometimes it is not and you’re thinking why a hell it could be picking this strange plan.

Let us run ANALYZE TABLE for Innodb couple of more times to see how values change:

As we see subsequent runs change stats dramatically. For c prefix we got value changed to become 15 times larger. So Innodb stats are both inexact and unstable. So restarting server with Innodb may change stats dramatically and affect some query plans. You also may be getting different plans on different slaves with same data.

Another difference when it comes from handling the statistics comes from NULL handling.
MyISAM has a special variable which controls if NULLs should be considered equal when computing stats:

Too see the difference let me set column “c” to NULL in both tables and see how values change:

As you can see MyISAM set cardinality for prefix (c) and key(c,i) approximately to number of rows in the table treating all nulls different values. Innodb on the contrary treats all NULL values the same so
cardinality for (c) and (c,i) dropped significantly.

This means Innodb and MyISAM have different stats computation method by default.

Lets check how stats change for MyISAM if we change the stats computation method:

oops. Little gotcha. MySQL considers table up to date even though stats stored were computed with different method. If your table is written to actively you should not have this problem; I just did couple of updates to refresh update time.

So with nulls_equal method we see very different picture. It is considered we only have one distinct value for “c” and there are 101 distict values for (c,i) which is the same as value of distinct values in i column. These stats look much closer to what we get for Innodb table with same data though we can see Innodb stats are a bit off from reality too.

MySQL version note: This is from MySQL 5.0.62 if there are other versions which show different behavior.

9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Julien

Wow !

You’ve made my day ! Thanks a lot for those informations/tests.

StephaneC

So we can conclude from your tests that ANALYZE TABLE on InnoDB tables is useless. We’d better only use OPTIMIZE TABLE to defragment the tables (and this will also update index statistics as OPTIMIZE is mapped to ALTER TABLE).
Am I right or can we imagine situations where ANALYZE TABLE will be helpful with InnoDB ?

Gunnar

Hi Peter,

I fully agree with you.
I’ve faced the same problem at a customer.

When the customer switched from MyISAM to InnoDb some queries that before took > 1 sec now took several minutes.
The reason we slightly wrong stats in the innodb analyse which made the optimizer create tabsolutely wrong query plans.

All these queries could be fixed with manually hinting with STRAIGHT_JOIN or FORCE INDEX.

But of course it would have been much nicer if the stats would be correct on inno.

Did you file a bug for this already?

Dor

Hi peter,

Each one of the queries that are displayed in this page has a specific display format which includes the query itself, the result set (neatly separated with stripped lines) & the time it took to execute.

How did you do that?
I see this query-display-format in many developers websites.
It seems that you use an SQL tools for that.

Thanks in advance,
Dor.

Hi Peter,

I just happened to run across this while looking at a bug in MyISAM. I’ve not checked out the behavior in MySQL (so I could be wrong), but it looks like Innodb only updates the current table and not the global share (like MyISAM does). So analyze may be a local only modifier for the current session for Innodb. I’ve noticed that both PBXT and MyISAM update the global, though since the optimizer never takes a lock on the share (ie the table definition) there is a remote chance that they values may be @$#@#$. I don’t believe there is any later “magic” that occurs that closed the table and replaces the global values in a way to protect the variables, but I could be wrong.

If you are curious about the code look at ha_innodb.cc and search for HA_STATUS_CONST. Closing and reopening all “Handler” on the table will also cause this to be recalculated.

Cheers,
-Brian

re: locks. I just read through MySQL source. It looks like MyISAM may be acquiring additional locks which is why it is safe for it. I don’t still see where Innodb is doing anything global like MyISAM is.

Mua sam vui

Thanks much. and bytheway. can anyone tell me which is better for large database ?