I wrote one week ago about how to find duplicate indexes. This time we’ll learn how to find unused indexes to continue improving our schema and the overall performance. There are different possibilites and we’ll explore the two most common here. User Statistics from Percona Server and pt-index-usage.

User Statistics

User Statistics is an improvement on Percona Server that adds some tables to Information Schema with useful information to understand the server activity and identify the source of the load. Those tables store statistics from our users, tables, indexes and so on, information very helpful to have a clear picture of our users are doing in the database.

To accomplish our task of finding unused indexes we’re going to use INFORMATION_SCHEMA.INDEX_STATISTICS but first we should enable User Statistics:

mysql> SET GLOBAL userstat=on;

After all this hard work we have the statistics plugin gathering information. I usually enable it during load peaks to get realistic information about what is happening in the server. To generate some load on the database I’m going to use the benchmark software tpcc-mysql.

After some runs we can start to analyze the information on INDEX_STATISTICS table.

These are the index used during the benchmark. Let’s compare it with the data on INNODB_INDEX_STATS to get the list of unused indexes:

Here we see one of the limitations on User Statistics, it can only track direct lookups on index keys but not lookups done during constraint checks. That’s the reason FK are marked as unused so that means all the other indexes has been used during the benchmark.

The second limitation is with partitioned tables, you can’t get index statistics usage from those tables.

pt-index-usage

The previous example only works with Percona Server so with stock MySQL we should try another approach.

Percona Toolkit has a tool to check the usage of our index from the slow query log. The concept is easy to understand. pt-index-usage reads the slow query log and execute every query with EXPLAIN to ask MySQL which indexes would it use. At the end of the process you’ll have an output with a list of not used indexes. This is an example:

If you compare these results with the previous example, there are some differences and these are some of the reasons:

    • pt-index-usage uses EXPLAIN and User Statistics number of rows read for every index. So we’re comparing an estimation with real data.
    • EXPLAIN, as I said before, is an estimation from the optimizer and sometimes the real query can use a different execution plan.
    • pt-index-usage tries to convert non-SELECT queries to SELECT queries and is not always possible to get a perfect conversion so there can be some differences.
  • Be cautious with Unique indexes, because maybe they are not used for index lookups but your application may need them to avoid duplicates on the column.

    Take in account that depending on the size of the log the server would need lot of process time and cpu power so I recommend you to not use in the production server. Run it on a slave or a testing environment with an similar data estructure and size.

    Conclusion

    We’ve learnt how to find unused indexes using different approaches and we’ve also learnt that we should’t follow tool’s suggestion literally. These tools are here to help us and after that is our job to check and benchmark the changes on our testing environment to measure the impact on performance. You won’t need to drop all such indexes but it gives you an idea of which of them you should evaluate. Good tools to help us on that evaluation are pt-log-player and pt-upgrade.

    It’s worth to mention the importance of taking a long enough period of logs or statistics to analyze all relevant workload. In some cases there are processes that run once a day or a week and we should have that information in our analysis report.

    5 Comments
    Oldest
    Newest Most Voted
    Inline Feedbacks
    View all comments
    Vad Gor

    Fix word “estructure”, please

    Short Sorceress

    SET GLOBAL userstat=on;
    ERROR 1193 (HY000): Unknown system variable ‘userstat’
    on mysql 5.5

    Prasanth

    “The previous example only works with Percona Server so with stock MySQL we should try another approach.” Do you have MySQL with Percona server?

    Vojtech Kurka

    Updated query that works on 5.7:

    SELECT DISTINCT iis.database_name, iis.table_name, iis.index_name
    FROM mysql.innodb_index_stats iis
    LEFT JOIN information_schema.index_statistics istat ON istat.TABLE_SCHEMA = iis.database_name AND istat.TABLE_NAME = iis.table_name AND istat.INDEX_NAME = iis.index_name
    WHERE istat.TABLE_SCHEMA IS NULL
    AND iis.index_name’PRIMARY’

    sudhanshu sirohia

    For Mysql 5.6 and later
    SELECT object_schema,
    object_name,
    index_name
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE index_name IS NOT NULL
    AND count_star = 0
    ORDER BY object_schema, object_name;