April 16, 2014

Knowing what pt-online-schema-change will do

pt-online-schema-change is simple to use, but internally it is complex.  Baron’s webinar about pt-online-schema-change hinted at several of the tool’s complexities.  Consequently, users often want to know before making changes what pt-online-schema-change will do when it runs.  The tool has two options to help answer this question: –dry-run and –print.

When ran with –dry-run and –print, pt-online-schema-change changes nothing but prints a lot of information about what it will do if ran with –execute instead.  Here’s an example (don’t strain to read all this output; I’ll extract the important parts as I discuss them):

Let’s go through that output part by part to see how it reveals that the tools is going to do something wrong, a bug fixed in v2.1.2.

First, the tool does create the new table, even with –dry-run:

This should not impact the server in any way; it is just a new, empty table.  This allows two things: 1) it verifies that the tool copies the table structure correctly, and 2) it allows the tool to test the –alter statement:

If the syntax of the –alter statement is invalid, the tool will die at this point.  In this example, everything is OK so far: a simple table with a new column named “foo”.  Notice that the table has a PRIMARY key and another non-unique key named “guest_language”–this will be important later.

As the avid reader of documentation or attendee of webinars knows: pt-online-schema-change creates three triggers:

As the output says: those triggers are not actually created because this is a dry run, but if it were not, then the tool would create those triggers by executing those SQL statements.

Similarly, the tool does not copy rows because this is a dry run, but if it were not, then it would use the following SQL statements to copy rows:

The SQL statements will vary highly depending on the table, the number of rows, chunking-related options, etc., but in general you can see how pt-online-schema-change will copy the table using the given SQL statements.  Herein lies a problem–can you find it?  (Hint: think about indexes.) … Notice in each SQL statement the clause:

The tool has chosen to use the guest_language index instead of the PRIMARY key.  This is a poor choice that is due to a bug fixed in v2.1.2.  Granted, the tool can copy the table using the guest_language index, else it would not have chosen it, but it is not the optimal index.

The technical details behind the creation of the SQL statements for copying the table are beyond the scope of this blog post (Baron blogged about chunking), but in general the SQL statements should appear reasonable based on knowledge of the table’s structure, available indexes, total number of rows, etc.  You should examine the statements and ask questions like:

  • Is the chosen index a good candidate?  In this example, it is not: a non-unique key is chosen rather than the PRIMARY key.  That signals that the tool has done something wrong.
  • Are all columns selected?  Hopefully the tool will never make such a grievous mistake, but on never knows.
  • Are the columns used correctly given their data types and definitions?  If a column is defined as “NOT NULL” but the tool does “col IS NULL”, something is wrong.

If something seems suboptimal, illogical, or just plain wrong, please ask or file a bug.

The next block of output is another list of things that the tool will do if ran with –execute but does not currently do because this is a dry run:

Notice that the new, altered table created at the beginning is dropped.  This should be safe because no triggers are created and no data is copied, so the table is empty.

Although the tool does not actually do a lot of work during a dry run, it executes nearly all the same code paths that it will execute when ran with –execute instead.  This means that if there are any bugs in the tool, problems with the MySQL server (privileges, etc.), or other oddities, there is a good likelihood that a dry run will reveal them.

In summary: before running pt-online-schema-change with –execute, you should run it first with –dry-run and –print to see what the tool will do.  Read and think carefully about the output to verify that nothing seems odd or wrong.  And as always: backup your data and verify your backups before running pt-online-schema-change with –execute.

Comments

  1. christi parks says:

    Hello, i would like to ask that what is the benefits of sql training, what all topics should be covered and it is kinda bothering me … and has anyone studies from this course http://www.wiziq.com/course/125-comprehensive-introduction-to-sql of SQL tutorial online?? or tell me any other guidance…
    would really appreciate help… and Also i would like to thank for all the information you are providing on sql training.

Speak Your Mind

*