Quite common task during schema review is to find the optimal data type for the column value – for example column is defined as INT but is it really needed or may be SMALLINT or even TINYINT will do instead. Does it contain any NULLs or it can be defined NOT NULL which reduces space needed and speeds up processing in most cases ?

These and similar tasks are often done by bunch of SQL queries while really MySQL has a native feature to perform this task – PROCEDURE ANALYSE

Here is sample run of PROCEDURE ANALYSE on drupal schema of Percona web site. The output contains row for each column in the table but I’ve omitted everything by few rows.

Here is how this table was originally defined:

This example shows very well the output of PROCEDURE ANALYSE or any similar tool should not be taken for an instant action but rather used as a base to make your own decision.

For example we can see nid and vid which are defined as INT in original schema can fit to TINYINT. Should we change them ? Probably not – the site will continue to growth and we expect to have more than 255 items this type would allow.

We probably could have changed it to SMALLINT or MEDIUMINT but on the small database like this it does not really yields any significant benefits so it is better to stick to default.

The type field is suggested for conversion to ENUM though as we know new types can appear by inserts we can’t do it. In other drupal installation though when types never change and savings could be significant the change may make sense.

Another important piece of information for type column is maximum length which is really helpful in more cases than ENUM data type – maximum length of data actually stored often can tell us if we defined varchar field too wide. Comparing maximum field length to average length we can see if CHAR or VARCHAR type fits better – for example if column always have same width like hashes CHAR can be more efficient storage.

In general even though I would not just use this tool suggested data types to do an ALTER TABLE it provides very helpful information for performing schema audit.

9 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Shlomi Noach

One thing that is really missing from the output is the *current* data type, and how far I am from exceeding it.
I’m developing a utility which will roughly act like PROCEDURE ANALYSE(), but will rather let you know you capacity percentage.

So far, it only supports AUTO_INCREMENT columns – and let’s you know how far you are from running out of space. You can ask it to only show you columns which are more than, say, 60% used. For example, a TINYINT column with a value of 200 will show in the output.

I’ll be adding other issues, such as used char-length etc. It does not come to replace PROCEDURE ANALYZE (it does not recommend data types) but rather give you an overall look on “what’s my status right now: am I running out of numbers? Am I running out of text length?”

Find it here: oak-show-limits. Again, it’s quite limited now, but work in ongoing.

Milos

Nice, real nice.
Both of you. 🙂

Shlomi Noach

Peter.

Indeed. PROCEDURE ANALYSE() is a helpful utility. Sorry if I appear to do too much “marketing” for my toolkit.

Robert Stewart

The Max_length field in the analyse() output is also very useful, as Shlomi indicates, for detecting if a varchar column is too narrow. I had created a column in a table to store names provided by a customer, and I thought I had made it sufficiently wide to handle the data they said they would be loading. While doing some maintenance I decided to run analyse() and discovered they had recently starting adding some entities with names that were only one character shorter than the maximum size. This allowed me to alter the table in off hours rather than during the middle of the day when they expanded the names further and INSERTs suddenly started failing.

Of course, this issue jumped out at me only because I knew all the column widths. A tool that combines both bits of info is indeed useful.

Shlomi Noach

@Robert

I think it’s a good opportunity to remind that sql_mode had best been set to ‘TRADITIONAL’ or otherwise strict mode, so that when attempting to INSERT texts *longer* than column width – an error is thrown (instead of silent truncation)

JuksWebGuy

This topic is useful, I learned from it.

Jonathan Valerio

PROCEDURE ANALYSE is good in addition to EXPLAIN but also consider the future growth of your database before doing any data type alter.