August 30, 2014

Generating test data for MySQL tables

One of the common tasks requested by our support customers is to optimize slow queries. We normally ask for the table structure(s), the problematic query and sample data to be able to reproduce the problem and resolve it by modifying the query, table structure, or global/session variables. Sometimes, we are given access to the server to test the queries on their live or test environment. But, more often than not, customers will not be able to provide us access to their servers or sample data due to security and data privacy reasons. Hence, we need to generate the test data ourselves.

A convenient way of generating test data is visiting http://generatedata.com which provides a web form where you can provide the columns and its corresponding data types, and turn them into test data. The website is capable of generating data in various formats such as Excel, HTML and JSON but for MySQL, you can choose either the CSV or SQL format.

The website limits you to generate up to a maximum of 100 rows. Fortunately, the web application is available for download so you can install it in your test server to generate data up to 100,000 rows instead.

Below are instructions on installing generatedata on CentOS 6:

1. Install Apache, PHP, MySQL and wget

2. Ensure Apache and MySQL runs on startup

3. Start Apache and MySQL

4. Create a MySQL user and database for the application

5. Download generatedata source and place it in the web directory

6. Make the application readable and writable by Apache

7. Open your browser to access the web application. In my case, it’s http://192.168.1.91/generatedata. It will prompt you to supply mysql credentials. Supply the MySQL credentials and click the “Continue” button to proceed.

generatedata_supply_mysql_settings

8. The next screen will prompt you to create a settings.php file which will be used to store the custom settings for the application which include access credentials to MySQL. Click the “Create File” button to proceed.
generatedata_create_settings_php

9. You can setup an anonymous account, single account with login or multiple accounts. For testing purposes, select the default option and click the “Continue” button.
generatedata_user_accounts

10. The next step is to install the plugins used for generating data and exporting them in a particular format. Click the “Install Plugins” button to continue.
generatedata_install_plugins

11. Once the plugins are generated, click the “Continue” button.
generatedata_plugins_installed

12. Finally, generatedata is installed. Click “Go to script” button to continue.
generatedata_installation_complete

Using generatedata
generatedata_interface

Below are samples of what generatedata can produce:

  • Names: “Karleigh K. Valencia”, “Claire W. Woodard”, “Yvonne Wyatt”
  • Date/Time: “2013-11-29 09:19:38″, “2013-11-29″, “11:05:53″
  • Phone/Fax: “(191) 919-9508″, “(847) 807-6360″, “(366) 902-0912″
  • Street Address: “P.O. Box 345, 8566 Mi St.”, “587-8731 Ultrices. Avenue”, “4612 Eu St.”
  • Latitude/Longitude: “37.5863, -20.25932″, “-48.29183, -69.31125″, “-78.67594″
  • Text: “Fusce aliquet magna a neque.”, “Nullam ut nisi a odio”, “mollis nec, cursus a, enim.”
  • Currency: “$3364.88″, “$7849.22″, “1217.18″
  • Alphanumeric: “RU384″, “GL941″, “HI144″
  • AutoIncrement: 1, 2, 3, 4, 5, 6, 7
  • Number Range: 1, 3, 8, 2, 14, 7
  • Custom List: “Dr.”, “Mr.”, “Mrs.”, “Ms.”

For our example, let’s use the employees table structure from the Employees sample database.

1. Create the employees table under the test database

2. Enter the columns and corresponding datatypes in the web form. In this example, using the appropriate data type for each column is straightforward:

  • emp_no: AutoIncrement, Start at 10000, Increment 1
  • birth_date: Date, From 01/01/1960, 12/31/1990, Format Y-m-d
  • first_name: Names, Alex(any gender)
  • last_name: Names, Smith(surname)
  • gender: Custom List, Exactly 1, Values M|F
  • hire_date: Date, From 01/01/2000, To 12/11/2014, Format code Y-m-d

generatedata_employee_input

3. Export the data in MySQL Insert format. Under the EXPORTS TYPES section, select SQL tab and enter the following information:

  • Database table: employees
  • Database Type: MySQL
  • Remove the tick on: Include CREATE TABLE query and Include DROP TABLE query

generatedata_export_type_mysql

4. Click the “Generate” button to generate the desired data. You can now copy the generated output and paste this in the MySQL console. On the other hand, you can also choose to download the generated rows as a file.
generatedata_insert_statements

CSV output
You can also opt to generate CSV output and use LOAD DATA INFILE to import the CSV data to your table. Under the EXPORT TYPES section, select the CSV tab. Inside the Generate frame, select “Prompt to download” and click the “Generate” button.

generatedata_csv

Upload the CSV file under the /tmp directory of your MySQL server and import it via the MySQL console:

Generating more than 100,000 rows at a time
Suppose you want to generate more than 100,000 rows, you can modify settings.php under /var/www/html/generatedata/settings.php and append $maxGeneratedRows = <maxGeneratedRows>; to the configuration file. The example below increases the maximum number of generated rows to 150,000.

Final Notes
There is just one major caveat to generating test data. Nothing compares to the actual data for reproducing query performance problems because its data and index cardinality will be entirely different from the generated test data. So, even if you’re able to fix the problem in your test environment, the solution may or may not apply to your production environment.

Comments

  1. pcrews says:

    That is a very cool tool. Thank you for sharing that. This is perhaps more primitive, but might also be useful (it is a part of the random query generator tool):
    https://github.com/RQG/RQG-Documentation/wiki/RandomDataGenerator

  2. Leonardo says:

    Hi!
    I’ve been using this site, and I have a problem when I choose “Custom List” as a Data Type and then I choose de option “At least 4″ and then I put four words separated by this symbol ” | ” wich I add also at the end of this list so it is posible to have an empty row. The problem is that this system writes this “, ” instead of don’t write nothing when correspond to an empty element, so in some rows I got “word 1, , word 4″ or even ” , , , ”
    Is there any solution for this?

    Thanks a lot. Greetings.

Speak Your Mind

*