Eventual Consistency in MySQL

We’re told that foreign key constraints are the best way to enforce logical referential integrity (RI) in SQL, preventing rows from becoming orphaned. But then we learn that the enforcement of foreign keys incurs a significant performance overhead.^1,2

MySQL allows us to set FOREIGN_KEY_CHECKS=0 to disable enforcement of RI when the overhead is too high. But if you later restore enforcement, MySQL does not immediately scan all your data to verify that you haven’t broken any references while the enforcement was disabled. That’s undoubtedly the best choice, but it leaves our database in a strange state where RI constraints are active, but we cannot assume that all our data satisfies RI. At some point we do want to perform some quality control, to verify consistency. How can we do that?

Quality Control Queries

We need to check for orphaned rows in every parent-child relationship in your database. That is, do an exclusion join from child table to its referenced parent table, and if no matching parent row is found, then the child is an orphan. For example, we have a parent table Foo and a child table Bar:

CREATE TABLE Foo (A INT, B INT, PRIMARY KEY (A, B));
CREATE TABLE Bar (ID INT PRIMARY KEY, X INT, Y INT,
FOREIGN KEY (X,Y) REFERENCES Foo(A,B));

CREATE TABLE Foo (A INT, B INT, PRIMARY KEY (A, B));

CREATE TABLE Bar (ID INT PRIMARY KEY, X INT, Y INT,

FOREIGN KEY (X,Y) REFERENCES Foo(A,B));

You can find orphaned rows in Bar:

SELECT Bar.ID FROM Bar LEFT OUTER JOIN Foo ON (Bar.X,Bar.Y) = (Foo.A,Foo.B) WHERE Foo.A IS NULL;

1	SELECT Bar.ID FROM Bar LEFT OUTER JOIN Foo ON (Bar.X,Bar.Y) = (Foo.A,Foo.B) WHERE Foo.A IS NULL;

If the result set of this query is empty, then there are no orphaned rows.

But there are probably hundreds of foreign key relationships in any complex database. We can find all foreign keys by querying the INFORMATION_SCHEMA. The KEY_COLUMN_USAGE contains information about both primary keys and foreign keys. If the REFERENCED_* columns are non-null, it’s a foreign key.

mysql> SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME IS NOT NULL\G
*************************** 1. row ***************************
           CONSTRAINT_CATALOG: def
            CONSTRAINT_SCHEMA: test
              CONSTRAINT_NAME: Bar_ibfk_1
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: Bar
                  COLUMN_NAME: X
             ORDINAL_POSITION: 1
POSITION_IN_UNIQUE_CONSTRAINT: 1
      REFERENCED_TABLE_SCHEMA: test
        REFERENCED_TABLE_NAME: Foo
       REFERENCED_COLUMN_NAME: A
*************************** 2. row ***************************
           CONSTRAINT_CATALOG: def
            CONSTRAINT_SCHEMA: test
              CONSTRAINT_NAME: Bar_ibfk_1
                TABLE_CATALOG: def
                 TABLE_SCHEMA: test
                   TABLE_NAME: Bar
                  COLUMN_NAME: Y
             ORDINAL_POSITION: 2
POSITION_IN_UNIQUE_CONSTRAINT: 2
      REFERENCED_TABLE_SCHEMA: test
        REFERENCED_TABLE_NAME: Foo
       REFERENCED_COLUMN_NAME: B
2 rows in set (0.00 sec)

mysql> SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE

WHERE REFERENCED_TABLE_NAME IS NOT NULL\G

*************************** 1. row ***************************

CONSTRAINT_CATALOG: def

CONSTRAINT_SCHEMA: test

CONSTRAINT_NAME: Bar_ibfk_1

TABLE_CATALOG: def

TABLE_SCHEMA: test

TABLE_NAME: Bar

COLUMN_NAME: X

ORDINAL_POSITION: 1

POSITION_IN_UNIQUE_CONSTRAINT: 1

REFERENCED_TABLE_SCHEMA: test

REFERENCED_TABLE_NAME: Foo

REFERENCED_COLUMN_NAME: A

*************************** 2. row ***************************

CONSTRAINT_CATALOG: def

CONSTRAINT_SCHEMA: test

CONSTRAINT_NAME: Bar_ibfk_1

TABLE_CATALOG: def

TABLE_SCHEMA: test

TABLE_NAME: Bar

COLUMN_NAME: Y

ORDINAL_POSITION: 2

POSITION_IN_UNIQUE_CONSTRAINT: 2

REFERENCED_TABLE_SCHEMA: test

REFERENCED_TABLE_NAME: Foo

REFERENCED_COLUMN_NAME: B

2 rows in set (0.00 sec)

This shows that test.Bar has columns X,Y which reference test.Foo columns A,B.

Generating SQL with SQL

Now that we know the information in KEY_COLUMN_USAGE, we can write a query to fetch every distinct KEY_COLUMN_USAGE.CONSTRAINT_NAME where the REFERENCED_* columns are non-null. From that information, we can generate an exclusion-join query for each foreign key relationship:

mysql> SELECT CONCAT(
 'SELECT ', GROUP_CONCAT(DISTINCT CONCAT(K.CONSTRAINT_NAME, '.', P.COLUMN_NAME,
  ' AS `', P.TABLE_SCHEMA, '.', P.TABLE_NAME, '.', P.COLUMN_NAME, '`') ORDER BY P.ORDINAL_POSITION), ' ',
 'FROM ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',
 'LEFT OUTER JOIN ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',
 ' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),
 ') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',
 'WHERE ', K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME, ' IS NULL;'
 ) AS _SQL
 FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K
 INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P
 ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)
 AND P.CONSTRAINT_NAME = 'PRIMARY'
 WHERE K.REFERENCED_TABLE_NAME IS NOT NULL
 GROUP BY K.CONSTRAINT_NAME;

 +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | _SQL                                                                                                                                                                   |
 +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo  ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL; |
 +------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 1 row in set (0.00 sec)

mysql> SELECT CONCAT(

'SELECT ', GROUP_CONCAT(DISTINCT CONCAT(K.CONSTRAINT_NAME, '.', P.COLUMN_NAME,

' AS `', P.TABLE_SCHEMA, '.', P.TABLE_NAME, '.', P.COLUMN_NAME, '`') ORDER BY P.ORDINAL_POSITION), ' ',

'FROM ', K.TABLE_SCHEMA, '.', K.TABLE_NAME, ' AS ', K.CONSTRAINT_NAME, ' ',

'LEFT OUTER JOIN ', K.REFERENCED_TABLE_SCHEMA, '.', K.REFERENCED_TABLE_NAME, ' AS ', K.REFERENCED_TABLE_NAME, ' ',

' ON (', GROUP_CONCAT(CONCAT(K.CONSTRAINT_NAME, '.', K.COLUMN_NAME) ORDER BY K.ORDINAL_POSITION),

') = (', GROUP_CONCAT(CONCAT(K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME) ORDER BY K.ORDINAL_POSITION), ') ',

'WHERE ', K.REFERENCED_TABLE_NAME, '.', K.REFERENCED_COLUMN_NAME, ' IS NULL;'

) AS _SQL

FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE K

INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE P

ON (K.TABLE_SCHEMA, K.TABLE_NAME) = (P.TABLE_SCHEMA, P.TABLE_NAME)

AND P.CONSTRAINT_NAME = 'PRIMARY'

WHERE K.REFERENCED_TABLE_NAME IS NOT NULL

GROUP BY K.CONSTRAINT_NAME;

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| _SQL |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

| SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL; |

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.00 sec)

It can be a dizzying exercise to write SQL queries that in turn output valid SQL queries, but with a little care and good testing, we can do it.

Using The Quality Control Query

We now have a query that can verify the referential integrity between Foo and Bar. If we run this query and it returns a non-empty result set, it shows us which rows in Bar are orphans.

mysql> INSERT INTO Foo (A,B) VALUES (111,2222), (333,444);
mysql> INSERT INTO Bar (ID,X,Y) VALUES (21,333,444);

mysql> SET FOREIGN_KEY_CHECKS=0;
mysql> INSERT INTO Bar (ID,X,Y) VALUES (42,555,666); -- THIS IS AN ORPHAN

mysql> SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo  ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL;
+-------------+
| test.Bar.ID |
+-------------+
|          42 |
+-------------+
1 row in set (0.00 sec)

mysql> INSERT INTO Foo (A,B) VALUES (111,2222), (333,444);

mysql> INSERT INTO Bar (ID,X,Y) VALUES (21,333,444);

mysql> SET FOREIGN_KEY_CHECKS=0;

mysql> INSERT INTO Bar (ID,X,Y) VALUES (42,555,666); -- THIS IS AN ORPHAN

mysql> SELECT Bar_ibfk_1.ID AS `test.Bar.ID` FROM test.Bar AS Bar_ibfk_1 LEFT OUTER JOIN test.Foo AS Foo ON (Bar_ibfk_1.X,Bar_ibfk_1.Y) = (Foo.A,Foo.B) WHERE Foo.B IS NULL;

+-------------+

| test.Bar.ID |

+-------------+

| 42 |

+-------------+

1 row in set (0.00 sec)

That shows that the quality control query can find orphans. Because we have a way to generate quality control queries automatically, we can run them at any time:

$ cat generate_foreign_key_checks.sql | mysql -N | mysql -E
*************************** 1. row ***************************
test.Bar.ID: 42

$ cat generate_foreign_key_checks.sql | mysql -N | mysql -E

*************************** 1. row ***************************

test.Bar.ID: 42

DIY RI

We can get information about foreign key constraints from the INFORMATION_SCHEMA, but what if our database includes relationships for which we never declared foreign key constraints at all? There’s no foreign key constraint, therefore the INFORMATION_SCHEMA gives us no information about the relationship. That’s okay, we can create a conventional table with similar structure, and track our database design manually, so we can use the same SQL queries for quality control:

mysql> CREATE TABLE test.MY_KEY_COLUMN_USAGE LIKE INFORMATION_SCHEMA.KEY_COLUMN_USAGE;
mysql> INSERT INTO test.MY_KEY_COLUMN_USAGE SET
            CONSTRAINT_SCHEMA = 'ecommerce',
              CONSTRAINT_NAME = 'fk_lineitems',
                 TABLE_SCHEMA = 'ecommerce',
                   TABLE_NAME = 'LineItems',
                  COLUMN_NAME = 'order_id',
             ORDINAL_POSITION = 1,
POSITION_IN_UNIQUE_CONSTRAINT = 1,
      REFERENCED_TABLE_SCHEMA = 'ecommerce',
        REFERENCED_TABLE_NAME = 'Orders',
       REFERENCED_COLUMN_NAME = 'ID';

mysql> CREATE TABLE test.MY_KEY_COLUMN_USAGE LIKE INFORMATION_SCHEMA.KEY_COLUMN_USAGE;

mysql> INSERT INTO test.MY_KEY_COLUMN_USAGE SET

CONSTRAINT_SCHEMA = 'ecommerce',

CONSTRAINT_NAME = 'fk_lineitems',

TABLE_SCHEMA = 'ecommerce',

TABLE_NAME = 'LineItems',

COLUMN_NAME = 'order_id',

ORDINAL_POSITION = 1,

POSITION_IN_UNIQUE_CONSTRAINT = 1,

REFERENCED_TABLE_SCHEMA = 'ecommerce',

REFERENCED_TABLE_NAME = 'Orders',

REFERENCED_COLUMN_NAME = 'ID';

All that remains is to employ test.MY_KEY_COLUMN_USAGE in our query generator script instead of INFORMATION_SCHEMA.KEY_COLUMN_USAGE.

Conclusion

Without real foreign key constraints, the database can’t prevent rows from being orphaned, and it’s almost a given that this will occur from time to time. You can use techniques like the above to automate quality control and detect orphans early so you can correct them.

¹Innodb locking and Foreign Keys

²Instrumentation and the cost of Foreign Keys

8 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Bradley Holt

12 years ago

Great post! The theoretical connection between this technique and eventual consistency is clear to me, but what’s the practical connection? For example, are you suggesting that one could use this technique as one piece of a master-master cluster setup puzzle (e.g. a node accepts a write for a child row of a parent that it doesn’t have)? Or, are you talking mainly about the performance benefit of turning off RI and the loss of consistency within a single node that goes with that?

Bill Karwin

Author

12 years ago

Hi Bradley,

Right, we’d like to avoid (or at least defer) the performance cost of checking RI for every INSERT/UPDATE/DELETE statement — without neglecting integrity completely.

If strict RI enforcement is important for a given application, then we’d enable the foreign key constraints and bear the overhead. That would be the cost of doing business for that app.

But as you know from your work with CouchDB, some application can tolerate some anomalies in data integrity. But we don’t want our data to turn completely to mush either; we’d like to detect and correct any anomalies. So with the QC approach I describe above, we can schedule bulk RI checking for a later time, e.g. when the database load is lighter.

Bradley Holt

12 years ago

Thanks for the clarification!

Roland Bouman

12 years ago

Bill, nice post.

To whom it may concern: I wrote a stored procedure that tests for foreign key constraint violations. You can find it here:

http://forge.mysql.com/tools/tool.php?id=11

12 years ago

Would be nice to have tool in percona toolkit for such checks 🙂

Brian

12 years ago

A pretty fast way to make sure no relations are broken is doing an insert/update/delete is to structure the update to do nothing if the relationship is broken. or do all your writes in stored routines/triggers.

with the former all you need to do when doing the insert is
insert into table (col1, col2)
select
val1,
val2
from relationship check.

If no rows are added, the relationship broke. The only issue is that it relies on developers to write code that keeps these relationships in mind

Bill Karwin

Author

12 years ago

Brian: Exactly, there are always ways to avoid breaking RI in theory if the developers write code perfectly. 🙂

The problem is that even if you write *your* code perfectly, the next developer who writes code to write to the same tables may not. Or else someone may insert/update/delete rows ad hoc, not using your application but instead using a mysql client directly. Over time, without the database enforcing RI constraints, it’s practically guaranteed that orphans will occur.

Bill Karwin

Author

12 years ago

Roland: Nice stored procedure! I’ve also thought of using the column COMMENTS with some kind of annotation syntax to record RI relationships for which there are no foreign keys defined. That’s a project for another day.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Eventual Consistency in MySQL

Quality Control Queries

Generating SQL with SQL

Using The Quality Control Query

DIY RI

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

High Availability: Choosing the Right Option for Your Percona Monitoring and Management

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Eventual Consistency in MySQL

Quality Control Queries

Generating SQL with SQL

Using The Quality Control Query

DIY RI

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

High Availability: Choosing the Right Option for Your Percona Monitoring and Management

New Valkey Packages by Percona

Can We Set up a Replicate Filter Within the Percona XtraDB Cluster?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation