I occasionally get into light arguments (healthy discussions) with students about whether or not to use foreign key constraints on InnoDB tables.  My standard response has always been: “it depends on how much of a tradeoff you are willing to make for performance. In some situations the cost can be considerable.”

… that’s when they expect me to “come up with some real proof” to show them. I don’t disagree with that logic, or with proof being on their list of demands; I support using data to make decisions.

The problem is that MySQL has (traditionally) lacked the instrumentation required to make these decisions easy.  This is getting better; here is an example we recently added to our InnoDB course:
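A minimal sketch of the setup, assuming a parent/child pair of tables (the exact DDL here is an assumption; the UPDATE statements are the ones quoted in the comments below):

CREATE TABLE parent (
  id INT NOT NULL PRIMARY KEY,
  bogus_column VARCHAR(32)
) ENGINE=InnoDB;

CREATE TABLE child (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent VALUES (4, 'old'), (5, 'old');
INSERT INTO child VALUES (1, 4);

-- session1: repoint the child row, but do not commit yet.
-- The foreign key check takes a shared lock on the parent row id = 5.
BEGIN;
UPDATE child SET parent_id = 5 WHERE parent_id = 4;

-- session2:
BEGIN;
UPDATE parent SET bogus_column = 'new!' WHERE id = 4;  -- completes
UPDATE parent SET bogus_column = 'new!' WHERE id = 5;  -- blocks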

In the last statement, session2 will block waiting on a lock. Want to know where that lock is? Check information_schema.innodb_locks.
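A sketch of one way to see it, joining the plugin’s INNODB_LOCK_WAITS, INNODB_TRX, and INNODB_LOCKS tables to show who is waiting on whom:

-- Run from a third session while session2 is blocked.
SELECT r.trx_id              AS waiting_trx,
       r.trx_mysql_thread_id AS waiting_thread,
       b.trx_id              AS blocking_trx,
       b.trx_mysql_thread_id AS blocking_thread,
       l.lock_table, l.lock_index, l.lock_mode, l.lock_data
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r   ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b   ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_locks l ON l.lock_id = w.requested_lock_id;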

The same example without the foreign key constraint does not block on the last statement, and the lock wait disappears from the output.
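The only schema change is dropping the constraint; a sketch, assuming the DDL above:

CREATE TABLE child (
  id INT NOT NULL PRIMARY KEY,
  parent_id INT NOT NULL,
  KEY (parent_id)
) ENGINE=InnoDB;

-- With no FOREIGN KEY clause, the child UPDATE takes no shared lock on
-- the parent row, so session2 never waits and the lock-waits query
-- above comes back empty.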

This information_schema table is new to the InnoDB plugin.  In earlier releases of MySQL you may be able to get at the data, but not in such an easily digestible form.  Instrumentation is the least talked-about feature in new releases of MySQL and Percona Server.  See BUG #53336 for more examples of pure awesomeness.

10 Comments
Jory Geerts

Since update 1 happens in a different (uncommitted) transaction than updates 2 and 3, they shouldn’t be affecting each other, should they? (Also, are the transactions needed for this to work?)

And anyway, now next time those students ask for proof, you’ve got it. 🙂

Denis TRUFFAUT

InnoDB’s mantra consists of adding locks and key controls (so it’s undoubtedly slower) in order to ensure that the database will not be corrupted by a statement. My opinion is that this kind of control should be mandatory, and the question of performance over security should not be asked.

If you want to go fast, skip the controls and denormalize your data; NoSQL and RAM caches are the solutions for that.
But please, let the database be a database ^_^

Valdemar

Morgan, please correct me if I’m wrong:

We have two transactions, started one after the other. Then an update of a child row occurs in the first transaction. Meanwhile, in the second transaction, two statements are executed. After all this, the transactions are committed.

So, the first question: why is “UPDATE parent SET bogus_column = 'new!' WHERE id = 4;” not blocked? As I see it, the child row with parent_id = 4 is the one being modified! And isn’t it strange that “UPDATE parent SET bogus_column = 'new!' WHERE id = 5;” is blocked? Why should it be? There are no operations on a child row with parent_id = 5!

What am I missing?

Valdemar

Please, somebody, anybody!

Gray

You will notice that a child row is changed to point to parent_id = 5:
(UPDATE child SET parent_id = 5 WHERE parent_id = 4;)
So now one child is pointing to the parent with parent_id = 5, but the change is not committed yet… That parent row will be locked until the commit…

Question is, why is this lock needed?!?!

a) I agree that it should only be a problem if the parent is deleted…
b) If the parent is found to be deleted when the transaction tries to commit, it should fail with an “original item/parent was deleted by another transaction” message…

It seems to me that MySQL is not as robust as I thought…

But I am not as educated as I would like to be, and I come from many years of using PostgreSQL, which seems to be awesome (yes, I do like journaling databases, they keep my data safe 🙂).

Any comments on the comparison?

Will this only create a temporary lock, or will it create a deadlock if session 1 is never committed? (I do assume there is a max wait time after which it is canceled, but even waiting 30 seconds is a lifetime in web apps.)
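For reference, this is a bounded wait rather than a deadlock: the blocked statement gives up after innodb_lock_wait_timeout (50 seconds by default) and fails with error 1205. A sketch of lowering it per session:

-- Check the server's current setting (the default is 50 seconds).
SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

-- Fail fast in this session instead of waiting the full default;
-- a blocked statement then returns ERROR 1205 (Lock wait timeout exceeded).
SET SESSION innodb_lock_wait_timeout = 5;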

Glenn

This is an older discussion but still important since Google searches will still direct folks here.

Denis says let the database be a database. What is a database? A means of storing data that allows you to get to the data quickly and logically. A database engine could have all sorts of cool bells and whistles, but if those bells and whistles cause your application to drag and result in crappy response times, they are not appropriate for that application.

Usually I choose the bells and whistles based on the size and complexity of the database, and on where in my application those capabilities become necessary. If I have one or two spots in a large and complex application where a foreign key constraint becomes necessary, it seems a little code that accomplishes the same thing is better than having that rule cause performance problems throughout the application.
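That “little code” might look something like this sketch, reusing the parent/child tables from the post (LOCK IN SHARE MODE mimics the shared lock the FK check would have taken):

BEGIN;

-- Confirm the new parent exists and take a shared lock on it,
-- which is roughly what the FK check would have done for us.
SELECT id FROM parent WHERE id = 5 LOCK IN SHARE MODE;

-- Only if the SELECT returned a row does the application proceed:
UPDATE child SET parent_id = 5 WHERE parent_id = 4;

COMMIT;

Note that this reimplements the same lock the constraint would have taken; the saving only appears on the statements where you choose to skip the check.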

By all means use the most sophisticated tools available within a database engine that improve response and efficiency. But just because it has a really cool capability doesn’t mean that capability SHOULD or MUST be used.

I have used everything from DB2 to Oracle and find MySQL every bit as robust and efficient as any commercial product for what I have developed so far. But the biggest difference between the two environments is that with DB2 and Oracle I always had a group of DBA types taking my database design and doing the implementation, far better than my own experience could manage. I do my own MySQL implementation. It is my belief that a good MySQL DBA could get just as much out of MySQL as any good DB2 DBA could get out of DB2.

Maybe I’m wrong. But first I would have to develop an application where the size and complexity of the data would make the difference.

Dave

So, locking and subsequent blocking is one thing, but as you said, the data referenced via your FK may not be changed all that often, depending on how your application is written. For example, someone creates an account, that account gets assigned an ID, and then a transaction is created which references that account via an FK; it’s unlikely that the “reference data” of the account information will be updated frequently. The blocking issue can be mitigated by keeping it in mind while coding the application.

I have heard that with Oracle, FKs provide a performance benefit because the optimizer has more information about what needs to be done and the plan is created faster. Does InnoDB get any such performance benefit in the same manner?

One thing not mentioned here is: I think we’ve all run into schemas where the only way to understand the entity relationships was to read through the application. FKs can provide a very intuitive view into data relationships, not to mention help prevent the application from creating inconsistent data. FKs can help make the schema the final gate for data consistency.

So, how about some real performance benchmarks with and without FKs, to see whether the additional locking (blocking aside) is of any real concern?

SmeegHeed

I’ve been caught out by exactly this, making assumptions about what it “should” do internally. InnoDB is not as specific as you might assume when it comes to parent row locking. It does what it says on the tin: row locking. Even a transaction that only touches child rows will, in general, still lock the parent rows against writes.

In theory, InnoDB should lock the parent table or row (depending on the isolation required) against insertion, deletion, and updates of the PK. In reality, the behaviour I am observing is that the parent row is locked outright against any update. It does allow inserts into the table, however, which I suspect can cause small anomalies.

In theory it could be faster and more specific, but it isn’t.
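A sketch of the observed behaviour, reusing the parent/child tables from the top of the post (session1 has updated child to reference parent id = 5 and not yet committed):

-- session2, while session1's child UPDATE is still uncommitted:
INSERT INTO parent VALUES (6, 'brand new');          -- succeeds: inserts are allowed
UPDATE parent SET bogus_column = 'x' WHERE id = 5;   -- blocks: the row is locked
                                                     -- outright, even though the
                                                     -- PK is not being changed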