Schema Design in MongoDB vs Schema Design in MySQL

For people used to relational databases and doing MySQL database design, using NoSQL solutions such as MongoDB brings interesting challenges. One of them is schema design: while in the relational world, normalization is a good way to start, how should we design our collections when creating a new MongoDB application?

Let’s see with a simple example how we would create a data structure for MySQL (or any relational database) and for MongoDB. We will assume in this post that we want to store people information (their name) and the details from their passport (country and validity date).

Relational Design

In the relational world, the basic idea is to try to stick to the 3rd normal form and create two tables (I’ll omit indexes and foreign keys for clarity – MongoDB supports indexes but not foreign keys):

mysql> select * from people;
+----+------------+
| id | name       |
+----+------------+
|  1 | Stephane   |
|  2 | John       |
|  3 | Michael    |
|  4 | Cinderella |
+----+------------+

mysql> select * from passports;
+----+-----------+---------+-------------+
| id | people_id | country | valid_until |
+----+-----------+---------+-------------+
|  4 |         1 | FR      | 2020-01-01  |
|  5 |         2 | US      | 2020-01-01  |
|  6 |         3 | RU      | 2020-01-01  |
+----+-----------+---------+-------------+

mysql> select * from people;

+----+------------+

| id | name |

+----+------------+

| 1 | Stephane |

| 2 | John |

| 3 | Michael |

| 4 | Cinderella |

+----+------------+

mysql> select * from passports;

+----+-----------+---------+-------------+

+----+-----------+---------+-------------+

| 4 | 1 | FR | 2020-01-01 |

| 5 | 2 | US | 2020-01-01 |

| 6 | 3 | RU | 2020-01-01 |

+----+-----------+---------+-------------+

One of the good things with such a design is that it’s equally easy to run any query (as long as we don’t consider joins as something difficult to use):

Do you want the number of people?
SELECT count(*) FROM people
1
SELECT count(*) FROM people
Do you want to know the validity date of Stephane’s passport?
SELECT valid_until from passports ps join people pl ON ps.people_id = pl.id WHERE name = 'Stephane'
1
SELECT valid_until from passports ps join people pl ON ps.people_id = pl.id WHERE name = 'Stephane'
Do you want to know how many people do not have a passport? Run
SELECT name FROM people pl LEFT JOIN passports ps ON ps.people_id = pl.id WHERE ps.id IS NULL
1
SELECT name FROM people pl LEFT JOIN passports ps ON ps.people_id = pl.id WHERE ps.id IS NULL
etc

MongoDB design

Now how should we design our collections in MongoDB to make querying easy?

Using the 3rd normal form is of course possible, but that would probably be inefficient as all joins should be done in the application. So out of the 3 queries above, only the query #1 could be easily run. So which other designs could we have?

A first option would be to store everything in the same collection:

> db.people_all.find().pretty()
{
	"_id" : ObjectId("51f7be1cd6189a56c399d3bf"),
	"name" : "Stephane",
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be3fd6189a56c399d3c0"),
	"name" : "John",
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be4dd6189a56c399d3c1"),
	"name" : "Michael",
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{ "_id" : ObjectId("51f7be5cd6189a56c399d3c2"), "name" : "Cinderella" }

> db.people_all.find().pretty()

{

"_id" : ObjectId("51f7be1cd6189a56c399d3bf"),

"name" : "Stephane",

"country" : "FR",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{

"_id" : ObjectId("51f7be3fd6189a56c399d3c0"),

"name" : "John",

"country" : "US",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{

"_id" : ObjectId("51f7be4dd6189a56c399d3c1"),

"name" : "Michael",

"country" : "RU",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{ "_id" : ObjectId("51f7be5cd6189a56c399d3c2"), "name" : "Cinderella" }

By the way, we can see here that MongoDB is schemaless: there is no problem in storing documents that do not have the same structure.

The drawback is that it is no longer clear which attributes belong to the passport, so if you want to get all passport information for Michael, you will need to correctly understand the whole data structure.

A second option would be to embed passport information inside people information – MongoDB supports rich documents:

> db.people_embed.find().pretty()
{
	"_id" : ObjectId("51f7c0048ded44d5ebb83774"),
	"name" : "Stephane",
	"passport" : {
		"country" : "FR",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c70e8ded44d5ebb83775"),
	"name" : "John",
	"passport" : {
		"country" : "US",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c71b8ded44d5ebb83776"),
	"name" : "Michael",
	"passport" : {
		"country" : "RU",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{ "_id" : ObjectId("51f7c7258ded44d5ebb83777"), "name" : "Cinderella" }

> db.people_embed.find().pretty()

{

"_id" : ObjectId("51f7c0048ded44d5ebb83774"),

"name" : "Stephane",

"passport" : {

"country" : "FR",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{

"_id" : ObjectId("51f7c70e8ded44d5ebb83775"),

"name" : "John",

"passport" : {

"country" : "US",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{

"_id" : ObjectId("51f7c71b8ded44d5ebb83776"),

"name" : "Michael",

"passport" : {

"country" : "RU",

"valid_until" : ISODate("2019-12-31T23:00:00Z")

}

{ "_id" : ObjectId("51f7c7258ded44d5ebb83777"), "name" : "Cinderella" }

Or we could embed the other way (however this looks a bit dubious as some people may not have a passport like Cinderella in our example):

> db.passports_embed.find().pretty()
{
	"_id" : ObjectId("51f7c7e58ded44d5ebb8377b"),
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Stephane"
	}
}
{
	"_id" : ObjectId("51f7c7ec8ded44d5ebb8377c"),
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "John"
	}
}
{
	"_id" : ObjectId("51f7c7fa8ded44d5ebb8377d"),
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Michael"
	}
}
{
	"_id" : ObjectId("51f7c8058ded44d5ebb8377e"),
	"person" : {
		"name" : "Cinderella"
	}
}

> db.passports_embed.find().pretty()

{

"_id" : ObjectId("51f7c7e58ded44d5ebb8377b"),

"country" : "FR",

"valid_until" : ISODate("2019-12-31T23:00:00Z"),

"person" : {

"name" : "Stephane"

}

{

"_id" : ObjectId("51f7c7ec8ded44d5ebb8377c"),

"country" : "US",

"valid_until" : ISODate("2019-12-31T23:00:00Z"),

"person" : {

"name" : "John"

}

{

"_id" : ObjectId("51f7c7fa8ded44d5ebb8377d"),

"country" : "RU",

"valid_until" : ISODate("2019-12-31T23:00:00Z"),

"person" : {

"name" : "Michael"

}

{

"_id" : ObjectId("51f7c8058ded44d5ebb8377e"),

"person" : {

"name" : "Cinderella"

}

That’s a lot of options! How can we choose? Here is where you should be aware of a fundamental difference between MongoDB and relational databases when it comes to schema design:

Collections inside MongoDB should be designed with the most frequent access patterns of the application in mind, while in the relational world, you can forget how data will be accessed if your tables are normalized.

So…

If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data is almost never used (passport information) and when you need to have all information for a given person, it may be acceptable to do the join in the application.
Same thing if you want to display the name of people on one screen and the passport information on another screen.
But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution.

Conclusion

We saw in this post one of the fundamental differences between MySQL database design and MongoDB database design when it comes to creating the right data structure for an application: with MongoDB, you need to know the data access pattern of the application. This should not be neglected as creating a wrong schema design is a recipe for disaster: queries will be difficult to write and to optimize, they will be slow and they will sometimes need to be replaced by custom code. All that can lead to low performance and frustration.

The next question is: which way is better? And of course, there is no definite answer: MongoDB fans will say that by making all access patterns equal, normalization make them equally bad, and normalization fans will say that a normalized schema provides good performance for most applications and that you can always denormalize to help a few queries run faster.

8 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Robert Hodges

10 years ago

Your last two paragraphs minimize the design effect of the structural differences between a relational DBMS like MySQL and MongoDB. In MySQL with InnoDB data are clustered by the primary key, whereas in MongoDB they are clustered around full BSON documents. This is a more complex form of clustering and if you get it wrong at any point you can have significant performance problems that require app rewrites because in Mongo your object model determines your storage access patterns far more strongly than is the case in MySQL.

Also, MongoDB currently limits the scope of transactions to a single document. Making updates across multiple documents atomic and isolated is an application issue. This is *far* harder to get right than joins and can lead to effectively unsolvable bugs in applications. (I imagine this will eventually be corrected but probably not quickly as it’s a hard problem within MongoDB too.)

The real design issue with a DBMS like MongoDB is that you have to correctly predict both query patterns as well as the transactional update model relatively early on. This tends to be possible only for cases where a single application with a relatively simple object model is accessing the DBMS. So one effect of using MongoDB is that you need to partition application services to meet these conditions or face the possibility of expensive rewrites that ripple far into the application logic. In other words, MongoDB forces you to structure application service architecture around data. That is not necessarily a bad thing but it’s a significant design assumption and deserves to be highlighted when making the DBMS choice.

Stephane Combaudon

Author

10 years ago

@Robert: you’re right, unlike relational DBMS, MongoDB forces you to predict your access patterns first, and this is by design. That can be hard for new applications, but if you simply want to migrate part of an existing application to MongoDB, that may be much easier. Anyway this is something you should always have in mind.

As for the transactions that are limited to a single document, it is something you should be aware of from the beginning as well. But it’s not really different than MyISAM (it’s even a bit better) and we still see people running successfully with MyISAM.

oracle 11g course

10 years ago

Such a nice blog you share….
Thank you to know about this….
Regards:
Qadir Shaikh.
Visit at http://www.oratc.com

Ahmad Umar

10 years ago

Please I need a complete weekly report about MongoDB vs MySQL query standards

Anna

9 years ago

For more indepth information on comparison between mysql and nodejs in terms of querying, schema design, data representation, relationships, transaction and performance, read on Comparison between MySQL and MongoDB.

Juan Carlos

9 years ago

I just working in a project and i need use a very large corpus

with your comparative i have more notions about the schema design for some DBM

Thanks for your comparative

click this link here now

9 years ago

Hmm it looks like your site ate my first comment (it was extremely long) so I guess I’ll just sum it up what I submitted and
say, I’m thoroughly enjoying your blog. I as well am an aspiring blog
writer but I’m still new to everything. Do you have any points for novice blog writers?
I’d genuinely appreciate it.

amarbhanu

9 years ago

Nice,

i am new for mongoDB so face some problem for join with 1 + table as per requirement need to 2 + table join so suggest me how to solve problem.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Schema Design in MongoDB vs Schema Design in MySQL

Relational Design

MongoDB design

Conclusion

Related

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Schema Design in MongoDB vs Schema Design in MySQL

Relational Design

MongoDB design

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Not-So-Good Practices

Choosing the Right Database: Comparing MariaDB vs. MySQL, PostgreSQL, and MongoDB

Valkey/Redis: Configuration Best Practices

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation