MongoDB Data Modeling - Sample Chapter
Preface
Even today, it is still common to hear that computer science is a young field.
However, this statement becomes somewhat contradictory when we look at other
fields: computer science is a discipline that evolves continually and faster than
most. I dare say that computer science has now set the path of evolution for
other fields such as medicine and engineering. In this context, database systems,
as an area of the computer science discipline, have not only contributed to the
growth of other fields, but have also taken advantage of the evolution and
progress of many areas of technology, such as computer networks and
computer storage.
Formally, database systems have been an active research topic since the 1960s.
Since then, we have gone through a few generations, and big names in the IT
industry have emerged and started to dictate the market's tendencies.
In the 2000s, driven by the world's Internet access growth, which created a new
network traffic profile with the social web boom, the term NoSQL became common.
Considered by many to be a paradoxical and polemic subject, it is seen by some as
a new technology generation that has been developed in response to all changes we
have experienced in the last decade.
MongoDB is one of these technologies. Born in the late 2000s, it became the most
popular NoSQL database in the world. Moreover, since February 2015, MongoDB has
been the fourth most popular database system overall according to the
DB-Engines ranking (http://db-engines.com/en/), surpassing the well-known
PostgreSQL database.
Nevertheless, popularity should not be confused with adoption. Although the
DB-Engines ranking shows us that MongoDB is responsible for some traffic on search
engines such as Google, has job search activity, and has substantial social media
activity, we cannot state how many applications are using MongoDB as a data source.
Indeed, this is not exclusive to MongoDB, but is true of every NoSQL technology.
The good news is that adopting MongoDB has not been a very tough decision to
make. It's open source, so you can download it free of charge from MongoDB Inc.
(https://www.mongodb.com), where you can find extensive documentation. You
also can count on a big and growing community, who, like you, are always looking
for new stuff on books, blogs, and forums; sharing knowledge and discoveries; and
collaborating to add to the MongoDB evolution.
MongoDB Data Modeling was written with the aim of being another research and
reference source for you. In it, we will cover the techniques and patterns used to
create scalable data models with MongoDB. We will go through basic database
modeling concepts, and provide a general overview focused on modeling in
MongoDB. Lastly, you will see a practical step-by-step example of modeling
a real-life problem.
Primarily, database administrators with some MongoDB background will take
advantage of MongoDB Data Modeling. However, everyone from developers to
all the curious people that downloaded MongoDB will make good use of it.
This book focuses on the 3.0 version of MongoDB. MongoDB 3.0, which was long
awaited by the community, is considered by MongoDB Inc. as its most significant
release to date. This is because, in this release, we were introduced to the new
and highly flexible storage architecture, WiredTiger. Performance and scalability
enhancements are intended to strengthen MongoDB's position among database system
technologies, and to establish it as the standard database for modern applications.
Chapter 6, Managing the Data, focuses on the maintenance of data. This will teach
you how important it is to look at your data operations and administration before
beginning the modeling of data.
Chapter 7, Scaling, shows you how powerful the autosharding characteristic of
MongoDB can be, and how we should think about the distribution of our data model.
Chapter 8, Logging and Real-time Analytics with MongoDB, takes you through a
schema design of a real-life problem example.
Querying Documents
In a NoSQL database, such as MongoDB, planning queries is a very important task,
and depending on the query you want to perform, your document can vary greatly.
As you saw in Chapter 2, Data Modeling with MongoDB, the decision to refer or
include documents in a collection is, in a large part, the result of our planning.
It is essential to determine whether we will give a preference to reading or
writing in a collection.
Here, we will see how planning queries can help us create documents in a more
efficient and effective way, and we will also consider more sensitive questions
such as atomicity and transactions.
This chapter will focus on the following subjects:
Read operations
Write operations
Write concerns
We can use the find interface to execute a query in MongoDB. The find
interface will select the documents in a collection and return a cursor for
the selected documents.
Compared with the SQL language, the find interface should be seen as a select
statement. And, similar to a select statement where we can determine clauses
with expressions and predicates, the find interface allows us to use criteria and
projections as parameters.
As mentioned before, we will use JSON documents in these find interface
parameters. We can use the find interface in the following way:
db.collection.find(
{criteria},
{projection}
)
In this example:
criteria is a JSON document that will specify the criteria for the selection of
documents
Both criteria and projection are optional parameters, and we will go into more
detail regarding these later.
Let's execute the following example:
db.customers.find(
{"username": "johnclay"},
{_id: 1, username: 1, details: 1}
)
In this example:
It will return:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
}
}
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
}
}
Let's check the following example query. We want to select the documents where the
attribute name is Product 1:
db.products.find({name: "Product 1"});
The preceding query selects the documents through the equality {name: "Product 1"}.
The following example demonstrates how it's possible to select all documents where
the price is greater than 10:
db.products.find({price: {$gt: 10}});
When we execute a query using the $gt operator, only documents whose price
value is greater than 10 will be returned in the cursor.
In addition, there are other operators such as comparison, logical, element,
evaluation, geographical, and arrays.
Let's take, for example, the documents from the products collection, shown
as follows:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 6
}
]
}
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
{
"_id" : ObjectId("54837b69f059b08503e200dd"),
"name" : "Product 3",
"description" : "Product 3 description",
"price" : 30,
"supplier" : {
"name" : "Supplier 3",
"telephone" : "+552177776666"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 9
}
]
}
Comparison operators
MongoDB provides us with a way to define equality between values. With comparison
operators, we can compare BSON type values. Let's look at these operators:
The $gte operator searches for values that are greater than or equal to the
value specified in the query. If we execute the query
db.products.find({price: {$gte: 20}}), it will return:
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
{
"_id" : ObjectId("54837b69f059b08503e200dd"),
"name" : "Product 3",
"description" : "Product 3 description",
"price" : 30,
"supplier" : {
"name" : "Supplier 3",
"telephone" : "+552177776666"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 9
}
]
}
With the $lt operator, it's possible to search for values that are less than the
requested value in the query. The query
db.products.find({price: {$lt: 20}}) will return:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 6
}
]
}
The $lte operator searches for values that are less than or equal to the
requested value in the query. If we execute the query
db.products.find({price: {$lte: 20}}), it will return:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 6
}
]
}
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
The $in operator is able to search any document where the value of a field
equals a value that is specified in the requested array in the query. The
execution of the query db.products.find({price:{$in: [5, 10, 15]}})
will return:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 6
}
]
}
The $nin operator will match values that are not included in the specified
array. The execution of the
db.products.find({price: {$nin: [10, 20]}}) query will produce:
{
"_id" : ObjectId("54837b69f059b08503e200dd"),
"name" : "Product 3",
"description" : "Product 3 description",
"price" : 30,
"supplier" : {
"name" : "Supplier 3",
"telephone" : "+552177776666"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 9
}
]
}
The $ne operator will match any values that are not equal to the specified
value in the query. The execution of the
db.products.find({name: {$ne: "Product 1"}}) query will produce:
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
{
"_id" : ObjectId("54837b69f059b08503e200dd"),
"name" : "Product 3",
"description" : "Product 3 description",
"price" : 30,
"supplier" : {
"name" : "Supplier 3",
"telephone" : "+552177776666"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 9
}
]
}
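The comparison operators above can be mimicked outside the database to check
our reading of their semantics. What follows is a minimal JavaScript sketch, not
MongoDB's implementation: the matchesComparison and namesWhere helpers and the
trimmed-down products array are assumptions made for illustration, and only
scalar number and string values are handled.

```javascript
// Trimmed-down copies of the three sample products (only name and price kept).
const products = [
  { name: "Product 1", price: 10 },
  { name: "Product 2", price: 20 },
  { name: "Product 3", price: 30 }
];

// Hypothetical helper: evaluates one comparison operator against a scalar value.
function matchesComparison(value, operator, operand) {
  switch (operator) {
    case "$gt":  return value > operand;
    case "$gte": return value >= operand;
    case "$lt":  return value < operand;
    case "$lte": return value <= operand;
    case "$ne":  return value !== operand;
    case "$in":  return operand.includes(value);
    case "$nin": return !operand.includes(value);
    default: throw new Error("Unsupported operator: " + operator);
  }
}

// Hypothetical helper: names of the products whose field satisfies the operator.
function namesWhere(field, operator, operand) {
  return products
    .filter(doc => matchesComparison(doc[field], operator, operand))
    .map(doc => doc.name);
}

console.log(namesWhere("price", "$gte", 20));         // [ 'Product 2', 'Product 3' ]
console.log(namesWhere("price", "$in", [5, 10, 15])); // [ 'Product 1' ]
console.log(namesWhere("price", "$nin", [10, 20]));   // [ 'Product 3' ]
```

The results line up with the documents returned by the queries shown in this
section, which is a quick way to confirm how each operator treats the boundary
values 10 and 20.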
Logical operators
Logical operators are how we define the logic between values in MongoDB. These
are derived from Boolean algebra, and the truth value of a Boolean value can be
either true or false. Let's look at the logical operators in MongoDB:
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 9
}
]
}
The $not operator inverts the query effect and returns the values that do not
match the specified operator expression. It is used to negate any operation.
The execution of the db.products.find({price: {$not: {$gt: 10}}})
query will produce:
{
"_id" : ObjectId("54837b61f059b08503e200db"),
"name" : "Product 1",
"description" : "Product 1 description",
"price" : 10,
"supplier" : {
"name" : "Supplier 1",
"telephone" : "+552199998888"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 6
}
]
}
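One detail worth keeping in mind is that $not also selects documents that do not
contain the field at all, so {price: {$not: {$gt: 10}}} is not strictly the same
as {price: {$lte: 10}}. The plain JavaScript sketch below imitates that
behavior; the helper names and the extra document without a price are
hypothetical and only serve the illustration.

```javascript
const docs = [
  { name: "Product 1", price: 10 },
  { name: "Product 2", price: 20 },
  { name: "Unpriced product" } // hypothetical document with no price field
];

// Imitates {price: {$gt: limit}}: a missing field never matches.
const priceGt = limit => doc => typeof doc.price === "number" && doc.price > limit;

// Imitates {price: {$lte: limit}}: a missing field never matches either.
const priceLte = limit => doc => typeof doc.price === "number" && doc.price <= limit;

// Imitates $not: selects whatever the wrapped predicate rejects.
const not = predicate => doc => !predicate(doc);

const notGt10 = docs.filter(not(priceGt(10))).map(doc => doc.name);
const lte10 = docs.filter(priceLte(10)).map(doc => doc.name);

console.log(notGt10); // [ 'Product 1', 'Unpriced product' ]
console.log(lte10);   // [ 'Product 1' ]
```

With the sample collection the two forms return the same document, because every
product has a price; the difference only appears once some documents lack
the field.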
Element operators
To query a collection based on the fields of its documents, we can use element operators.
The $exists operator will return all documents that have the specified field in the
query. The execution of db.products.find({sku: {$exists: true}}) will not
return any document, because none of them have the field sku.
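Note that $exists checks for the presence of a field, not for a meaningful
value: a document whose field is set to null still matches {$exists: true}.
The following JavaScript sketch illustrates the distinction with hypothetical
documents and a hypothetical hasField helper; it is not how MongoDB evaluates
the operator internally.

```javascript
// Hypothetical documents: one without sku, one with a null sku, one with a value.
const items = [
  { name: "Product 1" },
  { name: "Product 4", sku: null },
  { name: "Product 5", sku: "P5-001" }
];

// Presence test: true when the document has the field, whatever its value.
function hasField(doc, field) {
  return Object.prototype.hasOwnProperty.call(doc, field);
}

const withSku = items.filter(doc => hasField(doc, "sku")).map(doc => doc.name);
const withoutSku = items.filter(doc => !hasField(doc, "sku")).map(doc => doc.name);

console.log(withSku);    // [ 'Product 4', 'Product 5' ]
console.log(withoutSku); // [ 'Product 1' ]
```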
Evaluation operators
Evaluation operators are how we perform an assessment of an expression in
MongoDB. We must take care with this kind of operator, especially if there is no
index for the field we are using on the criteria. Let's consider the evaluation operator:
The $regex operator will return all documents in which the field value matches a
regular expression.
The execution of db.products.find({name: {$regex: /2/}}) will return:
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
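The /2/ expression is unanchored, so it matches a 2 anywhere in the name. The
JavaScript sketch below applies the same kind of pattern to a list of names
with the built-in RegExp; the name "Product 20" is a hypothetical addition used
to show the difference between unanchored and anchored patterns.

```javascript
// Sample names plus one hypothetical name ("Product 20").
const names = ["Product 1", "Product 2", "Product 3", "Product 20"];

// Unanchored: a "2" anywhere in the string is enough.
const containsTwo = names.filter(name => /2/.test(name));

// Anchored at the end: the string must finish with "2".
const endsWithTwo = names.filter(name => /2$/.test(name));

// Anchored at the start: a prefix expression.
const startsWithProduct = names.filter(name => /^Product/.test(name));

console.log(containsTwo);              // [ 'Product 2', 'Product 20' ]
console.log(endsWithTwo);              // [ 'Product 2' ]
console.log(startsWithProduct.length); // 4
```

As a rule of thumb, a case-sensitive prefix expression such as /^Product/ tends
to be the friendliest form for an index on the queried field, which ties in
with the earlier warning about evaluation operators on unindexed fields.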
Array operators
When we are working with arrays on a query, we should use array operators.
Let's consider the array operator:
The $elemMatch operator will return all documents in which the specified
array field has at least one element that matches all the query criteria.
The db.products.find({review: {$elemMatch: {stars: {$gt: 5},
customer: {email: "customer@customer.com"}}}}) query will look for all the
documents in the collection in which the review array has at least one
element whose stars value is greater than 5 and whose customer email is
customer@customer.com:
{
"_id" : ObjectId("54837b65f059b08503e200dc"),
"name" : "Product 2",
"description" : "Product 2 description",
"price" : 20,
"supplier" : {
"name" : "Supplier 2",
"telephone" : "+552188887777"
},
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 10
},
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 2
}
]
}
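The key point of $elemMatch is that a single array element must satisfy every
condition. The JavaScript sketch below contrasts that with a looser check in
which each condition may be met by a different element, which is roughly what
separate dotted conditions on review.stars and review.customer.email would
express. The product document, the second e-mail address, and both helper
functions are hypothetical.

```javascript
// Hypothetical product: the 10-star review belongs to another customer.
const product = {
  name: "Hypothetical product",
  review: [
    { customer: { email: "customer@customer.com" }, stars: 2 },
    { customer: { email: "other@example.com" }, stars: 10 }
  ]
};

// $elemMatch-like check: one element must satisfy both conditions.
function sameElementMatches(doc) {
  return doc.review.some(
    r => r.stars > 5 && r.customer.email === "customer@customer.com"
  );
}

// Looser check: each condition may be satisfied by a different element.
function anyElementsMatch(doc) {
  return doc.review.some(r => r.stars > 5) &&
         doc.review.some(r => r.customer.email === "customer@customer.com");
}

console.log(sameElementMatches(product)); // false
console.log(anyElementsMatch(product));   // true
```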
Projections
Until now, we have executed queries where the presented result is the document as
it is persisted in MongoDB. But, in order to optimize the network overhead between
MongoDB and its clients, we should use projections.
As you saw at the beginning of the chapter, the find interface allows us to use two
parameters. The second parameter is projections.
By using the same sample collection we used in the previous section, an example of a
query with projection would be:
db.products.find({price: {$not: {$gt: 10}}}, {name: 1, description: 1})
The projection is a JSON document with all the fields we would like to present or
hide, followed by 0 or 1, depending on what we want.
When a field is followed by a 0, then this field will not be shown in the resulting
document. On the other hand, if the field is followed by a 1, then this means that it
will be shown in the resulting document.
By default, the _id field has the value 1.
In fields that have an array as a value, we can use operators such as $elemMatch,
$slice, and $.
The db.products.find({price: {$gt: 20}}, {review: {$elemMatch: {stars: 5}}})
query will produce:
{
"_id" : ObjectId("54837b69f059b08503e200dd"),
"review" : [
{
"customer" : {
"email" : "[email protected]"
},
"stars" : 5
}
]
}
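To make the 0 and 1 rules concrete, here is a minimal JavaScript sketch of an
inclusion projection over top-level fields. The project helper is hypothetical
and far simpler than what MongoDB does (no nested fields, no array operators),
and the _id is written as a plain string instead of an ObjectId.

```javascript
// Hypothetical helper imitating an inclusion projection on top-level fields.
function project(doc, projection) {
  const result = {};
  // _id is returned unless it is explicitly excluded with _id: 0.
  if (projection._id !== 0 && "_id" in doc) {
    result._id = doc._id;
  }
  for (const field of Object.keys(projection)) {
    if (field !== "_id" && projection[field] === 1 && field in doc) {
      result[field] = doc[field];
    }
  }
  return result;
}

const product1 = {
  _id: "54837b61f059b08503e200db",
  name: "Product 1",
  description: "Product 1 description",
  price: 10
};

const withId = project(product1, { name: 1, description: 1 });
const withoutId = project(product1, { _id: 0, name: 1 });

console.log(Object.keys(withId));    // [ '_id', 'name', 'description' ]
console.log(Object.keys(withoutId)); // [ 'name' ]
```

Leaving out price and supplier in this way is what reduces the amount of data
sent back to the client.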
Another important thing to understand about write operations is how the documents
are actually written to disk by MongoDB. MongoDB uses a journaling mechanism
for write operations: each change is first written to a journal and only then to
the data files. This is very useful, especially after an unclean shutdown, because
MongoDB will use the journal files to recover the database to a consistent state
when the mongod process is restarted.
As stated in Chapter 2, Data Modeling with MongoDB, the BSON specification allows us
to have a document with a maximum size of 16 MB. Since version 2.6, MongoDB
uses a space allocation strategy for a record, or document, named "power of two
sized allocation." As its name suggests, MongoDB will allocate to each document a
size in bytes that is a power of two (for example, 32, 64, 128, 256, 512, and so on),
rounding the size of the document up to the next power and considering that the
minimum size of a record is 32 bytes. This strategy allocates more space than the
document really needs, giving it room to grow.
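The arithmetic behind this strategy can be sketched in a few lines of
JavaScript. The allocatedSize function is a hypothetical illustration: it rounds
a size up to the next power of two with a 32-byte minimum, and it deliberately
ignores record headers and any special handling of very large documents.

```javascript
// Smallest power of two that is >= the document size, never below 32 bytes.
function allocatedSize(documentSizeInBytes) {
  let allocated = 32;
  while (allocated < documentSizeInBytes) {
    allocated *= 2;
  }
  return allocated;
}

console.log(allocatedSize(20));  // 32
console.log(allocatedSize(100)); // 128
console.log(allocatedSize(128)); // 128
console.log(allocatedSize(129)); // 256
```

A 100-byte document therefore leaves 28 bytes of padding, so small updates can
grow the document in place without relocating it.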
Inserts
The insert interface is one of the possible ways of creating a new document in
MongoDB. The insert interface has the following syntax:
db.collection.insert(
<document or array of documents>,
{
writeConcern: <document>,
ordered: <boolean>
}
)
Here:
ordered should be a Boolean value, which if true will carry out an ordered
insert of the documents
In the following example, we can see how an insert operation can be used:
db.customers.insert({
username: "customer1",
email: "[email protected]",
password: hex_md5("customer1paswd")
})
As we did not specify a value for the _id field, it will be automatically generated
with a unique ObjectId value. The document created by this insert operation is:
{
"_id" : ObjectId("5487ada1db4ff374fd6ae6f5"),
"username" : "customer1",
"email" : "[email protected]",
"password" : "b1c5098d0c6074db325b0b9dddb068e1"
}
As you observed in the first paragraph of this section, the insert interface is not
the only way to create new documents in MongoDB. By using the upsert option on
updates, we could also create new documents. Let's go into more detail regarding
this now.
Updates
The update interface is used to modify previously existing documents in MongoDB,
or even to create new ones. To select which document we would like to change,
we will use a criterion. An update can modify the field values of a document or an
entire document.
By default, an update operation will modify only one document at a time. If the
criterion matches more than one document and we want to change all of them, then
it is necessary to pass an options document with the multi parameter set to true
to the update interface. If the criterion matches no document and the upsert
parameter is true, a new document will be created; otherwise, the matching
document will be updated.
The update interface is represented as:
db.collection.update(
<query>,
<update>,
{
upsert: <boolean>,
multi: <boolean>,
writeConcern: <document>
}
)
Here:
upsert is a Boolean value that, if true, creates a new document if the criteria
matches no document
multi is a Boolean value that, if true, updates every document that meets
the criteria
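The interplay between multi and upsert can be imitated on an in-memory array.
The following JavaScript sketch is only an approximation built for this
explanation: the update function, its return shape, and the status field are
hypothetical, it supports equality criteria and $set-like changes only, and it
does not talk to MongoDB.

```javascript
// Hypothetical in-memory imitation of update() with equality criteria.
function update(collection, query, setFields, options = {}) {
  const matches = doc => Object.keys(query).every(key => doc[key] === query[key]);
  const matched = collection.filter(matches);

  if (matched.length === 0) {
    if (options.upsert) {
      // Upsert: build a new document from the criteria plus the changed fields.
      collection.push({ ...query, ...setFields });
      return { updated: 0, inserted: 1 };
    }
    return { updated: 0, inserted: 0 };
  }

  // Without multi, only the first matching document is modified.
  const targets = options.multi ? matched : matched.slice(0, 1);
  targets.forEach(doc => Object.assign(doc, setFields));
  return { updated: targets.length, inserted: 0 };
}

const customers = [
  { username: "customer1", status: "new" },
  { username: "customer2", status: "new" },
  { username: "customer3", status: "new" }
];

// Three documents match, but only the first one is changed.
const single = update(customers, { status: "new" }, { status: "active" });
// The two remaining matches are changed in one call.
const many = update(customers, { status: "new" }, { status: "active" }, { multi: true });
// Nothing matches, so upsert creates a fourth document.
const upserted = update(customers, { username: "customer4" }, { status: "active" }, { upsert: true });

console.log(single.updated, many.updated, upserted.inserted, customers.length); // 1 2 1 4
```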
Using the document created in the previous section, a sample update would be:
db.customers.update(
{username: "customer1"},
{$set: {email: "[email protected]"}}
)
The $set operator allows us to update only the email field of the matched documents.
Otherwise, you may have this update:
db.customers.update(
{username: "customer1"},
{email: "[email protected]"}
)
That is, without the $set operator, we replace the old document with the one
passed as a parameter on the update. Besides the $set operator, we also have
other important update operators:
This update will rename the field username to login in the matched
documents.
This update will remove the login field from the matched documents.
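The effect of these update styles on a single document can be compared side by
side. The JavaScript sketch below works on a plain object and is not MongoDB
code: the four apply* helpers are hypothetical stand-ins for $set, a full
replacement, a field rename, and a field removal, and the e-mail addresses and
password value are made up for the example.

```javascript
// Hypothetical stored document.
const stored = { _id: 1, username: "customer1", email: "old@example.com", password: "secret" };

// $set-like behavior: only the listed fields change, the rest is kept.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement-like behavior: everything except _id is replaced.
function applyReplacement(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

// Rename-like behavior: the value moves to a new field name.
function applyRename(doc, oldName, newName) {
  if (!(oldName in doc)) return doc;
  const { [oldName]: value, ...rest } = doc;
  return { ...rest, [newName]: value };
}

// Removal-like behavior: the field disappears from the document.
function applyUnset(doc, field) {
  const { [field]: removed, ...rest } = doc;
  return rest;
}

console.log(Object.keys(applySet(stored, { email: "new@example.com" })));
// [ '_id', 'username', 'email', 'password' ]
console.log(Object.keys(applyReplacement(stored, { email: "new@example.com" })));
// [ '_id', 'email' ]
console.log(Object.keys(applyRename(stored, "username", "login")));
// [ '_id', 'email', 'password', 'login' ]
console.log(Object.keys(applyUnset(stored, "password")));
// [ '_id', 'username', 'email' ]
```

The second result is the one to watch for: forgetting the operator silently
drops username and password from the stored document.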
As write operations are atomic at the level of a single document, we can use the
preceding operators without worrying that a document will be left partially
updated. All of them can be safely used.
Write concerns
Many of the discussions surrounding non-relational databases are related to the
ACID concept. We, as database professionals, software engineers, architects, and
developers, are fairly accustomed to the relational universe, and we spend a lot of
time developing without caring about ACID matters.
Nevertheless, we should understand by now why we really have to take this
matter into consideration, and how these simple four letters are essential in the
non-relational world. In this section, we will discuss the letter D, which means
durability, in MongoDB.
Durability in database systems is a property that tells us whether a write operation
was successful, whether the transaction was committed, and whether the data was
written on non-volatile memory in a durable medium, such as a hard disk.
Unlike relational database systems, the response to a write operation in NoSQL
databases is determined by the client. Once again, we have the possibility to make
a choice on our data modeling, addressing the specific needs of a client.
In MongoDB, the response of a successful write operation can have many levels of
guarantee. This is what we call a write concern. The levels vary from weak to strong,
and the client determines the strength of guarantee. It is possible for us to have, in
the same collection, both a client that needs a strong write concern and another that
needs a weak one.
The write concern levels that MongoDB offers us are:
Unacknowledged
Acknowledged
Journaled
Replica acknowledged
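Each of these levels corresponds to a write concern document that the client
sends along with the write. The JavaScript sketch below collects the documents
used in this section's examples in one place; the writeConcerns map and the
withWriteConcern helper are hypothetical conveniences, not part of any driver.

```javascript
// Write concern documents for each level, as used in this section.
const writeConcerns = {
  unacknowledged: { w: 0 },          // do not wait for any acknowledgement
  acknowledged: { w: 1 },            // wait for the in-memory apply
  journaled: { w: 1, j: true },      // wait for the journal commit
  replicaAcknowledged: { w: 2 }      // wait for the primary plus one more member
};

// Hypothetical helper: builds the options document passed to insert or update.
function withWriteConcern(level) {
  if (!Object.prototype.hasOwnProperty.call(writeConcerns, level)) {
    throw new Error("Unknown write concern level: " + level);
  }
  return { writeConcern: writeConcerns[level] };
}

console.log(JSON.stringify(withWriteConcern("journaled")));
// {"writeConcern":{"w":1,"j":true}}
```

In the mongo shell, the resulting document would be passed as the second
argument of an insert, in the same position as {writeConcern: {w: 3}} in the
replica set example of this chapter.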
Unacknowledged
As its name suggests, with an unacknowledged write concern, the client will not
wait for a response to a write operation. Where possible, only network errors will
be captured. The following diagram shows that drivers will not wait for MongoDB
to acknowledge the receipt of write operations:
[Diagram: Driver, Write with WriteConcern { w: 0 }, mongod, Apply]
Acknowledged
With this write concern, the client will have an acknowledgement of the write
operation, and see that it was written on the in-memory view of MongoDB. In this
mode, the client can catch, among other things, network errors and duplicate keys.
Since the 2.6 version of MongoDB, this is the default write concern.
As you saw earlier, we can't guarantee that a write on the in-memory view of
MongoDB will be persisted on the disk. In the event of a failure of MongoDB, the
data in the in-memory view will be lost. The following diagram shows that drivers
wait for MongoDB to acknowledge the receipt of write operations once it has
applied the change to the in-memory view of data:
[Diagram: Driver, Write with WriteConcern { w: 1 }, mongod, Apply, Response]
Journaled
With a journaled write concern, the client will receive confirmation that the write
operation was committed in the journal. Thus, the client will have a guarantee that
the data will be persisted on the disk, even if something happens to MongoDB.
To reduce the latency when we use a journaled write concern, MongoDB will shorten
the interval at which it commits operations to the journal from the default value
of 100 milliseconds to 30 milliseconds. The following diagram shows that drivers
will receive MongoDB's acknowledgement of a write operation only after the data
has been committed to the journal:
[Diagram: Driver, Write with WriteConcern { w: 1, j: true }, mongod, Apply, Write to journal, Response]
Replica acknowledged
When we are working with replica sets, it is important to be sure that a write
operation was successful not only in the primary node, but also that it was
propagated to members of the replica set. For this purpose, we use a replica
acknowledged write concern.
By changing the default write concern to replica acknowledged, we can determine
the number of members of the replica set from which we want the write operation
confirmation. The following diagram shows that drivers will wait for MongoDB to
acknowledge the receipt of write operations on a specified number of the replica
set members:
[Diagram: Driver, Write with WriteConcern { w: 2 }, Primary (Apply), Replicate to two Secondary members (Apply), Response]
In the following example, we will wait until the write operation propagates to the
primary and at least two secondary nodes:
db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")},
{writeConcern: {w: 3}}
)
In the following example, we make an unordered bulk insert using the new
bulk methods:
var bulk = db.customers.initializeUnorderedBulkOp();
bulk.insert({username: "customer1", email: "[email protected]",
password: hex_md5("customer1paswd")});
bulk.insert({username: "customer2", email: "[email protected]",
password: hex_md5("customer2paswd")});
bulk.insert({username: "customer3", email: "[email protected]",
password: hex_md5("customer3paswd")});
bulk.execute({w: "majority", wtimeout: 3000});
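The "majority" value passed to bulk.execute in the preceding example asks for
acknowledgement from more than half of the replica set instead of a fixed
number of members. As a rough illustration, the JavaScript sketch below
computes how many acknowledgements that implies; the majorityOf function is
hypothetical and assumes that every member counted is a voting member.

```javascript
// Acknowledgements implied by w: "majority" for a given number of voting
// members (assumption: all counted members vote).
function majorityOf(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

console.log(majorityOf(3)); // 2
console.log(majorityOf(4)); // 3
console.log(majorityOf(5)); // 3
```

The wtimeout value of 3000 in the same call bounds how long, in milliseconds,
the client waits for those acknowledgements before the write concern is
reported as not satisfied.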
We should use all the powerful tools MongoDB provides us with, but we should do
so with care. MongoDB has a limit of executing a maximum of 1,000 bulk operations
at a time. So, if this limit is exceeded, MongoDB will divide the operations into
groups of at most 1,000 operations each.
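The grouping described above can be pictured with a short JavaScript sketch.
The splitIntoGroups function and the generated operations are hypothetical and
only show the arithmetic of the split; the actual grouping is done by MongoDB
and does not need to be written by hand.

```javascript
// Splits a list of operations into consecutive groups of at most groupSize.
function splitIntoGroups(operations, groupSize = 1000) {
  const groups = [];
  for (let start = 0; start < operations.length; start += groupSize) {
    groups.push(operations.slice(start, start + groupSize));
  }
  return groups;
}

// 2,500 hypothetical insert operations.
const operations = Array.from({ length: 2500 }, (_, i) => ({ insert: i }));
const groups = splitIntoGroups(operations);

console.log(groups.map(group => group.length)); // [ 1000, 1000, 500 ]
```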
Summary
In this chapter, you were hopefully able to better understand the read and write
operations in MongoDB. Moreover, now, you should also understand why it is
important that you already know the queries you need to execute even before
the document modeling process. Finally, you learned how to use the MongoDB
properties, such as atomicity, at the document level and saw how it can help us to
produce better queries.
In the next chapter, you will see how a special data structure known as an index
can improve the execution of our queries.