21 MongoDB
{
  _id: ...,
  name: "Thomson",
  Age: 22,
  Address: {
    street: "124 church street",
    city: "brooklyn",
    state: "NY",
    zip: "13400",
    country: "US"
  }
}
MongoDB Data Types
MongoDB supports a wide range of datatypes, such as:
• String − Must be UTF-8 valid
• Integer − Stores a numerical value of 32 bit or 64 bit depending upon the server
• Boolean − Stores true/ false value
• Double − Stores floating point values
• Min/Max keys − Compares a value against the lowest and highest BSON elements
• Arrays − Stores arrays, lists, or multiple values into one key
• Date − Stores the current date or time in UNIX format
• Timestamp − Useful for keeping a record of the modifications or additions to a
document
• Object − Used for embedded documents
• Object ID − Stores the ID of a document
• Binary data − For storing binary data
• Null − Stores a null value
• Symbol − Used identically to a string but mainly for languages that have specific symbol
types
• Code − For storing JavaScript code into the document
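As a quick illustration, several of these types can be combined in a single document. The sketch below uses plain JavaScript values (in the shell, bare numbers default to the Double type unless wrapped with helpers such as NumberInt/NumberLong); the field names are made up for this example:

```javascript
// Illustrative document mixing several of the types listed above
const doc = {
  name: "Thomson",                 // String (must be valid UTF-8)
  age: 22,                         // Integer
  active: true,                    // Boolean
  rating: 4.5,                     // Double (floating point)
  tags: ["db", "nosql"],           // Array (multiple values under one key)
  createdAt: new Date(),           // Date
  nickname: null,                  // Null
  address: { city: "brooklyn" }    // Object (embedded document)
};

console.log(typeof doc.name, Array.isArray(doc.tags));
```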
Advantages of MongoDB
1. Distributed Data Platform
• Changing business requirements will no longer affect successful project delivery in
your enterprise.
• A flexible data model with dynamic schema, and powerful GUI and command-line
tools, makes it fast for developers to build and evolve applications.
• Automated provisioning enables continuous integration and delivery for productive
operations.
• Static relational schemas and complex operations of RDBMS are now a thing of the
past.
2. Fast and Iterative Development
• MongoDB stores data in flexible JSON-like documents, which makes data persistence
and combining easy.
• The objects in your application code map directly to the document model, which
makes working with data easy.
• Needless to say, schema governance controls, data access, complex aggregations,
and rich indexing functionality are not compromised in any way.
• Without downtime, one can modify the schema dynamically.
• Due to this flexibility, a developer needs to worry less about data manipulation.
4. Reduced TCO (Total Cost of Ownership)
• Application developers can do their job far more productively when MongoDB is used.
• The operations team also can perform their job well, thanks to the Atlas Cloud
service.
• Costs are significantly lowered as MongoDB runs on commodity hardware.
• The technology gives out on-demand, pay-as-you-go pricing with annual
subscriptions, along with 24/7 global support.
5. Integrated Feature Set
• A single platform supports a variety of real-time applications: analytics and data
visualization, event-driven streaming data pipelines, text and geospatial search,
graph processing, and in-memory performance.
• For RDBMS to accomplish this, they require additional complex technologies, along
with separate integration requirements.
The _id Field
The _id field serves as the primary key for the document, ensuring its uniqueness within the collection.
Using the _id Field: MongoDB automatically assigns a unique _id value to each document if one is not provided explicitly.
// Inserting a document without an explicit _id field
db.users.insertOne({ "name": "Sameera" });
• Output:
{
  "_id": ObjectId("6156d8f013c92ec9f5b8b36f"),
  "name": "Sameera"
}
Using Custom _id Field
We can specify our own custom _id value for a document. This value
should be unique within the collection to avoid conflicts. Let’s see an
example:
// Inserting a document with a custom _id field
db.users.insertOne({ "_id": 1001, "name": "Bob" });
primary key - foreign key
• MongoDB, being a NoSQL database, does not support traditional
foreign key constraints like those found in relational databases such
as MySQL or PostgreSQL.
• Instead, MongoDB uses a different approach for managing
relationships between documents.
• The alternative to foreign keys in MongoDB is to use a technique
called "Embedded Documents"
• Embedded Documents:
• In this approach, you can embed related data directly within a document.
• For example, if you have a "users" collection and a "comments" collection where
each comment belongs to a user, you can embed the user information within each
comment document.
Conclusion:
By default, MongoDB does not support primary key - foreign key relationships.
Every document in MongoDB contains an _id field that uniquely identifies the
document. Primary key - foreign key relationships can be implemented by embedding
related documents, as in the following comment document that embeds its user:
{
"_id": ObjectId("5ff7fcf93e2c5a4462b1e619"),
"text": "This is a comment.",
"user": {
"_id": ObjectId("5ff7fbc93e2c5a4462b1e612"),
"username": "user123",
"email": "[email protected]"
}
}
• This approach simplifies querying and ensures that all necessary data is retrieved with a single
query. However, it can lead to data duplication if the embedded data is shared across multiple
documents.
Retrieving embedded document in a MongoDB collection
To specify a query condition on fields in an embedded/nested document, use dot notation
("field.nestedField").
When querying using dot notation, the field and nested field must be inside quotation
marks.
The following example selects all documents where the field uom nested in the size field equals "in":
const cursor = db.collection('inventory').find({
'size.uom': 'in'
});
The following query uses the less than operator ($lt) on the field h embedded in the size field
const cursor = db.collection('inventory').find({
'size.h': { $lt: 15 }
});
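To make the dot-notation behavior concrete, here is a plain-JavaScript simulation of these two filters over an in-memory array. This is only a sketch of the matching logic, not the actual driver or server implementation, and the matchesDotPath helper is invented for illustration:

```javascript
// Walk a dotted path ("size.uom" -> doc.size.uom) and test the value it reaches
function matchesDotPath(doc, path, predicate) {
  const value = path
    .split('.')
    .reduce((obj, key) => (obj == null ? undefined : obj[key]), doc);
  return predicate(value);
}

const inventory = [
  { item: 'journal',  size: { h: 14,  w: 21, uom: 'cm' } },
  { item: 'notebook', size: { h: 8.5, w: 11, uom: 'in' } },
];

// Equivalent to { 'size.uom': 'in' }
const inInches = inventory.filter(d => matchesDotPath(d, 'size.uom', v => v === 'in'));
// Equivalent to { 'size.h': { $lt: 15 } }
const short = inventory.filter(d => matchesDotPath(d, 'size.h', v => v < 15));

console.log(inInches.length, short.length);
```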
Indexing in MongoDB:
In MongoDB, indexes are a critical component of the database system that improve
query performance by allowing for faster data retrieval.
An index is a data structure that stores a subset of the data from one or more fields in a
collection, along with a reference to the location of the corresponding documents.
Here are the key aspects of indexes in MongoDB:
• Query performance:
Indexes are used to speed up query performance.
Without indexes, MongoDB would need to perform a full collection scan, which can be
slow and resource-intensive, especially for large datasets.
• Fields:
We can create indexes on one or more fields in a MongoDB collection.
These fields are referred to as the "indexed fields." Indexes can be single-field indexes or
compound indexes that include multiple fields.
Compound indexes are useful for queries that involve multiple criteria.
• Automatic Indexing:
MongoDB automatically creates an index on the _id field for every document. This
ensures that each document has a unique identifier and allows for efficient retrieval by _id.
Index Types:
MongoDB supports various types of indexes, including:
• Single-field Indexes: These indexes are created on a single field, such as an index on
a field that is frequently queried for equality or range queries.
• Compound Indexes: These indexes are created on multiple fields. Compound
indexes can speed up queries that involve multiple criteria.
• Text Indexes: Text indexes are used for full-text search operations on string content.
• Geospatial Indexes: These indexes are used for queries involving geospatial data,
such as location-based searches.
• Hashed Indexes: Hashed indexes are used for hash-based sharding and can improve
write performance in some scenarios.
Note:
• By default, MongoDB creates an index on the _id field for every collection. This
index is known as the "_id index." The _id field is a special field in MongoDB that
serves as the primary key for documents in a collection. It ensures that each
document has a unique identifier within the collection.
• While the _id index is created by default, you can also specify custom indexes on
other fields in your collections to optimize query performance for your specific use
cases. These custom indexes are created based on your application's needs and can
significantly improve query performance for fields frequently used in queries or sort operations.
Callback Hell (Callback Pyramid) and Promises
"Callback hell" and "promises" are related concepts in JavaScript, and they both deal with
managing asynchronous code. They are used to handle situations where you need to
perform tasks that take time to complete, such as making HTTP requests, reading files, or
interacting with databases, without blocking the main thread of execution.
asyncFunction1((result1) => {
  asyncFunction2(result1, (result2) => {
    asyncFunction3(result2, (result3) => {
      // ...and so on
    });
  });
});
As you can see, as more asynchronous operations are chained together, the code becomes harder to
understand and manage.
Promises
• Promises are a way to handle asynchronous operations in a more structured and readable manner. A
promise represents a value that might be available now, in the future, or never. Promises have three
states: pending, fulfilled, or rejected. They provide a cleaner alternative to callbacks and help avoid
callback hell.
• Here's how you can rewrite the previous example using promises:
asyncFunction1()
.then((result1) => {
return asyncFunction2(result1);
})
.then((result2) => {
return asyncFunction3(result2);
})
.then((result3) => {
// ...and so on
})
.catch((error) => {
  // Handle errors
});
With promises, you can chain asynchronous operations in a more linear and readable way. Each .then()
block handles the result of the previous operation, and you can use .catch() to handle any errors that
may occur.
Async/Await (ES2017+)
ES2017 introduced the async/await syntax, which provides a more concise and synchronous-looking way
to work with asynchronous code, built on top of promises. Here's how you can rewrite the previous
example using async/await:
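The async/await version was referenced on the original slide but not shown; a minimal, runnable sketch (with mock async functions standing in for real I/O such as HTTP requests or database calls) might look like this:

```javascript
// Mock async functions standing in for real asynchronous work (illustrative only)
const asyncFunction1 = () => Promise.resolve(1);
const asyncFunction2 = (x) => Promise.resolve(x + 1);
const asyncFunction3 = (x) => Promise.resolve(x * 2);

async function run() {
  try {
    // Each await pauses run() until the promise settles, without blocking the thread
    const result1 = await asyncFunction1();
    const result2 = await asyncFunction2(result1);
    const result3 = await asyncFunction3(result2);
    return result3; // ...and so on
  } catch (error) {
    // All errors from the awaited steps funnel into one handler
    throw error;
  }
}

run().then((value) => console.log(value)); // logs 4
```

Compared with the `.then()` chain, the control flow reads top to bottom, and a single try/catch replaces the `.catch()` handler.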
2. Model Creation:
• Once we have defined a schema, we can create a model from it. A model is a
constructor function that allows you to create, read, update, and delete documents
in the corresponding MongoDB collection. For example:
const mongoose = require("mongoose");
const userSchema = new mongoose.Schema({
  firstName: {
    type: String,
    required: true,
  },
  lastName: {
    type: String,
  },
  email: {
    type: String,
    required: true,
    unique: true,
  },
  jobTitle: {
    type: String,
  },
  gender: {
    type: String,
  },
});
const User = mongoose.model("user", userSchema);
mongoose.connect('mongodb://127.0.0.1:27017/mynewdb')
  .then(() => console.log("MongoDB connected"))
  .catch(err => console.log("Error in Mongo", err));
app.post("/api/users",async(req,res)=>{
const body=req.body;
if(!body){
return res.status(400).json({msg:"All fields required"});
}
const result= await User.create({
firstName:body.first_name,
lastName:body.last_name,
email:body.email,
gender:body.gender,
jobTitle:body.job_title,
});
console.log("result",result);
return res.status(201).json({msg:"success"});
});
• Now, in the mongo shell, run the command db.users.find({}) and the new entry
will be visible.
• Code for .get("/users") route
app.get('/users', async (req, res) => {
  const allDBUsers = await User.find({});
  const html = `
    <ul>
      ${allDBUsers.map((user) => `<li>${user.firstName}</li>`).join("")}
    </ul>
  `;
  res.send(html);
});
Code for .get("/api/users") route
app.get('/api/users',async(req,res)=>{
const allDBUsers=await User.find({});
return res.json(allDBUsers);
});
Code for .route("/api/users/:id") route
app
.route("/api/users/:id")
.get(async(req,res)=>{
const user=await User.findById(req.params.id);
if(!user) return res.json("Error");
return res.json(user);
})
.patch(async(req,res)=>{
await User.findByIdAndUpdate(req.params.id,{lastName:"New"});
return res.json({status:"PATCH request success"});
})
.delete(async(req,res)=>{
await User.findByIdAndDelete(req.params.id);
return res.json({status:"Successful DELETE request"});
});
Mongo DB example Queries
• Important: We can also create a collection implicitly during the insert process.
db.posts.insertOne(object)
• There are 2 methods to find and select data from a MongoDB collection, find() and findOne().
• find(): To select data from a collection in MongoDB, we can use the find() method.
This method accepts a query object. If left empty, all documents will be returned.
db.posts.find({})
• findOne(): To select only one document, we can use the findOne() method.
This method accepts a query object. If left empty, it will return the first document it finds.
db.posts.findOne()
Find()
• To query, or filter, data we can include a query in
our find() or findOne() methods.
• Find all documents that have a category of "News" in
the posts collection:
db.posts.find( { category: "News" } )
Projection
• Both find methods accept a second parameter called projection.
• This parameter is an object that describes which fields to include in the results.
• This parameter is optional. If omitted, all fields will be included in the results.
• The example below will only display the title and date fields in the results:
db.posts.find({}, { title: 1, date: 1 })
• Example excluding the category field (all other fields will be included in the results):
db.posts.find({}, { category: 0 })
Update
Suppose a post document contains fields like:
{
  category: "News",
  likes: 1,
  date: Date()
}
Update Document
• The first parameter is a query object to define which document or documents should be updated. The second parameter is an object defining the updated data.
• updateOne(): The updateOne() method will update the first document that
is found matching the provided query.
Example: db.posts.updateOne( { title: "Post Title 1" }, { $set: { likes: 2 } } )
• Check the document again with the command below and you'll see that the
"likes" field has been updated.
db.posts.find( { title: "Post Title 1" } )
• updateMany(): The updateMany() method will update all documents that
match the provided query.
• Ex:Update likes on all documents by 1. For this we will use
the $inc (increment) operator:
db.posts.updateMany({}, { $inc: { likes: 1 } })
Delete()
• deleteOne():The deleteOne() method will delete the first document that matches the query provided.
• deleteMany(): The deleteMany() method will delete all documents that match the query provided.
MongoDB Query Operators
• There are many query operators that can be used to compare and reference document fields.
• Comparison operators include $eq (equal), $ne (not equal), $gt (greater than), $gte (greater than or equal), $lt (less than), $lte (less than or equal), $in (matches any value in an array), and $nin (matches none of the values in an array).
MongoDB Update Operators
• There are many update operators that can be used during document updates.
• Common field update operators include $set (sets the value of a field), $unset (removes a field), $inc (increments a value), $rename (renames a field), and $currentDate (sets a field to the current date).
Profiling in MongoDB:
• The profiler in MongoDB is a built-in tool that allows to
capture and analyze the performance characteristics of
database operations.
• It helps to identify slow or frequently executed queries,
which can be crucial for optimizing MongoDB deployment.
• The profiler is off by default.
• We can enable the profiler at one of several profiling levels.
Profiler Levels:
MongoDB offers different profiling levels, each with varying levels of detail
and impact on server performance:
• Off (Level 0):
Profiling is turned off.
No profiling information is collected.
This is the default setting.
• Slow Operations Only (Level 1):
Profiling captures information about slow operations only.
By default, operations taking longer than 100 milliseconds are considered slow,
but this threshold can be adjusted.
• All Operations (Level 2):
Profiling captures information about all database operations, regardless of their
execution time.
This level provides the most detailed information but can significantly impact server performance.
Enabling Profiling:
• We can enable database profiling for mongod instances.
• mongod is the primary daemon process for the MongoDB system.
• It handles data requests, manages data access, and performs
background management operations
• To enable profiling for a mongod instance, set the profiling level to a
value greater than 0.
• The profiler records data in the system.profile collection.
• MongoDB creates the system.profile collection in a database after we
enable profiling for that database.
Example
• To enable profiling and set the profiling level, pass the profiling level
to the db.setProfilingLevel() helper.
• For example: db.setProfilingLevel(2)
• By default, the slow operation threshold is 100 milliseconds. To
change the slow operation threshold, specify the required threshold
value.
• The following example sets the profiling level for the currently
connected database to 1 and sets the slow operation threshold for
the mongod instance to 20 milliseconds
• db.setProfilingLevel( 1, { slowms: 20 } )
Version of MongoDB in use:
MongoDB now focuses on 64-bit builds; the 32-bit version of MongoDB has been
deprecated and is no longer recommended for production use.
There are several reasons for this:
• Limited addressable memory,
• Performance,
• Document size limitations,
• Lack of features and support,
• Security and stability.
Virtuals
• Sometimes, it is required to have certain properties that we can
call on our documents but don't want to save those properties in
the database.
Example: get the full name of a user
• These properties are usually not required when the document is
created.
• But may occur as a result of some sort of processing carried out on
the document.
• Virtuals are document properties that you can get and set but that
do not persist to MongoDB.
• They only exist logically
• They are not written to the document’s collection
• Instead, they are computed on-the-fly when we access them.
• While not methods in the traditional sense, virtuals allow you to define computed
properties on your documents.
Suppose we have the following user model and we want
access to a user's full name (i.e firstname + lastname).
const userSchema = new mongoose.Schema({
  firstname: {
    type: String,
    required: true
  },
  lastname: {
    type: String
  }
})

// Virtuals must be defined before the model is compiled
userSchema.virtual('fullname')
  .get(function() {
    return `${this.firstname} ${this.lastname}`
  })

const User = mongoose.model('user', userSchema)
How to use:
const user = await User.create({
firstname: 'money',
lastname: 'man'
})
console.log(`Welcome ${user.fullname}`)
Above code creates a virtual field on the schema called fullname which returns a string
containing the firstname and lastname.
The fullname virtual is not stored in MongoDB; rather, it is computed at runtime and
attached to the model.
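Outside Mongoose, the same computed-on-access idea can be sketched with a plain JavaScript getter. This is only an analogy for what a virtual does, not Mongoose's implementation; marking the property non-enumerable keeps it out of serialization, much like a virtual is kept out of the stored document:

```javascript
// Plain-object sketch of a "virtual": computed on access, never stored
const user = { firstname: 'money', lastname: 'man' };

Object.defineProperty(user, 'fullname', {
  enumerable: false, // keep it out of JSON.stringify, like a virtual stays out of the collection
  get() {
    return `${this.firstname} ${this.lastname}`;
  }
});

console.log(user.fullname);        // "money man"
console.log(JSON.stringify(user)); // {"firstname":"money","lastname":"man"}
```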
Models in Mongoose:
• A model represents a structure or blueprint for documents in a MongoDB database.
• A model involves following in the context of MongoDB and Mongoose:
Schema Definition: A model defines the structure of the documents that will be stored
in a specific MongoDB collection. The schema defines the fields, their data types,
validation rules, and default values. In Mongoose, we create a schema using the
mongoose.Schema constructor.
Document Operations: With a model, we can perform various CRUD (Create, Read,
Update, Delete) operations on MongoDB documents. These operations include
creating new documents, querying for existing ones, updating documents, and
deleting documents.
Methods in Mongoose:
In Mongoose, we can define and use various methods to interact with MongoDB
models and perform CRUD operations, as well as custom business logic.
These methods can be divided into two main categories: instance methods and static
methods.
1. Instance Methods:
• Instance methods are functions defined on individual documents
(instances) of a Mongoose model.
• These methods operate on a single document and can access and
manipulate the document's properties.
• We define them within the schema definition, and they can be called on specific
document instances.
Replica sets in Mongodb:
In MongoDB, replica sets are a fundamental part of achieving high availability, fault
tolerance, and data redundancy.
Replica sets consist of multiple MongoDB instances (servers) that work together to
provide these benefits.
There are two types of members within a MongoDB replica set: primary and
secondary.
1. Primary Replica:
• One and Only One:
A replica set can have only one primary member at any given time.
The primary is the authoritative member that receives all write operations (inserts,
updates, deletes) from clients.
It is the only member that can accept write operations.
• Reads
Clients can also read data from the primary, but this can introduce load
on the primary.
It's generally advisable to read from secondary members to distribute
read traffic and reduce the load on the primary.
• Elections
• If the primary member becomes unavailable (due to failure or
maintenance), the replica set will automatically hold an election to
select a new primary.
• This ensures that data remains available and that write operations can
continue.
66
2. Secondary Replica:
• Multiple Secondaries
A replica set can have zero or more secondary members.
Secondaries replicate data from the primary, which means they have a copy of the
same data as the primary.
• Reads
Clients can read data from secondary members.
Reading from secondaries can distribute read traffic and improve read scalability.
However, secondary members may lag behind the primary due to replication latency.
• No Writes
Secondary members cannot accept direct write operations from clients.
They are read-only in this sense.
However, you can promote a secondary to become the new primary if needed.
• Failover
In the event of a primary member failure, one of the secondaries can be automatically
elected as the new primary.
This ensures that write operations can continue without manual intervention.
Conclusion:
• Primary: Typically used for write operations. It's the main entry point for inserting,
updating, and deleting data.
• Secondary: Used for read operations to distribute the read workload and improve
read performance. Can also be used as a failover option if the primary fails.
Some more Queries in mongodb
In MongoDB, queries are used to retrieve data from a database.
MongoDB uses a flexible and powerful querying system that allows user to find
documents in a collection based on various criteria.
Some common query operations in MongoDB:
1. Find Documents: The find() method is used to query documents in a collection.
We can specify a filter to match documents that meet certain criteria. For
example:
db.collectionName.find({ field: value });
• This query will return all documents in the collection where the field matches the
specified value.
2. Query Operators: MongoDB provides a wide range of query operators to perform
more complex queries.
For example, we can use $gt (greater than), $lt (less than), $eq (equal), $ne (not
equal), $in (matches any of the values in an array), and many others.
5. Limit and Skip: To limit the number of documents returned or skip a certain number
of documents, we can use the limit() and skip() methods, respectively.
db.collectionName.find().limit(10); // Limit to 10 documents
db.collectionName.find().skip(5); // Skip the first 5 documents
6. Projection: You can specify which fields to include or exclude from the query results
using projection.
db.collectionName.find({}, { fieldName1: 1, fieldName2: 0 });
In this example, "fieldName1" will be included, and "fieldName2" will be excluded
from the results.
7. Indexing: MongoDB uses indexes to improve query performance. We can create
custom indexes on specific fields to speed up queries that involve those fields.
db.collectionName.createIndex({ fieldName: 1 });
Covered query in Mongodb
• A covered query in MongoDB is a type of query
optimization where all the fields mentioned in a query are
included in an index.
• MongoDB can fulfill the query solely by scanning the index
without needing to access the actual documents in the
collection.
• This can significantly improve query performance since
reading from an index is usually faster than reading from
disk.
• Covered queries are beneficial when dealing with large
datasets, as they reduce the amount of data to be read
from disk and transferred over the network, leading to
faster query execution.
• Properly designed indexes and query projections are
essential for maximizing benefits of covered queries.
• For a query to be considered a covered query, the following
conditions must be met:
1. All fields in the query projection are part of the index.
This means that the fields specified in the find() method's projection
must be covered by the index.
If we include a field in the projection that is not part of the index, the
query won't be considered a covered query.
Example: db.collectionName.find({ Marks: { $gt: 90 } })
In this query:
• db.collectionName should be replaced with the actual name of your
collection.
• find() is the method used to retrieve documents from the collection.
• { Marks: { $gt: 90 } } is the query filter. It specifies that you want to
find documents where the "Marks" field is greater than 90 using the
$gt (greater than) operator.
• After executing this query, MongoDB will return all documents from
the collection that meet the specified criteria, which, in this case, are
documents where the "Marks" field is greater than 90.
Having more than one model per collection with Mongoose:
In Mongoose, we generally define a single data model (schema) per
collection. However, it is possible to have more than one model per
collection.
But this is generally discouraged and should only be considered in
specific situations where it provides a clear advantage.
Some scenarios for multiple models per collection:
• Schema Evolution:
If we need to change the structure of your documents within a
collection over time, creating a new model for the updated schema can
be a way to handle schema evolution.
Old documents will still be accessible using the old model, while new
documents adhere to the new schema.
• Access Control:
In some cases,we might want to restrict access to certain fields within a
collection based on user roles or permissions.
We can create separate models with different projections (field
selections) to control which fields are accessible to different users.
• Performance Optimization:
We might have very specific performance requirements where having
separate models allows to optimize queries differently for different use
cases.
• While these scenarios justify having multiple models per collection,
it's essential to be cautious because it can lead to complexity and
potential maintenance challenges.
• We should carefully weigh the advantages against the complexity it
introduces and consider alternative approaches like using a single
model with conditional validation, middleware, or schema evolution
strategies.
• In most cases, designing a clear and consistent schema for each
collection is a better practice, as it simplifies code maintenance and
makes it easier to work with data in a predictable way.
• Proper schema design and data modeling are key aspects of building
efficient and maintainable applications with MongoDB and
Mongoose.
Utilities for backup and restore in MongoDB:
MongoDB provides several utilities and methods for backup and restore
operations to help safeguard the data and recover from potential data loss
scenarios.
Some of the most commonly used MongoDB backup and restore utilities:
• mongodump and mongorestore:
mongodump:
This utility allows to create a binary export of the data from a MongoDB
database.
We can specify options to filter which databases or collections to back up.
It produces BSON (Binary JSON) files that can be easily restored using
mongorestore.
mongorestore:
Used to restore data that was previously backed up with mongodump.
It reads BSON files created by mongodump and inserts the data back into
MongoDB.
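As a sketch, a typical dump-and-restore round trip might look like the following (the database name mydb and the /backup path are placeholders; exact flags can vary between tool versions):

```shell
# Dump one database to BSON files under /backup
mongodump --db=mydb --out=/backup

# Restore that database from the dumped files
mongorestore --db=mydb /backup/mydb
```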
• File Copy:
We can perform a simple backup by copying the entire MongoDB
data directory to a backup location.
However, this method is not recommended for live databases, as it
may lead to inconsistencies if data is being written during the copy
process.
• Third-party Backup Solutions:
There are third-party backup solutions designed specifically for
MongoDB, such as MongoDB Atlas Backup and many more. These
tools offer additional features and flexibility for backup and restore
operations.
• Snapshot Backups:
MongoDB supports creating backups at the file system level by taking snapshots
of the data directory.
This method is highly efficient and is often used in production environments.
• Ops Manager (MongoDB Backup):
MongoDB offers a comprehensive backup solution through MongoDB Ops
Manager. It provides scheduled backups, point-in-time recovery, and monitoring
capabilities. This option is suitable for large-scale, enterprise-grade MongoDB
deployments.
• Cloud Backup Services:
Many cloud providers offer MongoDB backup services as part of their managed
MongoDB offerings. For example, AWS provides the DocumentDB service, which
includes automated backups and restoration features. Similarly, Azure Cosmos DB
offers backup and restore functionality.
When choosing a backup and restore strategy for your MongoDB deployment,
consider factors like the size of data, recovery time objectives (RTO), and recovery
point objectives (RPO). It's essential to regularly test backup and restore
procedures to ensure they work as expected in case of data loss or system
failures.
Aggregation Framework in MongoDB.
When we work with MongoDB, we generally use the find() command for
a wide range of queries. However, as soon as queries get more
advanced, we need to know more about MongoDB aggregation.
Aggregation is a way of processing a large number of
documents in a collection by means of passing them
through different stages. The stages make up what is
known as a pipeline. The stages in a pipeline can filter,
sort, group, reshape and modify documents that pass
through the pipeline.
One of the most common use cases of Aggregation is to
calculate aggregate values for groups of documents. This
is similar to the basic aggregation available in SQL
with the GROUP BY clause and COUNT, SUM and
AVG functions.
MongoDB Aggregation goes further, though, and can also
perform relational-like joins, reshape documents, and create
new and update existing collections.
While there are other methods of obtaining aggregate data in MongoDB, the
aggregation framework is the recommended approach for most work.
The Aggregation Framework in MongoDB is a powerful and flexible tool for querying and
transforming data stored in MongoDB collections.
The Aggregation Framework is commonly used for tasks such as data analysis, reporting, and
generating custom views of data.
How does the MongoDB aggregation pipeline work?
The input of the pipeline can be a single collection, where others can be merged
later down the pipeline.
The pipeline then performs successive transformations on the data until our goal
is achieved.
This way, we can break down a complex query into easier stages, in each of
which we complete a different operation on the data. So, by the end of the
query pipeline, we will have achieved all that we wanted.
This approach allows us to check whether our query is functioning properly at
every stage by examining both its input and the output. The output of each
stage will be the input of the next.
There is no limit to the number of stages used in the query, or how we combine
them.
MongoDB aggregate pipeline syntax
• db.collectionName.aggregate(pipeline, options),
where collectionName – is the name of a collection,
pipeline – is an array that contains the aggregation stages,
• options – optional parameters for the aggregation
MongoDB $group aggregation operators
The $group stage supports certain expressions (operators) allowing users to perform arithmetic, array,
boolean and other operations as part of the aggregation pipeline.
Operator — Meaning
$push — Adds extra values into the array of the resulting document.
Examples
Suppose we have a collection named universities with fields such as country, city,
name, location, etc.
• The $match stage allows us to choose just those documents from a
collection that we want to work with. It does this by filtering out
those that do not follow our requirements.
• In the following example, we only want to work with those
documents which specify that Spain is the value of the field country,
and Salamanca is the value of the field city
db.universities.aggregate([
  { $match : { country : 'Spain', city : 'Salamanca' } }
]).pretty()
• $match: Filter documents to pass only those that match the specified
condition(s).
• db.users.aggregate([
{ $match: { age: { $gt: 30 } } }
])
• $group: It Groups documents by the city field and calculates the
average age using the $avg accumulator.
• db.users.aggregate([
{ $group: { _id: "$city", averageAge: { $avg: "$age" } } }
])
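What the $group stage with an $avg accumulator computes can be simulated in plain JavaScript over an in-memory array. This is a sketch of the semantics only, not the server's implementation, and the groupAvg helper is invented for illustration:

```javascript
// Simulate { $group: { _id: "$city", averageAge: { $avg: "$age" } } }
function groupAvg(docs, keyField, avgField) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const b = buckets.get(key) || { sum: 0, count: 0 };
    b.sum += doc[avgField];
    b.count += 1;
    buckets.set(key, b);
  }
  // One output document per group, like the $group stage
  return [...buckets].map(([k, b]) => ({ _id: k, averageAge: b.sum / b.count }));
}

const users = [
  { name: 'A', city: 'NY', age: 30 },
  { name: 'B', city: 'NY', age: 40 },
  { name: 'C', city: 'LA', age: 20 },
];

console.log(groupAvg(users, 'city', 'age'));
// e.g. [ { _id: 'NY', averageAge: 35 }, { _id: 'LA', averageAge: 20 } ]
```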
• It is good practice to return only those fields you need so as to avoid
processing more data than is necessary.
• The $project stage is used to do this and to add any calculated fields
that you need.
• In this example, we only need the fields country, city, and name:
• db.universities.aggregate([
    { $project : { _id : 0, country : 1, city : 1, name : 1 } }
  ]).pretty()
Components
Here are the key components and concepts of the MongoDB
Aggregation Framework:
1. Pipeline Stages:
• Aggregation operations in MongoDB are constructed as a series of
stages in a pipeline. Each stage performs a specific data processing
operation on the documents as they pass through the pipeline. The
output of one stage serves as the input for the next stage.
• Typical stages include $match, $group, $project, $sort, $limit,
$unwind, and more.
2. Operators:
MongoDB provides a wide range of operators that can be used within aggregation
stages to perform various operations on documents.
These operators include arithmetic, comparison, logical, and array operators,
among others.
Examples include $sum, $avg, $max, $min, $push, $addToSet, $match, and many
others.
3. Expression Evaluation:
Aggregation expressions allow you to compute values dynamically within
aggregation stages. Expressions can be used for calculations, conditional
operations, and data transformations.
For example, you can use expressions to create new fields, conditionally filter
documents, or manipulate array elements.
4. Grouping and Accumulation:
• The $group stage is used to group documents by specified criteria. You can group
documents by one or more fields and then perform aggregation operations (e.g.,
sum, average) on the grouped data.
• Accumulators like $sum, $avg, $max, and $min are commonly used within the
$group stage to perform calculations on grouped data.
5. Projection and Transformation:
The $project stage allows you to reshape documents by specifying which fields to
include or exclude from the output. You can also create new fields with computed
values using expressions.
This stage is useful for generating customized views of your data.
6. Sorting and Limiting:
The $sort and $limit stages enable you to order the output documents and limit
the number of documents returned in the result.
7. Array Operations:
The Aggregation Framework includes stages like $unwind to work with arrays.
$unwind creates a new document for each element in an array field, effectively
"flattening" the array for further processing.
8. Output Options:
Aggregation results can be output in various formats, including as a cursor, an
array, or written to a new collection for further analysis or storage.
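The "flattening" that $unwind performs can be sketched in plain JavaScript (a conceptual analogy with invented sample data, not mongosh):

```javascript
// Conceptual sketch of $unwind: emit one output document per element
// of the array field, effectively "flattening" the array.
const docs = [
  { name: "USAL", cities: ["Salamanca", "Madrid"] },
  { name: "MIT", cities: ["Boston"] },
];
const unwound = docs.flatMap((doc) =>
  doc.cities.map((city) => ({ ...doc, cities: city }))
);
console.log(unwound);
// → three documents, each holding a single city in "cities"
```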
MongoDB Memory Requirements:
• MongoDB's RAM requirements can vary depending on your specific
use case, the size of your dataset, and performance expectations.
MongoDB doesn't inherently require a lot of RAM, but the amount of
RAM needed will depend on several factors:
Working Set Size, Read-Heavy Workloads, Write Operations, Index
Size, Concurrency, Aggregation and Sorting
• In summary, MongoDB's RAM requirements are not fixed and depend
on your specific use case and workload. While MongoDB can work
with varying amounts of RAM, having an adequate amount of RAM to
accommodate the working set and other operational needs can
significantly enhance its performance. The specific amount of RAM
you need should be determined by assessing your application's
requirements and performance goals.
$unset operator
The $unset operator is used to remove a specific field from a
MongoDB document. It takes the field name as its argument and
removes that field completely from the document.
db.Employee.updateOne(
  { _id: ObjectId("65f8b8499abfc382ebcd9668") },
  { $unset: { department: "" } }
)
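Conceptually, $unset removes the key from the document entirely, much like JavaScript's delete operator on a plain object (sample document invented for illustration):

```javascript
// After $unset, the field is gone from the document, not merely null.
const employee = { _id: 1, name: "Thomson", department: "Sales" };
delete employee.department; // analogous to { $unset: { department: "" } }
console.log(employee);
// → { _id: 1, name: 'Thomson' }
```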
CRUD Operations
• CRUD operations describe the conventions of a user
interface that let users view, search, and modify parts of
the database.
• The create operation is used to insert new documents in
the MongoDB database.
• db.collection.insertOne(): insertOne() allows you to insert
one document into the collection. If the create operation
is successful, a new document is created. The function
returns an object where “acknowledged” is “true” and
“insertedId” is the newly created “ObjectId.”
• db.collection.insertMany(): It's possible to insert multiple
items at one time by calling the insertMany() method on
the desired collection. In this case, we pass multiple
items into our chosen collection and separate them by
commas.
• The read operation is used to query a document in the
database.
• db.collection.find(): To get all the documents from a
collection, we can simply use the find() method on our
chosen collection. Executing just the find() method with no
arguments will return all records currently in the collection.
• db.collection.findOne(): To get one document that satisfies
the search criteria, we can simply use the findOne()
method on our chosen collection. If multiple documents
satisfy the query, this method returns the first document
according to the natural order which reflects the order of
documents on the disk. If no documents satisfy the search
criteria, the function returns null.
• The update operation is used to modify existing documents in the
database.
• db.collection.updateOne(): We can update a currently existing
record and change a single document with an update operation. To
do this, we use the updateOne() method on a chosen collection.
• To update a document, we provide the method with two arguments:
an update filter and an update action.
• The update filter defines which items we want to update, and the
update action defines how to update those items. We first pass in
the update filter. Then, we use the “$set” key and provide the fields
we want to update as a value.
• db.RecordsDB.updateOne({name: "Marsh"}, {$set:{ownerAddress:
"451 W. Coffee St. A204"}})
• db.collection.updateMany(): updateMany() allows us to update all
documents that match the filter at once, using the same two
arguments as updateOne(): an update filter and an update action.
• db.RecordsDB.updateMany({species: "Dog"}, {$set: {age: "5"}})
• The delete operation is used to remove documents from
the database.
• db.collection.deleteOne(): deleteOne() removes a
document from a specified collection on the MongoDB
server. Filter criteria are used to specify the item to delete;
it deletes the first record that matches the provided filter.
• db.RecordsDB.deleteOne({name:"Maki"})
• db.collection.deleteMany(): deleteMany() is a method used
to delete multiple documents from a desired collection
with a single delete operation. It takes filter criteria, just
as deleteOne() does, but removes every document that
matches.
• db.RecordsDB.deleteMany({species:"Dog"})
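The CRUD semantics above can be sketched with a tiny in-memory "collection" in plain JavaScript (a teaching analogy that mimics the shape of the MongoDB API; real drivers and mongosh operate on server-side collections):

```javascript
// Tiny in-memory "collection" mirroring MongoDB's CRUD method names.
// Simplified: filters are exact-match only, updates support $set only.
let nextId = 1;
const matches = (doc, filter) =>
  Object.entries(filter).every(([k, v]) => doc[k] === v);

const collection = {
  docs: [],
  insertOne(doc) {
    const stored = { _id: nextId++, ...doc };
    this.docs.push(stored);
    return { acknowledged: true, insertedId: stored._id };
  },
  find(filter = {}) {
    return this.docs.filter((d) => matches(d, filter));
  },
  findOne(filter = {}) {
    return this.find(filter)[0] ?? null; // first match in natural order
  },
  updateOne(filter, update) {
    const doc = this.findOne(filter);
    if (doc) Object.assign(doc, update.$set); // apply the $set action
    return { matchedCount: doc ? 1 : 0 };
  },
  deleteOne(filter) {
    const i = this.docs.findIndex((d) => matches(d, filter));
    if (i !== -1) this.docs.splice(i, 1);
    return { deletedCount: i !== -1 ? 1 : 0 };
  },
};

collection.insertOne({ name: "Marsh", species: "Dog" });
collection.insertOne({ name: "Maki", species: "Cat" });
collection.updateOne({ name: "Marsh" }, { $set: { age: "5" } });
collection.deleteOne({ name: "Maki" });
console.log(collection.find());
// → [ { _id: 1, name: 'Marsh', species: 'Dog', age: '5' } ]
```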
Concurrency in MongoDB
MongoDB allows multiple clients to read and write the same data.
To ensure consistency, MongoDB uses locking and
concurrency control to prevent clients from modifying the same
data simultaneously.
Writes to a single document occur either in full or not at all, and
clients always see consistent data.
MongoDB uses multi-granularity locking that allows operations to
lock at the global, database or collection level, and allows for
individual storage engines to implement their own concurrency
control below the collection level.
Multiple granularity means hierarchically breaking the database into
blocks that can be locked, so that the system can track what needs
to be locked and in what fashion. Such a hierarchy can be represented
graphically as a tree.
MongoDB uses reader-writer locks that allow concurrent readers
shared access to a resource, such as a database or collection.
• For example, when locking a collection for writing (using
mode X), both the corresponding database lock and the
global lock must be locked in intent exclusive (IX) mode.
• A single database can simultaneously be locked in IS
and IX mode, but an exclusive (X) lock cannot coexist
with any other modes, and a shared (S) lock can only
coexist with intent shared (IS) locks.
• Locks are fair, with lock requests for reads and writes
queued in order.
• However, to optimize throughput, when one lock
request is granted, all other compatible lock requests
are granted at the same time, potentially releasing the
locks before a conflicting lock request is performed.
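The coexistence rules above form the standard multi-granularity compatibility matrix, which can be written out as a small check (a sketch; in addition to the rules stated above, S readers also share with each other):

```javascript
// Lock-mode compatibility: intent locks (IS, IX) mix with each other,
// S coexists with IS (and other S readers), X coexists with nothing.
const compatible = {
  IS: ["IS", "IX", "S"],
  IX: ["IS", "IX"],
  S: ["IS", "S"],
  X: [],
};
const canCoexist = (a, b) => compatible[a].includes(b);

console.log(canCoexist("IS", "IX")); // true
console.log(canCoexist("S", "IX")); // false
console.log(canCoexist("X", "IS")); // false
```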
The lock modes are represented as follows:
• r – Intent Shared (IS)
• w – Intent Exclusive (IX)
• R – Shared (S)
• W – Exclusive (X)
MongoDB vs. Redis
MongoDB vs. CouchDB
• https://www.mongodb.com/resources/compare/documentdb-vs-mongodb/couchdb-vs-mongodb
Security considerations when configuring authentication and authorization in a MongoDB deployment
When configuring authentication and authorization in a MongoDB deployment, we can consider the
following security considerations:
Authentication mechanism
MongoDB supports a number of authentication mechanisms, including the default Salted Challenge
Response Authentication Mechanism (SCRAM) and X.509 certificate authentication.
Access control
Enable access control and create separate security credentials for each user or process that needs to access
MongoDB. You can also use role-based access control to associate authorizations with roles, such as
database administrator, developer, or application server.
Remote connections
Secure remote connections with a VPN or SSH tunneling to encrypt and authenticate communication. You
can also consider implementing two-factor authentication (2FA) for an extra layer of protection.
Network environment
Run MongoDB in a trusted network environment and control inbound and outbound traffic with a firewall or
security groups. Only allow trusted clients to access the network interfaces and ports where MongoDB
instances are available.
Encrypting traffic
Encrypt traffic to prevent sensitive data from being captured on the network. MongoDB supports X.509
certificate authentication for use with a secure TLS/SSL connection.
Auditing and logs
Audit trails should track who made changes to the database configuration, what the changes were, and
when they were made.
Assignments
• Write a script in Node.js to read data from a CSV file, transform it
into MongoDB document format, and then perform a bulk insert.
• Write a script with a filter query to find all documents in a collection
named "Employees" with a salary less than 10000 and a status of
"active".