MongoDB Map-Reduce
MongoDB Map-Reduce is a data processing model that facilitates operations on large data sets to produce aggregated results. It uses the mapReduce() function comprising map and reduce functions to handle complex data transformations.
In this article, we will learn about MongoDB Map-Reduce through examples, including when and how to use it.
What is MongoDB Map-Reduce?
MongoDB Map-Reduce is a data processing programming model for performing operations on large data sets and producing aggregated results. MongoDB provides the mapReduce() function for map-reduce operations. It takes two main functions as arguments: a map function and a reduce function.
The map function emits a key-value pair for each document, and MongoDB groups the emitted values by key. The reduce function then combines the values for each key into a single result. The data is mapped and reduced independently, and the combined result is saved to the output collection we specify.
The mapReduce() function generally operates on large data sets. Using Map-Reduce, we can perform aggregation operations such as max and avg on data grouped by a key, similar to GROUP BY in SQL. Map-Reduce is a powerful feature in MongoDB for aggregating data.
Syntax
db.collectionName.mapReduce(map, reduce, { query: <filter>, out: <outputCollection> });
Key Terms:
- map() function: It calls emit(key, value) for each document. The key is the field we group by, like GROUP BY in MySQL, and the value is the data to aggregate.
- reduce() function: It receives a key and the array of values emitted for that key. This is the step in which we compute aggregates such as an average or a sum.
- query: An optional filter that selects which documents are passed to the map function.
- output (out): The name of the collection where the result will be stored.
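The parameters above can be put together as in the following sketch. It assumes a hypothetical students collection with sec and marks fields; the helper name runMaxMarks and its arguments are illustrative only. The db handle is passed in so the function can be called from the MongoDB shell.

```javascript
// Map: emit one (key, value) pair per document.
// `emit` is supplied by MongoDB when the function runs on the server.
const mapFn = function () {
  emit(this.sec, this.marks);
};

// Reduce: collapse all values emitted for one key into a single value.
const reduceFn = function (key, values) {
  return Math.max.apply(null, values);
};

// Hypothetical helper: runs map-reduce only over documents matching `filter`
// and writes the result to the collection named by `outName`.
function runMaxMarks(db, filter, outName) {
  return db.students.mapReduce(mapFn, reduceFn, {
    query: filter, // restricts the input documents
    out: outName,  // collection where the result is stored
  });
}
```

For example, runMaxMarks(db, { marks: { $gte: 50 } }, "max_marks") would consider only documents with marks of at least 50.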
Steps to use Map Reduce in MongoDB
Let's understand mapReduce() step by step using the following example. We have five documents with the keys id, sec, and marks, and we need to find the maximum marks in each section.
{"id":1, "sec":"A", "marks":80}
{"id":2, "sec":"A", "marks":90}
{"id":1, "sec":"B", "marks":99}
{"id":1, "sec":"B", "marks":95}
{"id":1, "sec":"C", "marks":90}
Here we need to find the maximum marks in each section, so the key by which we group documents is sec and the value is marks. Inside the map function, we call emit(this.sec, this.marks), which emits the sec and marks of each record (document). This is similar to GROUP BY in MySQL.
var map = function(){emit(this.sec, this.marks)};
After iterating over each document, the emitted data is grouped by key like this:
{"A":[80, 90]}, {"B":[99, 95]}, {"C":[90]}
This is what the map() function does. The emitted data is grouped by the sec key and becomes the input to the reduce function, which is where the actual aggregation takes place. In our example, we pick the maximum of each section: A:[80, 90] = 90 (max), B:[99, 95] = 99 (max), C:[90] = 90 (max).
var reduce = function(sec, marks){ return Math.max.apply(null, marks); }; // Math.max is standard JavaScript
The reduce() function has now reduced the records, so we output them into a new collection using {out: "resultCollectionName"}. Use a name different from the source collection, because by default the output replaces the contents of the target collection.
db.collectionName.mapReduce(map, reduce, {out: "resultCollectionName"});
The map and reduce functions are the ones defined above. To check the result, we query the newly created collection with db.resultCollectionName.find() and get:
{"_id":"A", "value":90}
{"_id":"B", "value":99}
{"_id":"C", "value":90}
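The walkthrough above can be reproduced without a database using a small in-memory stand-in. This is only a sketch: simulateMapReduce is a hypothetical helper, not a MongoDB API, and unlike MongoDB it passes emit to the map function as an argument instead of providing it globally.

```javascript
// In-memory stand-in for MongoDB's map-reduce, for illustration only.
function simulateMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  // `emit` collects each value under its key, as happens during the map phase.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // `this` is the current document
  // One output document per key. Like MongoDB, skip reduce for single-value keys.
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

// The five documents from the example.
const students = [
  { id: 1, sec: "A", marks: 80 },
  { id: 2, sec: "A", marks: 90 },
  { id: 1, sec: "B", marks: 99 },
  { id: 1, sec: "B", marks: 95 },
  { id: 1, sec: "C", marks: 90 },
];

const result = simulateMapReduce(
  students,
  function (emit) { emit(this.sec, this.marks); },
  function (sec, marks) { return Math.max.apply(null, marks); }
);

console.log(result);
// [ { _id: 'A', value: 90 }, { _id: 'B', value: 99 }, { _id: 'C', value: 90 } ]
```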
Examples of MongoDB Map Reduce
Let's say we have an employee collection and need to find the sum of ranks grouped by age. The collection contains employee details like age and rank.
- Database: geeksforgeeks2
- Collection: employee
- Documents: Six documents that contain the details of the employees
Example 1: Find the Sum of Ranks Grouped by Ages
Here, we will calculate the sum of ranks within each age group. Age is the key on which we group (like GROUP BY in MySQL), and rank is the value on which we perform the sum aggregation.
Query:
var map=function(){ emit(this.age,this.rank)};
var reduce=function(age,rank){ return Array.sum(rank);};
db.employee.mapReduce(map,reduce,{out :"resultCollection1"});
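Output: resultCollection1 contains one document per age, with _id set to the age and value set to the sum of ranks.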
Explanation:
- Inside the map() function, function map(){ emit(this.age, this.rank); };, we call emit(this.age, this.rank). Here this refers to the current document being processed. The first argument, age, is the key by which results are grouped: all ranks for age 24 are summed together, and likewise for age 25. The second argument, rank, is the value on which the aggregation is performed.
- Inside the reduce() function, function reduce(key, rank){ return Array.sum(rank); };, we perform the aggregation. Here rank is the array of all ranks emitted for the key.
- The third parameter is the output option, {out: "resultCollection1"}. Here out is the key whose value is the name of the collection where the result is saved.
Example 2: Performing avg() Aggregation on Rank Grouped by Ages
In this example, we will calculate the average of the ranks grouped by age.
Query:
var map=function(){ emit(this.age,this.rank)};
var reduce=function(age,rank){ return Array.avg(rank);};
db.employee.mapReduce(map,reduce,{out :"resultCollection3"});
db.resultCollection3.find()
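Output: resultCollection3 contains one document per age, with _id set to the age and value set to the average rank.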
Explanation:
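- map(): function map(){ emit(this.age, this.rank); };. Here age is the key by which we group, and rank is the value on which the avg() aggregation is performed.
- reduce(): function reduce(age, rank){ return Array.avg(rank); };. Note that MongoDB may call reduce more than once for the same key, so averaging inside reduce is only exact when all values for a key are reduced in a single call.
- output: {out: "resultCollection3"}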
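Because reduce can be invoked repeatedly for one key, with the output of an earlier call fed back in as a value, an average of averages can differ from the true average. A common way around this is to emit a running sum and count, add them up in reduce, and divide once at the end in a finalize function. The sketch below simulates that in plain JavaScript; the ranks 1, 2, and 6 and the names reduceFn and finalizeFn are made up for illustration.

```javascript
// Values emitted by map for one key: one { sum, count } pair per document.
// The ranks 1, 2 and 6 are made-up sample data.
const emitted = [1, 2, 6].map((rank) => ({ sum: rank, count: 1 }));

// Reduce returns the same shape it receives, so its output can be reduced again.
function reduceFn(age, values) {
  const total = { sum: 0, count: 0 };
  for (const v of values) {
    total.sum += v.sum;
    total.count += v.count;
  }
  return total;
}

// Finalize runs once per key, after all reducing is done.
function finalizeFn(age, reduced) {
  return reduced.sum / reduced.count;
}

// Simulate a re-reduce: the first two values are reduced, then combined with the third.
const partial = reduceFn(24, emitted.slice(0, 2));
const full = reduceFn(24, [partial, emitted[2]]);
const average = finalizeFn(24, full);

// Naive approach: averaging inside reduce gives a different answer when re-reduced.
const avg = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const naive = avg([avg([1, 2]), 6]);

console.log(average, naive); // 3 3.75
```

With mapReduce(), the finalize function is passed in the options object next to out, for example { out: "resultCollection3", finalize: finalizeFn }, and the map function would emit { sum: this.rank, count: 1 }.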
When to use Map-Reduce in MongoDB?
- Large Datasets: We need batch-style processing of a large data set, with the results written to a collection for later queries.
- Complex Aggregations: The aggregation logic is complex or not easily expressed with MongoDB's aggregation pipeline.
- Custom Data Transformations: We need the flexibility of custom JavaScript functions to process and aggregate the data.
- Performance Considerations: Choose Map-Reduce for flexibility rather than speed. For operations like summing, averaging, or finding maximum values, the aggregation pipeline is generally faster, so benchmark both on your own data.
When Not to Use MongoDB Map-Reduce
While MongoDB Map-Reduce is powerful, it is not always the best option, and mapReduce() is deprecated as of MongoDB 5.0 in favor of the aggregation pipeline. Avoid using it when:
- Simple Aggregation Operations: MongoDB’s aggregation pipeline is generally more efficient for simple aggregation tasks like sum, count, and average.
- Performance Concerns: Map-Reduce involves more overhead and can be slower than the aggregation pipeline for simpler queries.
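For comparison, the grouped sum and average from the examples above can be expressed with the aggregation pipeline's $group stage. This is a sketch that assumes the same employee collection with age and rank fields; the output field names totalRank and avgRank and the helper rankByAge are hypothetical.

```javascript
// Pipeline that groups employees by age and aggregates rank.
const rankByAgePipeline = [
  {
    $group: {
      _id: "$age",                  // group key, like the first argument to emit()
      totalRank: { $sum: "$rank" }, // sum of ranks per age, as in Example 1
      avgRank: { $avg: "$rank" },   // average rank per age, as in Example 2
    },
  },
  { $sort: { _id: 1 } },            // optional: order the groups by age
];

// Hypothetical helper: `db` is a shell-style database handle.
function rankByAge(db) {
  return db.employee.aggregate(rankByAgePipeline);
}
```

In the MongoDB shell this would be called as rankByAge(db), returning a cursor over one document per age.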
Conclusion
MongoDB Map-Reduce is a powerful tool for aggregating and processing large datasets. By utilizing custom JavaScript functions, we can perform complex transformations and calculations on our data. While the aggregation pipeline is efficient for most operations, Map-Reduce offers greater flexibility for custom data processing tasks. Use Map-Reduce when dealing with large datasets or when we need complex data transformations that MongoDB’s aggregation pipeline cannot efficiently handle.
FAQs
What is MongoDB MapReduce?
MongoDB MapReduce is a data processing technique for large data sets that involves two phases: the map phase, where data is transformed into key-value pairs, and the reduce phase, where these pairs are aggregated. It is used for complex data aggregation operations that cannot be easily handled by MongoDB’s aggregation framework.
What is MapReduce functionality?
MapReduce functionality involves two main functions: the map function, which processes input data and emits key-value pairs, and the reduce function, which takes these key-value pairs and merges the values associated with each unique key. This paradigm allows for efficient parallel processing of large data sets.
What is an example of MapReduce?
An example of MapReduce is counting the number of occurrences of each word in a collection of documents. The map function emits each word as a key with a value of 1, and the reduce function sums up the values for each key, resulting in the total count of each word across all documents.
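The word-count example can be sketched in plain JavaScript, with the map, grouping, and reduce phases written out separately. The sample documents and the text field name are made up for illustration.

```javascript
// Hypothetical sample documents; the `text` field name is an assumption.
const docs = [
  { text: "map reduce in mongodb" },
  { text: "reduce the mapped data" },
];

// Map phase: emit (word, 1) for every word in every document.
const pairs = [];
const emit = (key, value) => pairs.push([key, value]);
for (const doc of docs) {
  for (const word of doc.text.toLowerCase().split(/\s+/)) {
    if (word) emit(word, 1);
  }
}

// Grouping phase: collect the emitted values by key.
const grouped = new Map();
for (const [word, one] of pairs) {
  if (!grouped.has(word)) grouped.set(word, []);
  grouped.get(word).push(one);
}

// Reduce phase: sum the values for each key.
const reduceFn = (word, counts) => counts.reduce((a, b) => a + b, 0);
const wordCounts = {};
for (const [word, counts] of grouped) {
  wordCounts[word] = reduceFn(word, counts);
}

console.log(wordCounts);
// { map: 1, reduce: 2, in: 1, mongodb: 1, the: 1, mapped: 1, data: 1 }
```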