How to use Aggregation Pipeline in MongoDB
The Aggregation Pipeline is the recommended way to run complex queries in MongoDB. If you are still using MongoDB's map-reduce, it is best to switch to the Aggregation Pipeline for more efficient computation; map-reduce has been deprecated since MongoDB 5.0.
What is Aggregation Pipeline in MongoDB?
The Aggregation Pipeline is a multi-stage process for running advanced queries in MongoDB. It passes data through a sequence of stages that together form the pipeline, and the output of one stage becomes the input of the next.
For example, you can pass the results of a match operation to another stage that sorts them, and continue until you get the desired result.
Each stage of an Aggregation Pipeline consists of a MongoDB stage operator and outputs zero or more transformed documents. Depending on your query, a stage may appear multiple times in the pipeline. For example, you may need to use the $count or $sort stage more than once in the aggregation process.
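As a rough illustration of how stages chain together, the sketch below writes a pipeline as an ordered JavaScript array. The field names Sold and Amount and the collection name sales are assumptions made for this illustration only.

```javascript
// Hypothetical pipeline: each array element is one stage, applied in order.
const pipeline = [
  { $match: { Sold: { $gte: 5 } } }, // stage 1: keep documents with Sold >= 5
  { $sort: { Amount: -1 } },         // stage 2: sort what is left by Amount, descending
];

// Each stage document has exactly one key: the stage operator.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames); // [ '$match', '$sort' ]

// In mongosh, assuming a collection named "sales" exists:
// db.sales.aggregate(pipeline)
```

Keeping the pipeline in a variable makes it easy to add, remove or reorder stages while you experiment.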
Stages of Aggregation Pipeline
The Aggregation Pipeline passes data through multiple stages in a single query. You can find details about all of the available stages in the MongoDB documentation.
Below are some of the most common stages.
$match stage
This stage lets you define filter conditions before the other aggregation stages run. You can use it to select only the documents that you want to include in the aggregation process.
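To make the filtering concrete, here is a minimal sketch that writes a $match stage and then mimics its effect in plain JavaScript. The documents are made up for illustration, and the plain-JavaScript filter only covers this one $gte condition; it is not how MongoDB evaluates queries internally.

```javascript
// Hypothetical documents; the values are invented for this example.
const sales = [
  { Section: "Shoes", Sold: 10, Amount: 500 },
  { Section: "Shoes", Sold: 4, Amount: 200 },
  { Section: "Bags", Sold: 7, Amount: 900 },
];

// The $match stage as it would appear in a pipeline.
const matchStage = { $match: { Sold: { $gte: 5 } } };

// Plain-JavaScript equivalent of this particular condition (Sold >= 5).
const threshold = matchStage.$match.Sold.$gte;
const matched = sales.filter((doc) => doc.Sold >= threshold);

console.log(matched.length); // 2

// In mongosh, assuming a "sales" collection: db.sales.aggregate([matchStage])
```

Placing $match early keeps later stages from processing documents you do not need.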
$group stage
The group stage separates documents into groups based on a key that you specify in the _id field. Each distinct value of that key produces one output document.
For example, consider the following sales sample data:
Using an aggregation pipeline, you can calculate the total number sold and the top sale amount for each section:
{ $group: { _id: "$Section", total_sales_count: { $sum: "$Sold" }, top_sales: { $max: "$Amount" } } }
The _id: "$Section" pair groups the output documents by section. For the total_sales_count and top_sales fields, MongoDB generates new keys whose values are computed by the accumulator operator you choose; this can be $sum, $min, $max or $avg.
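The following sketch mimics what this $group stage computes, using plain JavaScript on made-up documents. The helper name groupBySection is hypothetical, and the code only emulates the $sum and $max accumulators for illustration.

```javascript
// Hypothetical documents; the values are invented for this example.
const sales = [
  { Section: "Shoes", Sold: 10, Amount: 500 },
  { Section: "Shoes", Sold: 4, Amount: 200 },
  { Section: "Bags", Sold: 7, Amount: 900 },
];

// Emulates:
// { $group: { _id: "$Section",
//             total_sales_count: { $sum: "$Sold" },
//             top_sales: { $max: "$Amount" } } }
function groupBySection(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.Section)) {
      // First document for this section: start a new group.
      groups.set(doc.Section, {
        _id: doc.Section,
        total_sales_count: 0,
        top_sales: -Infinity,
      });
    }
    const group = groups.get(doc.Section);
    group.total_sales_count += doc.Sold;                      // $sum
    group.top_sales = Math.max(group.top_sales, doc.Amount);  // $max
  }
  // Note: MongoDB does not guarantee the order of $group output.
  return [...groups.values()];
}

const grouped = groupBySection(sales);
console.log(grouped);
```

With this sample input, the three documents collapse into two groups, one per section.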
$skip stage
You can use the $skip stage to skip a specified number of documents and pass the rest to the next stage. It usually comes after the group stage. For example, if the pipeline would output two documents and you skip one, the aggregation outputs only the second document.
To add a skip stage, insert the $skip operator into the aggregation pipeline:
..., { $skip: 1 },
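In plain JavaScript terms, skipping behaves much like slicing an array from a given offset. The sketch below assumes two made-up documents coming out of an earlier stage.

```javascript
// Hypothetical output of an earlier stage.
const docs = [{ _id: "Shoes" }, { _id: "Bags" }];

// The $skip stage as it would appear in a pipeline.
const skipStage = { $skip: 1 };

// Emulation: drop the first N documents and keep the rest.
const remaining = docs.slice(skipStage.$skip);

console.log(remaining); // [ { _id: 'Bags' } ]
```

Because which documents get skipped depends on their order, $skip is most predictable when a $sort stage comes before it.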
$sort stage
The sort stage lets you sort the data in ascending or descending order. For example, sort the output of the previous query in descending order to see which section has the highest sales.
Add the $sort operator to the previous query:
..., { $sort: { top_sales: -1 } },
$limit stage
The $limit stage caps the number of documents that the Aggregation Pipeline passes on. For example, use $limit to keep only the section with the highest sales from the sorted output of the previous stage:
..., { $sort: { top_sales: -1 } }, { $limit: 1 }
The result contains only the first document, which has the highest sales because it appears at the top of the sorted results.
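The combination of sorting and limiting can be sketched in plain JavaScript as follows. The grouped documents are made up for illustration, and the array methods only approximate what the $sort and $limit stages do.

```javascript
// Hypothetical output of an earlier $group stage.
const grouped = [
  { _id: "Shoes", total_sales_count: 14, top_sales: 500 },
  { _id: "Bags", total_sales_count: 7, top_sales: 900 },
  { _id: "Hats", total_sales_count: 3, top_sales: 120 },
];

// Emulates { $sort: { top_sales: -1 } }; copy first so the input is not mutated.
const sorted = [...grouped].sort((a, b) => b.top_sales - a.top_sales);

// Emulates { $limit: 1 }: keep only the first document.
const top = sorted.slice(0, 1);

console.log(top[0]._id); // Bags
```

The order of the two stages matters: limiting before sorting would keep an arbitrary document rather than the top one.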
$project stage
The $project stage lets you shape the resulting documents as desired. Using the $project operator, you can specify which fields to include in the result and customize their key names.
For example, a sample output without the $project phase looks like this:
Let's see how it looks when combined with the $project stage. To add $project to the pipeline:
..., { "$project": { "_id": 0, "Section": "$_id", "TotalSold": "$total_sales_count", "TopSale": "$top_sales" } }
Since we grouped the data by section, the projection above puts each section name under the Section key of its output document. It also ensures that the aggregated sales count and the top sale appear in the output as TotalSold and TopSale.
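As a small sketch of this reshaping, the code below applies the same renaming in plain JavaScript. The input documents are made up, and the map call only imitates this particular projection.

```javascript
// Hypothetical output of an earlier $group stage (values are invented).
const grouped = [
  { _id: "Shoes", total_sales_count: 14, top_sales: 500 },
  { _id: "Bags", total_sales_count: 7, top_sales: 900 },
];

// Emulates:
// { $project: { _id: 0, Section: "$_id",
//               TotalSold: "$total_sales_count", TopSale: "$top_sales" } }
const projected = grouped.map((doc) => ({
  Section: doc._id,                 // "_id": 0 drops _id; its value moves to Section
  TotalSold: doc.total_sales_count, // renamed from total_sales_count
  TopSale: doc.top_sales,           // renamed from top_sales
}));

console.log(projected[0]); // { Section: 'Shoes', TotalSold: 14, TopSale: 500 }
```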
The end result is much more compact than the previous version:
How to create Aggregation Pipeline in MongoDB
Although the Aggregation Pipeline offers many more stages, the ones highlighted above give you an idea of how to apply them, including the basic syntax for each.
Using the previous sales data sample, let's combine some of the stages discussed above to better understand the Aggregation Pipeline in MongoDB:
db.sales.aggregate([
  { "$match": { "Sold": { "$gte": 5 } } },
  {
    "$group": {
      "_id": "$Section",
      "total_sales_count": { "$sum": "$Sold" },
      "top_sales": { "$max": "$Amount" }
    }
  },
  { "$sort": { "top_sales": -1 } },
  { "$skip": 0 },
  {
    "$project": {
      "_id": 0,
      "Section": "$_id",
      "TotalSold": "$total_sales_count",
      "TopSale": "$top_sales"
    }
  }
])
Result:
That is how to use the Aggregation Pipeline in MongoDB. Hope the article is useful to you.