How to use Aggregation Pipeline in MongoDB

Aggregation Pipeline is the recommended way to run complex queries in MongoDB. If you are using MongoDB's MapReduce, it is best to switch to Aggregation Pipeline for more efficient computation.


What is Aggregation Pipeline in MongoDB?

Aggregation Pipeline is a multi-stage process for running advanced queries in MongoDB. It processes data through a sequence of stages, and that sequence is called a pipeline. The output of one stage becomes the input of the next.

For example, you can pass the results of a match operation to another stage that sorts them, and keep adding stages until you get the desired result.

Each stage of an Aggregation Pipeline applies a MongoDB operator to its input documents and outputs transformed documents. Depending on your query, a stage may appear multiple times in the pipeline. For example, you may need to use the $count or $sort stages more than once in the aggregation process.
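The chaining described above can be sketched in plain JavaScript without a database. This only illustrates the data flow, not how MongoDB executes a pipeline internally; the sample documents and the helper names (runPipeline, matchStage, sortStage) are made up for this example.

```javascript
// Hypothetical sample documents (not taken from the article's data set).
const docs = [
  { Section: "Toys", Sold: 6, Amount: 150 },
  { Section: "Books", Sold: 2, Amount: 90 },
  { Section: "Electronics", Sold: 8, Amount: 1200 },
];

// Model each stage as a function: array of documents in, new array out.
const matchStage = (input) => input.filter((doc) => doc.Sold >= 5);
const sortStage = (input) => [...input].sort((a, b) => b.Amount - a.Amount);

// Feed the output of each stage into the next one, in order.
const runPipeline = (input, stages) =>
  stages.reduce((current, stage) => stage(current), input);

const chained = runPipeline(docs, [matchStage, sortStage]);
console.log(chained.map((doc) => doc.Section)); // [ 'Electronics', 'Toys' ]
```

In mongosh, assuming a collection named sales with these fields, the corresponding pipeline would be db.sales.aggregate([{ $match: { Sold: { $gte: 5 } } }, { $sort: { Amount: -1 } }]).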


Stages of Aggregation Pipeline

Aggregation Pipeline passes data through multiple stages in a single query. You can find details about all the aggregation stages in the MongoDB documentation.

Below are some of the most common stages.

$match stage

This stage lets you define specific filter conditions before the other aggregation stages run. You can use it to select only the data that you want to include in the aggregation process.
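As a minimal sketch, here is a $match stage that keeps only documents with at least 5 units sold, together with a plain-JavaScript filter that mimics this one condition. The field name Sold and the threshold 5 match the full pipeline later in the article; the sample documents are hypothetical.

```javascript
// Stage object as it could be passed to db.sales.aggregate([...]) in mongosh.
const matchStage = { $match: { Sold: { $gte: 5 } } };

// Hypothetical sample documents used to illustrate the filter.
const sales = [
  { Section: "Electronics", Sold: 8, Amount: 1200 },
  { Section: "Electronics", Sold: 3, Amount: 300 },
  { Section: "Toys", Sold: 6, Amount: 150 },
];

// Plain-JavaScript equivalent of this particular condition (Sold >= 5).
const minSold = matchStage.$match.Sold.$gte;
const matched = sales.filter((doc) => doc.Sold >= minSold);
console.log(matched.length); // 2
```

Placing $match early in a pipeline means later stages have fewer documents to process.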

$group stage

The $group stage separates documents into groups based on a key you specify, and computes values for each group using key-value pairs. Each group becomes one output document, identified by its _id.

For example, consider the following sales sample data:

[Picture 3: sample sales data]

Using an aggregation pipeline, you can calculate total sales and peak sales for each product section:

{ $group: { _id: "$Section", total_sales_count: { $sum: "$Sold" }, top_sales: { $max: "$Amount" } } }

The _id: "$Section" pair groups the output documents by section. For the total_sales_count and top_sales fields, MongoDB generates new keys whose values come from the operation defined by the accumulator; this can be $sum, $min, $max or $avg.
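To make the accumulators concrete, the following plain-JavaScript sketch mimics what this $group stage computes. It is an illustration only: groupBySection is a hypothetical helper, and the sample documents are invented (only the field names Section, Sold and Amount come from the article).

```javascript
// Hypothetical sample documents; only the field names come from the article.
const sales = [
  { Section: "Electronics", Sold: 8, Amount: 1200 },
  { Section: "Toys", Sold: 6, Amount: 150 },
  { Section: "Electronics", Sold: 3, Amount: 300 },
  { Section: "Toys", Sold: 7, Amount: 200 },
];

// Emulates { $group: { _id: "$Section", total_sales_count: { $sum: "$Sold" },
//                      top_sales: { $max: "$Amount" } } }
function groupBySection(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc.Section) ?? {
      _id: doc.Section,
      total_sales_count: 0,
      top_sales: -Infinity,
    };
    group.total_sales_count += doc.Sold; // $sum
    group.top_sales = Math.max(group.top_sales, doc.Amount); // $max
    groups.set(doc.Section, group);
  }
  return [...groups.values()];
}

console.log(groupBySection(sales));
```

Note that MongoDB does not guarantee the order of documents coming out of $group, so add a $sort stage if the order matters.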

$skip stage

You can use the $skip stage to skip a specified number of documents in the output. It usually comes after the $group stage. For example, if the pipeline would return two documents and you skip one, the aggregation outputs only the second document.

To add a skip stage, insert the $skip operator into the aggregation pipeline:

..., { $skip: 1 },
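The effect of $skip can be pictured as dropping the first n elements of an array. The sketch below assumes two hypothetical grouped documents; the skip helper is not a MongoDB function, just an in-memory stand-in.

```javascript
// Hypothetical grouped output, e.g. as produced by a preceding $group stage.
const grouped = [
  { _id: "Electronics", total_sales_count: 11, top_sales: 1200 },
  { _id: "Toys", total_sales_count: 13, top_sales: 200 },
];

// Emulates { $skip: n }: drop the first n documents and keep the rest.
const skip = (docs, n) => docs.slice(n);

const remaining = skip(grouped, 1);
console.log(remaining); // only the "Toys" document remains
```

In a real pipeline, $skip is usually placed after a $sort stage so that "the first n documents" is well defined.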

$sort stage

The $sort stage lets you sort the data in ascending (1) or descending (-1) order. For example, sort the results of the previous query in descending order to find out which section has the highest sales.

Add the $sort operator to the previous query:

..., { $sort: { top_sales: -1 } },

$limit stage

The $limit stage caps the number of documents the Aggregation Pipeline outputs. For example, use $limit to return only the section with the highest sales from the previous stage:

..., { $sort: { top_sales: -1 } }, { $limit: 1 }

The result contains only the first document, which has the highest sales because it sits at the top of the sorted results.
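The combination of $sort and $limit can be sketched in plain JavaScript as "sort a copy, then keep the first n". The grouped documents and both helper names below are hypothetical and only serve to show why the top document is the one with the highest top_sales.

```javascript
// Hypothetical grouped documents in arbitrary order.
const grouped = [
  { _id: "Toys", total_sales_count: 13, top_sales: 200 },
  { _id: "Electronics", total_sales_count: 11, top_sales: 1200 },
  { _id: "Books", total_sales_count: 5, top_sales: 90 },
];

// Emulates { $sort: { top_sales: -1 } }: descending order on one numeric field.
const sortByTopSalesDesc = (docs) =>
  [...docs].sort((a, b) => b.top_sales - a.top_sales);

// Emulates { $limit: n }: keep only the first n documents.
const limit = (docs, n) => docs.slice(0, n);

const best = limit(sortByTopSalesDesc(grouped), 1);
console.log(best[0]._id); // Electronics
```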

$project stage

The $project stage lets you shape the output documents as desired. Using the $project operator, you can specify which fields to include in the result and customize their key names.

For example, a sample output without the $project stage looks like this:

[Picture 4: sample output without the $project stage]

Let's see how it looks with the $project stage. To add $project to the pipeline:

..., { "$project": { "_id": 0, "Section": "$_id", "TotalSold": "$total_sales_count", "TopSale": "$top_sales" } }

Since we grouped the data by section, the projection above exposes each section's name under the Section key. It also renames the aggregated sales count and top sales fields in the output to TotalSold and TopSale.

The end result is much more compact than the previous version:

[Picture 5: output after the $project stage]
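The renaming done by $project can be mimicked with a simple map over the grouped documents. This is a sketch with hypothetical input values; projectSales is an illustrative helper, not part of MongoDB.

```javascript
// Hypothetical output of the preceding $group stage.
const grouped = [
  { _id: "Electronics", total_sales_count: 11, top_sales: 1200 },
  { _id: "Toys", total_sales_count: 13, top_sales: 200 },
];

// Emulates { $project: { _id: 0, Section: "$_id",
//                        TotalSold: "$total_sales_count", TopSale: "$top_sales" } }
const projectSales = (docs) =>
  docs.map((doc) => ({
    Section: doc._id, // "$_id" copied under a new key
    TotalSold: doc.total_sales_count,
    TopSale: doc.top_sales,
    // _id is omitted, which corresponds to "_id": 0
  }));

const projected = projectSales(grouped);
console.log(projected);
```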

How to create Aggregation Pipeline in MongoDB

Although the aggregation framework includes many more operations, the stages highlighted above give you an idea of how to apply them in a pipeline, including the basic syntax for each one.

Using the previous sales data sample, let's combine some of the stages discussed above to better understand the Aggregation Pipeline in MongoDB:

db.sales.aggregate([
  { "$match": { "Sold": { "$gte": 5 } } },
  { "$group": {
      "_id": "$Section",
      "total_sales_count": { "$sum": "$Sold" },
      "top_sales": { "$max": "$Amount" }
  } },
  { "$sort": { "top_sales": -1 } },
  { "$skip": 0 },
  { "$project": {
      "_id": 0,
      "Section": "$_id",
      "TotalSold": "$total_sales_count",
      "TopSale": "$top_sales"
  } }
])

Result:

[Picture 6: result of the full pipeline]
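Because a pipeline is just an array of plain objects, it can also be produced by a function, which makes values such as the $match threshold easy to change. The sketch below assumes a hypothetical helper named buildSalesPipeline; it leaves out the { "$skip": 0 } stage, which skips nothing.

```javascript
// Hypothetical helper: returns the article's pipeline with a configurable threshold.
function buildSalesPipeline(minSold) {
  return [
    { $match: { Sold: { $gte: minSold } } },
    {
      $group: {
        _id: "$Section",
        total_sales_count: { $sum: "$Sold" },
        top_sales: { $max: "$Amount" },
      },
    },
    { $sort: { top_sales: -1 } },
    {
      $project: {
        _id: 0,
        Section: "$_id",
        TotalSold: "$total_sales_count",
        TopSale: "$top_sales",
      },
    },
  ];
}

const pipeline = buildSalesPipeline(5);
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
// [ '$match', '$group', '$sort', '$project' ]
```

In mongosh, assuming the sales collection used above, you would run it with db.sales.aggregate(buildSalesPipeline(5)).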

That is how to use Aggregation Pipeline in MongoDB. Hope you find the article useful.

David Pac

Update 04 September 2023
