
Learn MongoDB Aggregation with a Real-World Example

In this article, we will see what aggregation is in MongoDB and how to build MongoDB aggregation pipelines. I assume that you have some experience with MongoDB.

When you first start using MongoDB, you often write queries just to do CRUD (Create, Read, Update, and Delete) operations.

But when an application gets more complex, you may need to perform several operations on the data before sending it as a response.

For example, consider that you are building an analytics dashboard where you need to show the tasks for each user. Here, the server should send all the tasks for each user, grouped into an array.

Can you guess how to achieve this with our data? This is where MongoDB aggregation comes in.
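
To give you a taste of what's coming, here is a minimal sketch of such a pipeline; the tasks collection and its userId field are hypothetical stand-ins for the dashboard's data:

// Hypothetical "tasks" collection with documents like { userId, title, status, ... }
db.tasks.aggregate([
  {
    $group: {
      _id: "$userId",              // one output document per user
      tasks: { $push: "$$ROOT" },  // collect that user's tasks into an array
    },
  },
])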

Moreover, this is a simple example; you may face requirements a lot more complex than this. Let's see how MongoDB solves this kind of problem using the aggregation pipeline.

MongoDB Aggregation Pipeline

First of all, a MongoDB aggregation pipeline processes data in a series of stages. Each stage produces output that becomes the input for the next stage in the pipeline.

(Figure: documents flowing through the stages of a MongoDB aggregation pipeline)

Here, data passes through each stage, which filters, groups, and sorts it before returning the result.

Let's look at the aggregation operators that are widely used in real-world applications.

Here's the general shape of the MongoDB aggregation pipelines we are going to build:

pipeline = [
  { $match: { ... } },
  { $group: { ... } },
  { $sort: { ... } },
  ...
]
db.collectionName.aggregate(pipeline, options)

Before going further: to practice the aggregations along with this article, you can import the sample dataset from this site.

Once you download the data, you can import the dataset into MongoDB using the mongoimport command:


mongoimport --db aggrsample --collection test --file sample.json --jsonArray

$match

The $match operator is similar to the find() method in MongoDB, except that $match works inside an aggregation pipeline. The $match stage passes along all the documents that satisfy the given condition.

On the above dataset, let's try to match all the documents that have MA as the state.

db.test
  .aggregate([
    {
      $match: {
        state: "MA",
      },
    },
  ])
  .pretty()

As a result, it will return all the documents with MA as the state.
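
For comparison, the same filter expressed with find() outside the aggregation framework:

db.test.find({ state: "MA" }).pretty()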


$group

As the name suggests, $group groups the documents based on a particular field. It can be _id or any other field.

On top of the $match stage, let us group the documents by city.

db.test.aggregate([
  {
    $match: {
      state: "MA",
    },
  },
  {
    $group: {
      _id: "$city",
    },
  },
])

Running this, you will see one document per city, containing only the _id field.


But wait, we want to retrieve all the fields in the document. Why does it only return the grouping field?

Well, there is a reason for it. First, we group the documents on the _id field (the _id field can contain any grouping expression).

Then, we need to provide an accumulator expression to pull data into the grouping result. Popular accumulator expressions are listed below, with a combined example after the list:

  • $first - returns the first document of each group.
  • $push - pushes all the documents of each group into an array.
  • $max - returns the highest value within each group.
  • $sum - returns the sum of a numerical value across each group.
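
For instance, here is a sketch that combines a few of these accumulators on the same dataset (city and pop are fields from the sample data):

db.test.aggregate([
  { $match: { state: "MA" } },
  {
    $group: {
      _id: "$city",
      totalPop: { $sum: "$pop" },  // sum of the pop field per city
      maxPop: { $max: "$pop" },    // highest single pop value per city
      docCount: { $sum: 1 },       // number of documents per city
    },
  },
])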

Here, we will see how to use the $push expression along with $group.

db.test
  .aggregate([
    {
      $match: {
        state: "MA",
      },
    },
    {
      $group: {
        _id: "$city",
        data: {
          $push: "$$ROOT",
        },
      },
    },
  ])
  .pretty()

So, it will return one document per city, with a data array holding every matching document ($$ROOT refers to the entire current document).


$project

Sometimes, you may not need all the fields in a document. You can retrieve only specific fields using the $project operator.

db.test
  .aggregate([
    {
      $match: {
        state: "MA",
      },
    },
    {
      $group: {
        _id: "$city",
        data: {
          $push: "$$ROOT",
        },
      },
    },
    {
      $project: {
        _id: 0,
        "data.loc": 1,
      },
    },
  ])
  .pretty()


Here, you specify 1 to include a field and 0 to exclude it. By default, _id is always included; you can specify _id as 0 to exclude it.
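
For example, to keep only the city and pop fields and drop _id:

db.test.aggregate([
  { $match: { state: "MA" } },
  { $project: { _id: 0, city: 1, pop: 1 } },  // include city and pop, exclude _id
])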

$sort

The $sort operator sorts the documents in either ascending or descending order.

db.test.aggregate([
  {
    $match: {
      state: "MA",
    },
  },
  {
    $sort: {
      pop: 1,
    },
  },
])

This sorts the documents by the pop field in ascending order (1 for ascending, -1 for descending).
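
You can also sort on multiple fields at once; for example, by city ascending and then by pop descending within each city:

db.test.aggregate([
  { $match: { state: "MA" } },
  { $sort: { city: 1, pop: -1 } },  // city A-Z, then highest pop first
])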


$limit

The $limit operator limits the number of documents that the pipeline returns.

db.test.aggregate([
  {
    $match: {
      state: "MA",
    },
  },
  {
    $sort: {
      pop: 1,
    },
  },
  {
    $limit: 5,
  },
])

So, the above query returns only five documents; combined with the $sort stage, these are the five least-populated entries.
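
Combined with $skip (another built-in stage), this is a common pagination pattern. A sketch for fetching the second page of five documents:

db.test.aggregate([
  { $match: { state: "MA" } },
  { $sort: { pop: 1 } },
  { $skip: 5 },   // skip the first page of five
  { $limit: 5 },  // return the next five documents
])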


$addFields

Meanwhile, sometimes you need to create a custom field to hold derived or constant data. You can achieve this using the $addFields operator.

db.test.aggregate([
  {
    $match: {
      state: "MA",
    },
  },
  {
    $addFields: {
      stateAlias: "MAS",
    },
  },
])

As a result, it will return the matched documents, each with an extra stateAlias field set to "MAS".
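
$addFields also accepts computed expressions, not just constants. For example, a derived field using the $divide expression:

db.test.aggregate([
  { $match: { state: "MA" } },
  {
    $addFields: {
      popInThousands: { $divide: ["$pop", 1000] },  // pop expressed in thousands
    },
  },
])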


$lookup

After that, $lookup is one of the most popular aggregation operators in MongoDB. If you are from a SQL background, you can relate it to a JOIN query in an RDBMS (specifically, a left outer join).

db.universities
  .aggregate([
    { $match: { name: "USAL" } },
    { $project: { _id: 0, name: 1 } },
    {
      $lookup: {
        from: "courses",
        localField: "name",
        foreignField: "university",
        as: "courses",
      },
    },
  ])
  .pretty()
  • from - the collection to join with.
  • localField - the field from the input documents. Here, it is the name field from the universities collection.
  • foreignField - the field from the joined collection. Here, it is the university field from the courses collection.
  • as - the name of the output array field that will hold the matched documents (see the $unwind sketch below).
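
The join result embeds the matched courses as an array on each university document. If you would rather have one output document per course, you can flatten that array with the $unwind stage; a sketch reusing the same collections:

db.universities.aggregate([
  { $match: { name: "USAL" } },
  { $project: { _id: 0, name: 1 } },
  {
    $lookup: {
      from: "courses",
      localField: "name",
      foreignField: "university",
      as: "courses",
    },
  },
  { $unwind: "$courses" },  // one output document per joined course
])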

Summary

To sum up, these are the most common and popular aggregation operators in MongoDB. We will cover more in-depth MongoDB aggregation concepts in upcoming articles.
