Performing JOIN operations in NoSQL

NoSQL is a modern database management system that uses non-relational models to store and retrieve data. It is particularly useful for handling massive amounts of unstructured or semi-structured data that cannot be neatly organized into traditional rows and tables.

Joins in NoSQL

Joins in NoSQL databases work differently from those in SQL databases. This is because NoSQL databases use non-relational data models, which do not have predefined relationships between tables.

In NoSQL databases, the data is often denormalized, meaning that it is duplicated across multiple collections or documents to improve query performance. Instead of using joins, NoSQL databases often use techniques such as embedding and referencing to combine data from multiple documents or collections.

Here are two common techniques used in NoSQL databases to combine data from multiple collections:

Embedded documents

In this technique, one collection includes documents from another collection as embedded documents. For example, a blog post collection may include comments as embedded documents. When you query for a blog post, you can also retrieve the comments associated with that post without performing a join operation.

Referencing documents

In this technique, one collection includes references to documents in another collection. For example, a blog post collection may include references to user profiles in a separate user collection. When you query for a blog post, you can also retrieve the user profile associated with the author of that post by using the reference in the blog post document.

The lookup technique

Lookup is used in combination with other NoSQL database techniques such as embedding and referencing to perform joins. For example, we can use embedding to include related data as sub-documents within a single document, and then use lookup to retrieve additional data from another collection or document based on the values stored in the embedded sub-documents.

Example

Let’s start with an example in MongoDB shell version v3.6.8.

Suppose we have two collections with the following records in MongoDB:

Customers

_id

name

email

1

John Doe

johndoe@gmail.com

2

Jane Smith

janesmith@gmail.com

3

Bob Johnson

bobjohnson@gmail.com

Products

_id

name

price

customer_id

1

T-shirts

10.99

1

2

Sneakers

29.99

1

3

Jeans

19.99

2

4

Hoodie

15.99

3

5

Hat

9.99

2

6

Jacket

39.99

1

7

Sweatpants

24.99

3

Now, we have to retrieve the customer names along with the products they bought. We use the lookup operator to perform this task in the following widget:

use mydatabase
db.customers.find()
db.products.find()
db.customers.aggregate([
  {
    $lookup: {
      from: "products",
      localField: "_id",
      foreignField: "customer_id",
      as: "products"
    }
  },
  {
    $project: {
      _id: 0,
      name: 1,
      products: "$products.name"
    }
  }
])
The use of lookup operator in MongoDB

In the above example , we use the lookup operator to perform a join-like operation between the customers and products collections and then project a subset of the results.

Here is a breakdown of the different parts of the query:

  • use mydatabase: This selects the mydatabase database in MongoDB.

  • db.customers.aggregate([]): This starts a MongoDB aggregation pipeline on the customers collection.

  • $lookup: This is an aggregation pipeline stage that performs the join-like operation. It takes the following parameters:

    • from: "products": This specifies the name of the collection to join with.
    • localField: "_id": This specifies the name of the field in the customers collection to use for the join.
  • foreignField: "customer_id": This specifies the name of the field in the products collection to use for the join.

  • as: "products": This specifies the name of the field to store the joined data.

  • $project: This is an aggregation pipeline stage that selects a subset of the fields to return in the final output. It takes the following parameters:

    • _id: 0: This excludes the _id field from the output.
    • name: 1: This includes the name field from the customers collection in the output.
    • products: "$products.name": This includes the name field from the products collection in the output and renames it to products.

The final output of this query is a list of documents that contain the name field from the customers collection and a list of products that are associated with each customer, based on the join operation performed by the $lookup stage.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved