NoSQL is a modern database management system that uses non-relational models to store and retrieve data. It is particularly useful for handling massive amounts of unstructured or semi-structured data that cannot be neatly organized into traditional rows and tables.
Joins in NoSQL databases work differently from those in SQL databases. This is because NoSQL databases use non-relational data models, which do not have predefined relationships between tables.
In NoSQL databases, the data is often denormalized, meaning that it is duplicated across multiple collections or documents to improve query performance. Instead of using joins, NoSQL databases often use techniques such as embedding and referencing to combine data from multiple documents or collections.
Here are two common techniques used in NoSQL databases to combine data from multiple collections:
In this technique, one collection includes documents from another collection as embedded documents. For example, a blog post collection may include comments as embedded documents. When you query for a blog post, you can also retrieve the comments associated with that post without performing a join operation.
In this technique, one collection includes references to documents in another collection. For example, a blog post collection may include references to user profiles in a separate user collection. When you query for a blog post, you can also retrieve the user profile associated with the author of that post by using the reference in the blog post document.
Lookup is used in combination with other NoSQL database techniques such as embedding and referencing to perform joins. For example, we can use embedding to include related data as sub-documents within a single document, and then use lookup
to retrieve additional data from another collection or document based on the values stored in the embedded sub-documents.
Let’s start with an example in MongoDB shell version v3.6.8
.
Suppose we have two collections with the following records in MongoDB:
|
|
|
1 | John Doe | johndoe@gmail.com |
2 | Jane Smith | janesmith@gmail.com |
3 | Bob Johnson | bobjohnson@gmail.com |
|
|
|
|
1 | T-shirts | 10.99 | 1 |
2 | Sneakers | 29.99 | 1 |
3 | Jeans | 19.99 | 2 |
4 | Hoodie | 15.99 | 3 |
5 | Hat | 9.99 | 2 |
6 | Jacket | 39.99 | 1 |
7 | Sweatpants | 24.99 | 3 |
Now, we have to retrieve the customer names along with the products they bought. We use the lookup
operator to perform this task in the following widget:
use mydatabase db.customers.find() db.products.find() db.customers.aggregate([ { $lookup: { from: "products", localField: "_id", foreignField: "customer_id", as: "products" } }, { $project: { _id: 0, name: 1, products: "$products.name" } } ])
In the above example , we use the lookup
operator to perform a join-like operation between the customers
and products
collections and then project a subset of the results.
Here is a breakdown of the different parts of the query:
use mydatabase
: This selects the mydatabase
database in MongoDB.
db.customers.aggregate([])
: This starts a MongoDB aggregation pipeline on the customers
collection.
$lookup
: This is an aggregation pipeline stage that performs the join-like operation. It takes the following parameters:
from: "products"
: This specifies the name of the collection to join with.localField: "_id"
: This specifies the name of the field in the customers
collection to use for the join.foreignField: "customer_id"
: This specifies the name of the field in the products
collection to use for the join.
as: "products"
: This specifies the name of the field to store the joined data.
$project
: This is an aggregation pipeline stage that selects a subset of the fields to return in the final output. It takes the following parameters:
_id: 0
: This excludes the _id
field from the output.name: 1
: This includes the name
field from the customers
collection in the output.products: "$products.name"
: This includes the name
field from the products
collection in the output and renames it to products
.The final output of this query is a list of documents that contain the name field from the customers
collection and a list of products
that are associated with each customer, based on the join operation performed by the $lookup
stage.
Free Resources