Examples of data modeling in MongoDB

Data modeling is the process of designing the structure of the data stored in a database system. Its goal is to organize data in a way that is efficient, easy to maintain, and meets the application's requirements.

MongoDB's document-oriented data model offers great flexibility compared to traditional relational databases, making it easier to handle semi-structured and unstructured data. This flexibility does not remove the need for design: modeling data in MongoDB still means deciding how the documents and collections in a database are structured. In the following sections, we will discuss the concept of document-based data modeling, the importance of denormalization, and the advantages of MongoDB's flexible schema.

Concept of document-based data modeling

Data in MongoDB is organized hierarchically: databases contain collections, and collections contain documents. A document represents an object and its attributes, so its structure can adapt easily as the data and the application's requirements change. This approach also simplifies data access by reducing the need for the complex joins and foreign keys that relational databases rely on. MongoDB does not require related data to live in a single document, but storing data that is accessed together in one document can improve query performance, because it can be retrieved in a single read.

Denormalization in MongoDB

Denormalization optimizes query performance by duplicating data across multiple documents or collections, which reduces the number of database reads and avoids complex joins. It is usually achieved by embedding related data in a single document, or by copying frequently read fields from one document into the documents that reference it.

Suppose you have two collections, users and posts, with each post document referencing the user who created it.

The user data can be denormalized by embedding it (or just the fields the application needs, such as the user's name) in each post document. This removes the need to perform join operations across the two collections when displaying a post.
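The embedding step can be sketched in plain JavaScript with no database. This is a minimal sketch assuming in-memory arrays stand in for the users and posts collections; the data and the helper names denormalizePost and renameUser are hypothetical. It also shows the trade-off denormalization brings: a post can be displayed from a single document, but every duplicated copy must be updated when the source changes.

```javascript
// Hypothetical in-memory stand-ins for the users and posts collections.
const users = [{ _id: "u1", name: "John Doe", email: "johndoe@example.com" }];
const posts = [
  { _id: "p1", title: "My First Blog Post", user_id: "u1" },
  { _id: "p2", title: "A Second Post", user_id: "u1" },
];

// Denormalize: replace the user_id reference with a copy of the author
// fields the post needs, so a post can be displayed with a single read.
function denormalizePost(post) {
  const user = users.find((u) => u._id === post.user_id);
  const { user_id, ...rest } = post; // drop the reference field
  return { ...rest, author: { _id: user._id, name: user.name } };
}

const denormalizedPosts = posts.map(denormalizePost);

// The cost of duplication: renaming a user must update the source
// document and every copy embedded in a post.
function renameUser(userId, newName) {
  users.find((u) => u._id === userId).name = newName;
  for (const post of denormalizedPosts) {
    if (post.author._id === userId) post.author.name = newName;
  }
}

renameUser("u1", "John Q. Doe");
console.log(denormalizedPosts.map((p) => p.author.name));
```

In MongoDB itself, the loop inside renameUser would correspond to a single updateMany on the posts collection, filtering on "author._id" and using $set on "author.name". Copying only the fields a post actually displays keeps that update small.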

The advantage of MongoDB's flexible schema in data modeling

Before inserting data into a relational database, we must declare a table schema, and every row in the table must follow it. MongoDB, by contrast, does not require the documents in a collection to share the same schema. Each document can have its own structure, which makes the data model flexible and able to adapt as requirements change.
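To make the contrast concrete, here is a small sketch in plain JavaScript, assuming an in-memory array stands in for a single hypothetical products collection; the collection, its fields, and the helpers findWhere and describe are illustrative only. The three documents share a collection but not a structure, and the code shows the practical consequence: code that reads the data must handle fields that may be absent.

```javascript
// Hypothetical documents in one "products" collection. Unlike rows in a
// relational table, they do not share a fixed set of fields.
const products = [
  { _id: 1, name: "Laptop", price: 999, specs: { ram: "16GB" } },
  { _id: 2, name: "E-book", price: 12, format: "PDF" },
  { _id: 3, name: "Gift card" },
];

// Simplified equality filter: a document that lacks the field has
// undefined there and does not match. (Real MongoDB queries treat null
// specially; this sketch ignores that.)
function findWhere(collection, field, value) {
  return collection.filter((doc) => doc[field] === value);
}

// Code reading an optional field must tolerate its absence.
function describe(doc) {
  const price = doc.price !== undefined ? `$${doc.price}` : "price not set";
  return `${doc.name}: ${price}`;
}

console.log(findWhere(products, "format", "PDF").map((d) => d.name));
console.log(products.map(describe));
```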

Designing the data model

For a MongoDB data model to perform well, scale, stay maintainable, and avoid unnecessary complexity, careful consideration of the application's requirements, use cases, and performance goals is essential.

The following key steps should be considered when designing a data model in MongoDB:

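  1. Identify entities and relationships: An entity is a distinct object the application tracks, such as a user, a blog post, or an order. A relationship describes how entities are connected. In a relational database, it is expressed by a foreign key in one table that references the primary key of another table; in MongoDB, it is expressed by embedding one document in another or by storing a reference to another document's _id.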

  2. Normalize or denormalize data: Next, we decide whether the data will be normalized or denormalized. Normalization splits data into separate, manageable documents that reference each other, while denormalization, as discussed above, packs related data into a single document.

  3. Define the data model: In this step, we decide which collections to create and define the fields of their documents. It is important to give these fields descriptive names.

  4. Optimize queries: To ensure good performance, we need to consider the types of queries that will run against the data. One way to support frequent queries is to denormalize data that is read together, so that it can be fetched in a single operation.

  5. Optimize write operations: To ensure that the application can handle increasing traffic and data volumes, it is important to optimize write operations such as inserting, updating, and deleting documents.

  6. Test and refine: This is the final step. Here, we test the model against a sample dataset and refine it where necessary.

Embedding and referencing

MongoDB offers two techniques for modeling related data. They involve different trade-offs, so it is advisable to study the use case carefully before choosing one. The two techniques are as follows:

  1. Embedded/Denormalized data model

  2. Referenced/Normalized data model

Embedded data model

This technique keeps related data in a single document: the related data is stored in a field, an embedded object, or an array inside the parent document. It is a good choice when query performance is the priority, because related data can be read in a single operation. However, it can lead to data duplication and larger documents, which makes updates more complicated.

Suppose we have a blog post document that looks like this:

{
  _id: ObjectId("613a2d6d2e6f373a48967e50"),
  title: "My First Blog Post",
  content: "Lorem ipsum dolor sit amet...",
  author: "John Doe",
  comments: [
    {
      username: "Jane Smith",
      comment: "Great post!",
      date: ISODate("2023-04-20T14:30:00.000Z")
    },
    {
      username: "Bob Johnson",
      comment: "Thanks for sharing!",
      date: ISODate("2023-04-21T10:00:00.000Z")
    }
  ]
}

In the example above, the comments array is embedded in the blog post document.

Because the comments are stored with the post, all of them can be retrieved with a single query on the blog post's _id:

db.blogposts.findOne({ _id: ObjectId("613a2d6d2e6f373a48967e50") }, { comments: 1 })
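The same read-and-write pattern can be sketched without a database. This is a minimal sketch assuming a JavaScript Map keyed by _id stands in for the blogposts collection, with ObjectId and ISODate values replaced by strings and Date objects; getComments and addComment are hypothetical helpers. One lookup returns every comment, and adding a comment touches only the parent document.

```javascript
// Hypothetical in-memory stand-in for the blogposts collection, keyed by _id.
const blogposts = new Map([
  ["613a2d6d2e6f373a48967e50", {
    _id: "613a2d6d2e6f373a48967e50",
    title: "My First Blog Post",
    comments: [
      { username: "Jane Smith", comment: "Great post!", date: new Date("2023-04-20T14:30:00Z") },
    ],
  }],
]);

// One lookup returns the post together with every embedded comment.
function getComments(postId) {
  const post = blogposts.get(postId);
  return post ? post.comments : [];
}

// Adding a comment modifies only the parent document.
function addComment(postId, username, text, date = new Date()) {
  const post = blogposts.get(postId);
  if (!post) throw new Error(`No blog post with _id ${postId}`);
  post.comments.push({ username, comment: text, date });
}

addComment("613a2d6d2e6f373a48967e50", "Bob Johnson", "Thanks for sharing!",
  new Date("2023-04-21T10:00:00Z"));
console.log(getComments("613a2d6d2e6f373a48967e50").map((c) => c.username));
```

In the mongo shell, the write inside addComment would typically be expressed as db.blogposts.updateOne({ _id: ... }, { $push: { comments: { ... } } }).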

Referenced data model

This technique stores related data in separate documents and links them by reference. Usually, a document stores the _id of the related document. It is useful for large datasets that are impractical to embed (a single MongoDB document cannot exceed 16 MB), and it results in less data duplication. Its main disadvantage is that retrieving related data may require more queries.

Suppose we have two collections, blogposts and comments. The blogposts collection contains documents representing posts, and the comments collection contains documents representing comments on these posts.

// Blog post document (blogposts collection)
{
  _id: ObjectId("613a2d6d2e6f373a48967e50"),
  title: "My First Blog Post",
  content: "Lorem ipsum dolor sit amet...",
  author: "John Doe",
  comments: [
    ObjectId("613a2f892e6f373a48967e51"),
    ObjectId("613a2f892e6f373a48967e52")
  ]
}

In the code example above, the blog post document contains an array of ObjectId values that reference documents in the comments collection. A comment document looks like this:

// Comment document (comments collection)
{
  _id: ObjectId("613a2f892e6f373a48967e51"),
  post_id: ObjectId("613a2d6d2e6f373a48967e50"),
  username: "Jane Smith",
  comment: "Great post!",
  date: ISODate("2023-04-20T14:30:00.000Z")
}

In this example, the post_id field contains a reference to the _id field of the corresponding blog post in the blogposts collection.

To retrieve all the comments for a specific blog post, you can use the following MongoDB query:

db.comments.find({ post_id: ObjectId("613a2d6d2e6f373a48967e50") })

This query returns all the comments in the comments collection that have a post_id field equal to the _id field of the blog post we are interested in.
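The cost of referencing shows up when a page needs the post and its comments together. The sketch below, in plain JavaScript with in-memory arrays standing in for the blogposts and comments collections (string ids instead of ObjectId; the data and getPostWithComments are hypothetical), resolves the reference in application code with two lookups. MongoDB can also perform this kind of join on the server with the $lookup aggregation stage; the sketch shows only the application-side version.

```javascript
// Hypothetical in-memory stand-ins for the blogposts and comments collections.
const blogposts = [
  { _id: "post1", title: "My First Blog Post", comments: ["c1", "c2"] },
];
const comments = [
  { _id: "c1", post_id: "post1", username: "Jane Smith", comment: "Great post!" },
  { _id: "c2", post_id: "post1", username: "Bob Johnson", comment: "Thanks for sharing!" },
  { _id: "c3", post_id: "post2", username: "Ann Lee", comment: "Nice!" },
];

// With references, showing a post with its comments takes two queries:
// one for the post, and one for the comments whose post_id matches it.
function getPostWithComments(postId) {
  const post = blogposts.find((p) => p._id === postId); // query 1
  if (!post) return null;
  const postComments = comments.filter((c) => c.post_id === postId); // query 2
  // Return a new object; the stored post keeps its array of ids.
  return { ...post, comments: postComments };
}

const result = getPostWithComments("post1");
console.log(result.comments.map((c) => c.username));
```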

Defining relationships in MongoDB

Defining the relationships in your schema is an important design decision, because they determine how data is related, accessed, and manipulated. We will look at the three most common types of relationships in MongoDB.

  1. One-to-one relationship

  2. One-to-many relationship

  3. Many-to-many relationship

One-to-one relationship

In a one-to-one relationship, each instance of one entity is related to exactly one instance of the other entity. It can be represented with either the embedded or the referenced data model.

{
  "_id": ObjectId("61c5d320f5db23428aaf2677"),
  "name": "John Doe",
  "email": "johndoe@example.com",
  "profile": {
    "address": "123 Main St",
    "city": "Anytown",
    "state": "CA",
    "zip": "12345"
  }
}

The code example above shows a profile subdocument embedded in a user document. Because each user has exactly one profile, the user and the profile have a one-to-one relationship.
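For comparison, the referenced form of the same relationship keeps the profile in its own collection. This is a minimal sketch assuming a hypothetical profiles collection whose documents carry a user_id field, with in-memory arrays standing in for both collections; embedProfile is a hypothetical helper that converts the referenced form into the embedded shape shown above and rejects data that breaks the one-to-one rule.

```javascript
// Referenced form (hypothetical): the profile lives in its own collection
// and points back to its user through a user_id field.
const users = [
  { _id: "61c5d320f5db23428aaf2677", name: "John Doe", email: "johndoe@example.com" },
];
const profiles = [
  { _id: "p1", user_id: "61c5d320f5db23428aaf2677",
    address: "123 Main St", city: "Anytown", state: "CA", zip: "12345" },
];

// Convert to the embedded form. Because the relationship is one-to-one,
// at most one profile may match each user.
function embedProfile(user) {
  const matches = profiles.filter((p) => p.user_id === user._id);
  if (matches.length > 1) throw new Error(`User ${user._id} has more than one profile`);
  if (matches.length === 0) return { ...user }; // user without a profile
  const { _id, user_id, ...profile } = matches[0]; // strip the linking fields
  return { ...user, profile };
}

console.log(embedProfile(users[0]));
```

If the referenced form is kept in MongoDB, a unique index such as db.profiles.createIndex({ user_id: 1 }, { unique: true }) would stop a second profile from being inserted for the same user.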

One-to-many relationship

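In a one-to-many relationship, one instance of an entity can be related to many instances of another entity. This relationship is commonly represented with references, especially when the number of related documents can grow large; when that number is small and bounded, the related documents can also be embedded as an array.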

// User document
{
  "_id": ObjectId("61c5d320f5db23428aaf2677"),
  "name": "John Doe"
}
// Order documents
{
  "_id": ObjectId("613a2d6d2e6f373a48967e50"),
  "user_id": ObjectId("61c5d320f5db23428aaf2677"),
  "order_date": ISODate("2023-04-23T00:00:00.000Z"),
  "total_amount": 100.00
},
{
  "_id": ObjectId("321a2d6d2e6e373a489673b5"),
  "user_id": ObjectId("61c5d320f5db23428aaf2677"),
  "order_date": ISODate("2023-04-22T00:00:00.000Z"),
  "total_amount": 50.00
}

The code example above shows that the user document has two related order documents. Each order document contains a reference to the user document through the user_id field. The _id field in each order document serves as a unique identifier for that document. Thus, it can be said that the user document has a one-to-many relationship with the order documents.
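Because the "many" side holds the reference, a user's orders are found by filtering on user_id. The sketch below uses the values from the example above, with ObjectIds written as strings and in-memory arrays standing in for the collections; the collection name orders and the helpers ordersForUser and totalSpent are assumptions for illustration.

```javascript
// Hypothetical in-memory stand-ins for the users and orders collections,
// using the values from the example above (ObjectIds shown as strings).
const users = [{ _id: "61c5d320f5db23428aaf2677", name: "John Doe" }];
const orders = [
  { _id: "613a2d6d2e6f373a48967e50", user_id: "61c5d320f5db23428aaf2677",
    order_date: new Date("2023-04-23T00:00:00Z"), total_amount: 100.0 },
  { _id: "321a2d6d2e6e373a489673b5", user_id: "61c5d320f5db23428aaf2677",
    order_date: new Date("2023-04-22T00:00:00Z"), total_amount: 50.0 },
];

// Find every order that references the user, newest first.
function ordersForUser(userId) {
  return orders
    .filter((o) => o.user_id === userId)
    .sort((a, b) => b.order_date - a.order_date);
}

// Sum the order totals for one user.
function totalSpent(userId) {
  return ordersForUser(userId).reduce((sum, o) => sum + o.total_amount, 0);
}

const john = users[0];
console.log(ordersForUser(john._id).map((o) => o._id));
console.log(totalSpent(john._id)); // 150
```

In the mongo shell, the lookup in ordersForUser would correspond to db.orders.find({ user_id: ObjectId("61c5d320f5db23428aaf2677") }).sort({ order_date: -1 }).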

Many-to-many relationship

In a many-to-many relationship, each instance of one entity can be related to multiple instances of the other entity, and vice versa. It is often modeled with arrays of references: each document stores the _id values of the related documents in the other collection.

// User documents
{
  "_id": "user1",
  "name": "John Doe",
  "group_ids": ["group1", "group2"]
},
{
  "_id": "user2",
  "name": "Jane Smith",
  "group_ids": ["group1", "group3"]
}
// Group documents
{
  "_id": "group1",
  "name": "Technology Enthusiasts",
  "user_ids": ["user1", "user2"]
},
{
  "_id": "group2",
  "name": "Fitness Fanatics",
  "user_ids": ["user1"]
},
{
  "_id": "group3",
  "name": "Foodies",
  "user_ids": ["user2"]
}

The code example above shows that each user document has an array of group_ids that the user is associated with, and each group document has an array of user_ids that are associated with that group.
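Storing references on both sides means every membership change must update two documents, or the arrays will disagree. The sketch below uses the documents from the example above in in-memory arrays; the helpers addToSet, joinGroup, and groupNamesFor are hypothetical. addToSet mirrors the idea behind MongoDB's $addToSet update operator, which appends a value only if it is not already present.

```javascript
// Hypothetical in-memory stand-ins for the users and groups collections,
// using the documents from the example above.
const users = [
  { _id: "user1", name: "John Doe", group_ids: ["group1", "group2"] },
  { _id: "user2", name: "Jane Smith", group_ids: ["group1", "group3"] },
];
const groups = [
  { _id: "group1", name: "Technology Enthusiasts", user_ids: ["user1", "user2"] },
  { _id: "group2", name: "Fitness Fanatics", user_ids: ["user1"] },
  { _id: "group3", name: "Foodies", user_ids: ["user2"] },
];

// Append a value only if it is not already present.
function addToSet(array, value) {
  if (!array.includes(value)) array.push(value);
}

// A membership change updates both sides so the references stay in sync.
function joinGroup(userId, groupId) {
  const user = users.find((u) => u._id === userId);
  const group = groups.find((g) => g._id === groupId);
  if (!user || !group) throw new Error("Unknown user or group");
  addToSet(user.group_ids, groupId);
  addToSet(group.user_ids, userId);
}

// Resolve a user's group names by following the references.
function groupNamesFor(userId) {
  const user = users.find((u) => u._id === userId);
  return groups.filter((g) => user.group_ids.includes(g._id)).map((g) => g.name);
}

joinGroup("user2", "group2");
console.log(groupNamesFor("user2"));
```

In MongoDB, a write to a single document is atomic, but the two updates in joinGroup touch two documents; keeping both sides consistent under concurrent writes may call for a multi-document transaction.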

Conclusion

Data modeling is a critical aspect of building a database. In this Answer, we discussed the steps and practices that help us design a performant and scalable data model in MongoDB. We also differentiated between embedding and referencing, saw how to choose between them based on our application's needs, and looked at how to represent one-to-one, one-to-many, and many-to-many relationships. With all this in mind, we are well on our way to building efficient data models.
