Databases: Switching from relational to document models, Part 2
Reading Time: 7 minutes

This is a three-part blog series. Part one is located here, and part three can be found here.

In this part, we will discuss document databases, also known as document-oriented databases, which are a type of NoSQL database. It’s important to note that not all NoSQL databases are document-oriented, as there are different types, such as key-value or column-store databases.

Document Databases – Intro and Flexible Schema

Document databases were developed in the late 2000s to accommodate the new reality of systems. As the cost of hardware decreased, applications now need to scale to handle thousands or millions of users. Therefore, scaling is the focus, rather than keeping the same idea of normalization used in traditional relational databases.

Both databases were designed for the new reality of systems where the cost of hardware has significantly decreased over the years. Applications now need to scale out since we have many web applications serving thousands, even millions of people, and a single instance may not be able to handle all the data. Scaling is one of the  priorities for most document-oriented databases. If you have a huge amount of data, running a join or an aggregation may take a while. Therefore, document-oriented databases offer a flexible schema. When I say flexible schema, it does not mean “no schema”. You can enforce some kind of schema, and if you don’t want to, you can split your data based on subjects. For example, it’s better to have a collection or table for a customer, and you need to have different collections for different purposes.

The advantage of a flexible schema is that it allows developers speed up the development process. You don’t need to create all your tables or specify the whole system with hundreds of tables. You can start creating your objects and optimize them as you go. With the era of web and mobile applications, sometimes you need to develop and deploy software within a couple of months. Document databases save their document or object as JSON. Although not necessarily saved as JSON on disk, some databases can change that for compression purposes. However, it is going back and forth over the wire, and that’s a good thing because most recent languages can parse JSON easily. Instead of getting rows and parsing all the columns in a row and then creating a JSON response, you can have it easily with a document-oriented database.

blank

Suppose we are developing an e-commerce application. In that case, the flexible schema is helpful since the products have different properties, and creating a normal relational database may require hundreds of fields, most of which may be new since you don’t have the property. For instance, if I have two products, a laptop, which is electronic, has a brand, memory, and color, and a t-shirt, which is just a t-shirt, doesn’t have memory but has a size, both of these can live in the same table. In this example, these are totally valid documents in my database, and the first one doesn’t have the size field while the second one doesn’t have the memory field. The two versions of a flexible schema are that you can omit some fields, and you can have fields that are polymorphic. For example, you can have fields as an integer or a string depending on the product.

When receiving the result from your API, if you’re going to validate if the field size is there, or you’re going to map the keys and get the values of the keys, you don’t need to know the keys. You just need to parse them and get the key names and the key values, making it easier than going and checking if the field exists or not. Some databases offer a filter called schema validation where you can ensure that some fields must be present in your document.

Document Databases – Document Model and JSON Documents

Continuing to talk about the document model and the flexible schema, let’s explore a scenario where we want to create a customer entry in the document database. In this case, I am using the same field names as previously mentioned in the relational database – ID, full name, birthday, and created by. 

blank

However, in this example, I am saving the entire value of my created_by user as an embedded document. This example demonstrates the document model one of several ways to design the schema for a document-oriented database. It does not mean that it is the only correct way; it all depends on your data access pattern.

Querying for Joe would result in retrieving all the information without running a join or merging two different tables. Although this could lead to some data duplication, it is ok for a document-oriented database as sometimes we may duplicate data to gain performance.

The document schema will depend highly on your data access pattern. For instance, if your application always requires knowing who created the user, it would be better to save the user along with the customer as an embedded document. On the other hand, sometimes opting for a reference rather than repeating the data is better. This decision depends on your query pattern for a specific table.

We can have strings, numbers, booleans, arrays, or other objects as a value for fields in a document-oriented database. These fields are polymorphic, meaning you do not need to use the same value type for each field in the same collection. You can have one specific entry or document where the value of a field is an array, and for another document, it can be something different. However, it can be hard to parse such entries.

Let’s consider the scenario of how orders will look like in a document-oriented database. Instead of having five tables, we can have one single document containing all the information required. This document can include the order ID, customer name, customer ID, order details, product ID, product name, price, and total.

blank

There are two main ways to have data together in a document-oriented database. We can either embed the data or use references. We can embed data by adding it to our collection, as shown in the example of embedding customer data. Alternatively, we can use references, allowing us to link documents together. Still, we will only have access to the referenced data when we query it.

In conclusion, the schema for a document-oriented database depends heavily on your data access pattern. It offers flexibility when it comes to a variety of value types, and we can have data together using either embedding or references.

Document Databases – Summarising

Embedded documents can improve query performance by providing all the required information in a single request and response, eliminating the need for joins or lookups. This makes the document human-readable and simplifies the process, as you can easily understand the structure and content of your order without going back and forth between different tables. However, this approach can result in data duplication.

On the other hand, references can help avoid duplication, especially when dealing with large documents. Using references can save document space and prevent the need for data normalization, but it may also introduce complexity and reduce human-readability. Furthermore, query performance can be slower in document-oriented databases compared to relational databases if using lookups, which have more advanced join algorithms.

When working with embedded documents, it’s crucial to be cautious when the embedded document is an array  as  document can grow indefinitely. For instance, if you’re developing a car tracking application that updates the car’s position every second, embedding this data in the car document could lead to the document reaching its maximum size (e.g., 16MB for MongoDB), which would result in poor query performance or unexpected behaviors.

To optimize performance, avoid using multiple references in a document-oriented database, as this could lead to a relational database-like structure. Keep in mind that document-oriented databases do not perform joins in the same way as relational databases.

When it comes to querying document-oriented databases, each database uses its own query language. For example, MongoDB uses the MongoDB Query Language (MQL), which is JavaScript-based, while other databases like Couchbase Server and RavenDB use their own query languages. Similarly, Cassandra has its query language based on the SQL standard.

In conclusion, understanding the differences and best practices for working with document-oriented databases is essential when transitioning from a relational database or developing a new application.

This is a three-part blog series. Part one is located here, and part three can be found here.

0 Comments

About the author

blank

Adamo Tonete

MongoDB SME

With decade of experience in NoSQL Databases, Adamo have worked for different companies such as Percona, MongoDB and PingCAP. Currently Adamo works as a MongoDB SME for an Irish company, maintening a 24×7 system.

Related posts

Subscribe to Updates

Privacy Policy