Document Database Community https://documentdatabase.org/ Thu, 15 Jun 2023 15:12:47 +0000 en-US hourly 1 https://wordpress.org/?v=6.2.2 https://documentdatabase.org/wp-content/uploads/2023/02/COLORED-FAVICON-24-150x150.png Document Database Community https://documentdatabase.org/ 32 32 Comparing CosmosDB, DocumentDB, MongoDB, and FerretDB as Document Database for Your Stack https://documentdatabase.org/blog/comparing-cosmosdb-documentdb-mongodb-and-ferretdb-as-document-database-for-your-stack/?utm_source=rss&utm_medium=rss&utm_campaign=comparing-cosmosdb-documentdb-mongodb-and-ferretdb-as-document-database-for-your-stack Thu, 15 Jun 2023 15:12:43 +0000 https://documentdatabase.org/?p=810 A blog post by David Murphy (Udemy) about the document databases benefits and use cases for various technologies.

The post Comparing CosmosDB, DocumentDB, MongoDB, and FerretDB as Document Database for Your Stack first appeared on Document Database Community.

Comparing CosmosDB, DocumentDB, MongoDB, and FerretDB as Document Database for Your Stack
Reading Time: 15 minutes

The topic I’m discussing today is the benefits and use cases of various document database technologies. It’s important to understand that no technology is inherently bad, but rather each one has its place depending on your specific needs. After exploring these potential use cases, I’ll compare the technologies against each other to provide a more nuanced understanding of their strengths and weaknesses. Additionally, I’ll include benchmarks and estimated costs for different workloads, both in terms of hardware and operational expenses. Finally, I’ll offer some conclusions and recommendations before opening up the floor to questions and answers.

AWS DocumentDB – Benefits and Use Cases

DocumentDB is a fully managed NoSQL database solution by AWS. Over time, it has seen several versions, with recent updates improving user authentication, something I found particularly exciting.

DocumentDB is a highly available system with automatic failover and backup capabilities. It offers compatibility with MongoDB APIs and drivers, albeit with a slight lag compared to the latest versions.

 This solution is optimal for generic use cases that require strong query abilities, indexing capabilities on JSON data, ACID transactions at the document level, and a flexible schema design. It offers all the core features you look for in a document database.

 One of the key advantages of DocumentDB is its integration with AWS. If you’re already operating within the AWS ecosystem, it provides a simple, streamlined experience, easily integrating with IAM and fitting well into your existing control and tagging infrastructure for cost management.

 Overall, while there are no standout features that make it extraordinarily compelling, nor any significant downsides, DocumentDB shines for its simplicity and integration within the AWS framework.

MongoDB Atlas – Benefits and Use Case

MongoDB Atlas offers scalability and flexibility, with advantages in charting and high availability. It stays current with a wide variety of programming languages and frameworks. It also provides additional security features that might not be available elsewhere. However, whether you need these features depends on your individual security tolerances.

Atlas can be effectively used in various applications, thanks to its flexibility and low latency. However, it also comes with certain challenges, including potential cross-account bandwidth costs and data sovereignty concerns. Your security department will need to evaluate these considerations.

For instance, comparing AWS DocumentDB with Atlas briefly, in Atlas, MongoDB Inc. owns the storage containing your data. In contrast, with DocumentDB, while AWS manages the infrastructure, the data resides in your account and uses your KMS key. This difference means that from a data sovereignty perspective, you can assure your security department that with DocumentDB, the data remains within your sphere of control in the shared ownership model. This difference might be a critical factor in some cases. More on this later.

Azure CosmosDB – Benefits and Use Case

Moving on to Azure CosmosDB, for those unfamiliar with it, CosmosDB is more than just a typical setup. It’s a multi-engine database technology capable of flipping between a document model and several other distinct models. It supports global distribution and offers a fresh perspective on database architecture. It’s similar to technologies such as TiDB, which lean more towards distributed setups than even standard MongoDB does. In essence, it’s heavily clustered and designed for distributed use.

CosmosDB provides high availability, automatic failover, and automated backups, which align with our requirements. It also supports multiple APIs and programming languages. However, like DocumentDB, its APIs may not be as current as those provided by MongoDB Inc’s Atlas. If you aim to stay on the bleeding edge with the latest versions of MongoDB, this is a factor you’ll need to consider.

Despite this, CosmosDB has numerous advantages. It delivers low-latency performance, global scalability, and operates as a multi-model database. Although the multi-model aspect is not directly relevant to our current discussion, it’s an important feature to consider if you’re in the market for such a database. Technologies like this, or Tigris, are certainly worth exploring in those scenarios.

FerretDB – Benefits and Use Case

Finally, I want to cover FerretDB. FerretDB has many benefits and use cases, but the big differentiator is that, while it is highly available and scalable, it is also more compatible with MongoDB drivers and tools than DocumentDB and CosmosDB. However, the release of FerretDB 1.0 may have changed this, and I would need to look into it further.

One important thing to note is that FerretDB is an open source database solution, which means you can run it yourself. There are also managed Postgres systems available to assist in running FerretDB, but this is a topic for a separate discussion.

FerretDB’s use cases range from needing a flexible yet advanced database system to simplifying the stacks for a DBA team. It’s important to note that FerretDB is not a managed service like some of the other options we discussed earlier.

Overall, FerretDB is a highly available and scalable solution with numerous benefits and use cases that make it a strong contender for database management needs.

Pros & Cons Between Each Technology

AWS DocumentDB vs Azure CosmosDB – Comparison

In this comparison of AWS DocumentDB and Azure CosmosDB, we examine the pros and cons of each. Firstly, CosmosDB is more expensive than most other solutions, including DocumentDB, which is a highly cost-effective option. Both offer strong indexing capabilities and high availability, but CosmosDB can be limiting in terms of querying and indexing when compared to its competitors. This is particularly apparent when performing more complex operations such as aggregations.

Both platforms have flexible schema designs and are compatible with multiple APIs and drivers, making migration easy. However, CosmosDB exposes its APIs through a thin shim layer on top of its core, which can lead to more frequent issues. While Azure is not necessarily a bad choice, I have found AWS and Google Cloud to be more effective in my experience.

DocumentDB has slightly higher latency compared to some other tech, which can be managed by right-sizing the system, but there may be some underlying network differences between Azure and AWS that impact throughput. Overall, it’s important to analyze your specific use case to determine which platform is best suited to your needs.

AWS DocumentDB vs FerretDB – Comparison

In this comparison, we will be discussing the differences between AWS DocumentDB and FerretDB.

Firstly, FerretDB is an open-source solution, so there are no licensing or management overhead fees involved, making it a cost-effective option. However, when it comes to long-term cost-effectiveness, AWS DocumentDB tends to be more manageable, especially if no automations are in place for deploying or managing FerretDB. With DocumentDB, the process is straightforward and manageable. Additionally, using Terraform providers and modules for DocumentDB made the process easy for us; authentication and encryption were completed in a matter of minutes.

Regarding schema design and feature sets, FerretDB has limitations, specifically around querying. With DocumentDB, some features are unavailable because its older engine versions are not compatible with MongoDB 5.0 or 6.0. However, this shouldn’t be a major concern if we are looking at this from a CRUD-type perspective.

Regarding scalability, both are evenly matched in terms of what they can do, and it comes down to what infrastructure and tools are used to build out that Postgres server to get high availability. As of yet, there aren’t many tools available in the market, but we’re keeping our eyes open.

Regarding compatibility, both are evenly matched regarding 3.4 driver semantics. However, FerretDB is more compatible with more modern drivers than DocumentDB.

Lastly, DocumentDB is a managed, self-service solution, which makes it easy for the developer community to build it and tear it down using Terraform. Managing FerretDB, by contrast, requires more effort from a DBA or DBRE team, but it is manageable. In conclusion, while there are pros and cons for both solutions, it ultimately comes down to what is best suited for the organization’s needs and requirements.

Azure CosmosDB vs MongoDB Atlas – Comparison

When comparing Azure CosmosDB and MongoDB Atlas, there are a few differences to consider. For one, MongoDB Atlas is the newest offering and is quite expensive, but it comes with an Enterprise contract that could be beneficial for those with specific security needs. MongoDB Atlas is also likely to be the first to receive new features if they are for one of its core components. However, if we’re discussing something like FerretDB, it’s uncertain if they will always lag in security updates or new features.

Another difference to consider is the driver usage for MongoDB Atlas and FerretDB. If MongoDB Atlas drops a new feature without notice, it may take days, weeks, or even months for FerretDB to have the same feature available. The same can be said for DocumentDB. Azure CosmosDB and MongoDB Atlas, along with FerretDB, can all be globally distributed.

The sizing of MongoDB Atlas plans may not align with everyone’s needs. Additionally, while MongoDB Atlas is fully-managed and provides direct access to engineers, some have found the support team to be less than stellar in their responses to specific questions. It’s recommended that companies considering MongoDB Atlas look deeply at their support needs.

If comparing DocumentDB to Atlas, some have found AWS support to be more proactive in attempting to solve problems. MongoDB Atlas may put requests for feature improvements in a queue without any visible progress.

Overall, it’s important to consider specific needs and requirements when choosing between Azure CosmosDB and MongoDB Atlas.

Performance Benchmarks

I’m not certain why the operation wrapped around, but I used YCSB data for my test, executing 10,000 requests per second. I averaged the results from multiple runs to arrive at these numbers. Later, I’ll discuss cost elements and how I extrapolated them.

When it comes to CosmosDB benchmarks, I observed an average create time of about 2.2 milliseconds and a read time of 0.6 milliseconds. As shown on the screen, reads are roughly twice as fast as any write. Meanwhile, updates require slightly more overhead than a standard write, but deletes were slower. I didn’t quite understand why deletes were significantly slower than updates. I would have expected delete and create operations to be similar from an operations-per-second perspective.

These results are based on a burn-in run followed by three iterations lasting four hours each, which ensured that I accounted for any bursting or similar phenomena. I didn’t perform any specific tuning of the payload with YCSB, so I want to make it clear that these results shouldn’t be considered a finely-tuned CosmosDB, DocumentDB, or FerretDB benchmark. They reflect generic benchmarks with default settings.

As for Atlas benchmarks, Atlas is slightly faster than CosmosDB at creating data, with an average of 1.8 milliseconds versus 2.2. Consequently, I experienced a bit more throughput with Atlas. The read speeds were comparable, but Atlas still managed to provide higher throughput. Interestingly, Atlas had consistent speeds of 1.8 milliseconds for all write-oriented requests, unlike CosmosDB, which exhibited variable response times. This discrepancy could be attributed to some architectural differences, perhaps some kind of verification step in CosmosDB that slows down updates and deletes.

Regarding FerretDB, the create time during my tests averaged around three milliseconds. Read speeds were comparable to the other systems, ranging from 0.5 to 0.8 milliseconds. I observed that updates were rather costly, as were deletes. Perhaps in a subsequent conversation, we could discuss optimizing PostgreSQL and FerretDB to improve these speeds. Once again, these figures were derived from a default setup.

Estimated Costs

Cost Estimate Summary

This is a high-level summary, and you might notice that FerretDB seems exceptionally costly. However, it’s the only one where I included the cost of a part-time DBA working 20 hours per week. When I discuss FerretDB, I’ll specify exactly where this cost comes from.

In many cost estimates, they’ll state the cost of compute and storage, but seldom address the expenses related to managed services, especially when these are offered by third parties as opposed to being in-house. These estimates often overlook cross-company, cross-account, and cross-region costs from a bandwidth perspective. Hence, I’ve incorporated some estimates for these factors.

The calculations are based on an r5.large instance type with gp3 storage and 3,000 IOPS. The payload size was set at one kilobyte for network considerations. This is the framework I used to model these costs.

DocumentDB Cost Estimates

In terms of DocumentDB, I ran these numbers in the us-east-1 region, if I recall correctly, and used the db.r5.large instance type. The compute cost came out to be $373. To accommodate one terabyte of data, I added storage at an additional cost of $1,200, which accounts for higher-than-standard IOPS and throughput. This cost could potentially be reduced by tuning, but for these estimates, I maintained this figure.

The networking cost, which includes cross-service traffic costs, is estimated at $250 per month. As for the DBA cost, I’ve listed it as zero. This doesn’t imply that there is no DBA cost, but since DocumentDB is a managed service, the initial investment in DBA time is minimized. Therefore, the total monthly cost for this DocumentDB example comes out to be $1,823.

MongoDB Atlas Cost Estimates

For Atlas, I used a similar setup with the M10 type of cluster and the same type of storage. Like in the previous example, the storage cost is fixed at $1,200. However, the compute cost for Atlas is slightly higher at $438 compared to $373 for DocumentDB. This increase could be attributed to the additional fee MongoDB charges for access to the Enterprise license and support.

What particularly concerned me with Atlas were the networking costs, primarily due to cross-account bandwidth charges. Since Atlas is not located in your account or your VPC, but in MongoDB’s account and their VPC, these cross-account charges for bandwidth apply. After factoring these in, I estimated the networking cost to be about $1,600 a month for one terabyte of storage and 10,000 requests per second.

Therefore, the total cost for this Atlas setup comes out to $3,238 per month, which is notably higher than that of DocumentDB.

CosmosDB Cost Estimates

With CosmosDB, the pattern is similar, but as you can see, the compute cost is significantly higher. As mentioned earlier, CosmosDB has a more complex cost model because it’s based on virtual units of measure for factors like the 10,000 requests per second throughput and storage. This makes measuring network costs for ingress and egress between Azure regions a bit more challenging.

I’ve estimated the network costs at $250, although this may or may not apply to your setup. Therefore, the total monthly cost for a similar setup in CosmosDB could range between $2,250 and $2,500.

FerretDB Cost Estimates

Finally, for FerretDB, the cost estimates are very similar to DocumentDB as they’re both utilizing the R5 large instance type. The storage cost is the same as well, using gp3.

For networking costs, I added $250 due to networking between services. The actual cost could be more if you are bridging between cloud providers. However, if you’re operating in the same availability zone, region, and account, network costs will be relatively low, albeit not free.

A significant difference in the FerretDB estimate is the inclusion of a DBA cost of $4,000 per month. This cost is optional and highly dependent on your existing infrastructure. Do you already have a DBA team? Have you already allocated personnel hours so that this $4,000 is something the team can absorb? Or is this an additional demand for your team, requiring extra development time?

If we remove the DBA cost for a moment, we’re looking at a monthly cost of $1,823 for both FerretDB and DocumentDB, making them comparable. The choice between DocumentDB and FerretDB then hinges on specific feature requirements or the desire to remain within the Postgres ecosystem.
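The monthly totals quoted so far can be re-derived with a few lines of arithmetic. The sketch below is only a sanity check of the figures stated in this post. The dictionary layout and function name are illustrative, the FerretDB compute figure is assumed to equal DocumentDB's (inferred from the matching $1,823 subtotal), and CosmosDB is left out because its compute cost is not stated as a single number.

```python
# Monthly cost components in USD, as quoted in this post.
MONTHLY_COSTS = {
    "DocumentDB": {"compute": 373, "storage": 1200, "network": 250, "dba": 0},
    "MongoDB Atlas": {"compute": 438, "storage": 1200, "network": 1600, "dba": 0},
    # Assumption: FerretDB compute equals DocumentDB's, since both use the same
    # instance type and the post gives the same $1,823 subtotal without a DBA.
    "FerretDB": {"compute": 373, "storage": 1200, "network": 250, "dba": 4000},
}


def monthly_total(parts, include_dba=True):
    """Sum the cost components, optionally leaving out the DBA line item."""
    return sum(
        amount for name, amount in parts.items() if include_dba or name != "dba"
    )


for service, parts in MONTHLY_COSTS.items():
    with_dba = monthly_total(parts)
    without_dba = monthly_total(parts, include_dba=False)
    print(f"{service}: ${with_dba:,} per month (${without_dba:,} without DBA)")
```

Keeping the DBA cost as its own line item makes it easy to see how much of the FerretDB total depends on whether your team can absorb that work.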

In terms of performance, DocumentDB offers roughly 9.5K and 16K for read and write operations respectively, while FerretDB offers similar performance with around 9K and 15K. This highlights their similarities in performance as well.

I use FerretDB and DocumentDB as main examples not because other technologies are better or worse, but because they provide a more direct comparison. Both are using AWS instance types, there’s no additional cost layer for support, and they aren’t bridging between different clouds which could complicate an apples-to-apples comparison.

My personal view is that they’re very evenly matched, with FerretDB offering more features from a compatibility perspective. Based on cost and performance benchmarks, the choice could go either way depending on the specifics of your use case.

Conclusions and Recommendations

DocumentDB is a suitable solution for many users already on AWS who require a horizontally scalable setup. CosmosDB might be a better choice if you need global distribution, multiple APIs, or more flexible scaling operations that aren’t available in typical systems.

Atlas is very appropriate for those who need a fully managed, multi-cloud, multi-region MongoDB solution. However, keep in mind that this flexibility comes with additional cross-provider and cross-account charges that can significantly impact your budget. One of the largest sources of unintentional billing, apart from S3, comes from bandwidth costs, which are difficult to accurately predict and attribute.

FerretDB is likely the best solution if you require high performance, open-source compatibility, and prefer it to run on top of Postgres. This is especially true if you have an existing DBA and DB resources so that you don’t need the features of a managed service because you essentially have your own internal managed service. In the long term, FerretDB is probably the most cost-effective choice.

But if you lack these resources, “Database as a Service” offerings will help you reach the market faster and provide more self-service options for your development community.

Final thoughts: It’s crucial to evaluate each application’s requirements and use cases for these technologies. While this presentation provides a high-level comparison, certain details such as bandwidth or DBA costs can have a substantial impact. I encourage further testing and optimized testing for each technology to make more informed decisions.

Also, stay up to date with developments and updates across these technologies. For instance, FerretDB, being new to the market, is evolving rapidly. Similarly, DocumentDB has improved its IAM-based internal users’ features, which I found out recently. As an AWS proponent, I appreciate the ease that IAM authentication brings to credential rotation and short-term credential management.

Contact David if you would love deeper benchmarks on a specific comparison as a followup.

About the author

David Murphy

Database, Cybersecurity, and Operations specialist at Udemy

David Murphy is the Principal DBRE Engineer at Udemy. With over two decades of experience in the technology industry, he has worked with some of the biggest names in the business, including Object Rocket, Percona, and EA. David’s expertise lies in database architecture, performance optimization, data security, and cloud computing. He is passionate about helping organizations achieve their goals by providing scalable, reliable, and efficient solutions for their technology stack.

Databases: Switching from relational to document models, Part 3 https://documentdatabase.org/blog/databases-switching-from-relational-to-document-models-part-3/?utm_source=rss&utm_medium=rss&utm_campaign=databases-switching-from-relational-to-document-models-part-3 Wed, 26 Apr 2023 19:26:57 +0000 https://documentdatabase.org/?p=700 Migration to document-oriented databases: best practices and common mistakes.

The post Databases: Switching from relational to document models, Part 3 first appeared on Document Database Community.

Databases: Switching from relational to document models, Part 3
Reading Time: 4 minutes

This is a three-part blog series. Part one is located here, and part two can be found here.

Migration to Document-Oriented Databases

There are various reasons why companies decide to migrate to document-oriented databases. One of the primary reasons is to achieve better performance and scalability to handle large database sizes. In new environments with several terabytes of data, it may be more efficient to move to a web-ready database. Additionally, the ability to provide high availability and minimize downtime is crucial, and document-oriented databases are designed to offer these features.

Using a flexible schema can also be advantageous for e-commerce, web applications, or any other application that requires scaling out. By adding more nodes, the performance can be improved, and replication and sharding features can be utilized.

However, before migrating, it is important to ensure that the workload is compatible with the features that document-oriented databases offer. Not all applications will perform better on a document-oriented database, and it is essential to take advantage of the appropriate features for the migration to be successful. Migrating from a relational database can be complex and time-consuming, particularly if denormalization is involved. If the application is legacy or existing, the entire data layer may need to be rewritten since document-oriented databases do not use SQL.

It is also important to note that not all document-oriented databases offer features such as triggers and stored procedures. Therefore, check if the application relies heavily on these features before migrating. However, document-oriented databases share many similarities with relational databases, and primary keys and secondary indexes can still be used.

Most document-oriented databases are NoSQL and ACID-compliant, and they offer various security features, such as roles and granular access to different collections, TLS/SSL, and more. It is important to monitor and maintain the database regularly, checking for poorly performing queries and creating alerts and backups.

In summary, migrating to a document-oriented database can be beneficial, but it requires careful consideration and planning to ensure a successful transition.

Best Practices and Common Mistakes of Migration

Here’s a summary of the best practices and common mistakes when migrating or developing applications using document-oriented databases.

Best Practices:

– Keep the document as simple and consistent as possible, and make it human-readable for developers and maintainers;

– Organize tables or collections per subject;

– Avoid joins as much as possible;

– Use embedded documents wisely and avoid duplicating entire data;

– Pay attention to indexes and query performance.

Common Mistakes:

– Assuming you must pick only one model: it’s okay to use both NoSQL and SQL databases in the same application;

– Importing your relational data as-is: do not simply copy relational tables into a document-oriented database;

– Not enforcing any pattern (schema), which makes data retrieval complex: keep the name and properties of documents consistent to make querying easier;

– Not learning the language and functions of your chosen NoSQL database: NoSQL query languages are not standardized.

By following these best practices and avoiding common mistakes, developers can ensure a successful migration or application development using document-oriented databases.

About the author

Adamo Tonete

MongoDB SME

With a decade of experience in NoSQL databases, Adamo has worked for companies such as Percona, MongoDB, and PingCAP. Currently, Adamo works as a MongoDB SME for an Irish company, maintaining a 24×7 system.

Databases: Switching from relational to document models, Part 2 https://documentdatabase.org/blog/databases-switching-from-relational-to-document-models-part-2/?utm_source=rss&utm_medium=rss&utm_campaign=databases-switching-from-relational-to-document-models-part-2 Wed, 26 Apr 2023 18:59:05 +0000 https://documentdatabase.org/?p=692 Document Databases: Introduction, flexible schema, document model and JSON documents.

The post Databases: Switching from relational to document models, Part 2 first appeared on Document Database Community.

Databases: Switching from relational to document models, Part 2
Reading Time: 7 minutes

This is a three-part blog series. Part one is located here, and part three can be found here.

In this part, we will discuss document databases, also known as document-oriented databases, which are a type of NoSQL database. It’s important to note that not all NoSQL databases are document-oriented, as there are different types, such as key-value or column-store databases.

Document Databases – Intro and Flexible Schema

Document databases were developed in the late 2000s to accommodate the new reality of systems. As the cost of hardware decreased, applications came to need to scale to handle thousands or millions of users. Therefore, scaling is the focus, rather than preserving the normalization used in traditional relational databases.

Applications now need to scale out, since many web applications serve thousands or even millions of people and a single instance may not be able to handle all the data. Scaling is therefore one of the priorities for most document-oriented databases. If you have a huge amount of data, running a join or an aggregation may take a while, so document-oriented databases offer a flexible schema. A flexible schema does not mean “no schema”. You can enforce some kind of schema, and even if you choose not to, you can still split your data by subject. For example, it’s better to have one collection or table for customers and separate collections for other purposes.

The advantage of a flexible schema is that it allows developers to speed up the development process. You don’t need to create all your tables or specify the whole system with hundreds of tables. You can start creating your objects and optimize them as you go. In the era of web and mobile applications, sometimes you need to develop and deploy software within a couple of months. Document databases expose their documents or objects as JSON. The data is not necessarily stored as JSON on disk, since some databases use a different format for compression purposes, but it goes back and forth over the wire as JSON. That’s a good thing, because most recent languages can parse JSON easily. Instead of fetching rows, parsing all the columns in each row, and then building a JSON response, you get the JSON directly from a document-oriented database.

Suppose we are developing an e-commerce application. In that case, the flexible schema is helpful since products have different properties, and modeling them in a normal relational database may require hundreds of fields, most of which may be null since a given product doesn’t have the property. For instance, suppose I have two products: a laptop, which is electronic and has a brand, memory, and color, and a t-shirt, which doesn’t have memory but has a size. Both of these can live in the same table. These are totally valid documents in my database, even though the first one doesn’t have the size field and the second one doesn’t have the memory field. The two aspects of a flexible schema are that you can omit some fields, and that fields can be polymorphic. For example, a field can be an integer or a string depending on the product.
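As a rough illustration of the laptop and t-shirt example, the sketch below models one collection as a plain Python list of dictionaries. The field values, the third product, and the helper names are assumptions made up for this example and do not come from any particular database.

```python
# One "collection" holding documents with different sets of fields.
products = [
    # The laptop has brand, memory, and color, but no size.
    {"_id": 1, "name": "Laptop", "brand": "ExampleBrand", "memory": "16GB", "color": "silver"},
    # The t-shirt has a size, but no memory.
    {"_id": 2, "name": "T-shirt", "size": "M"},
    # Hypothetical third product: "size" is an integer here (a polymorphic field).
    {"_id": 3, "name": "Sneakers", "size": 42},
]


def products_with(field):
    """Return the names of products whose document contains the given field."""
    return [p["name"] for p in products if field in p]


def size_types():
    """Map each product that has a size to the Python type name of that value."""
    return {p["name"]: type(p["size"]).__name__ for p in products if "size" in p}


print(products_with("memory"))
print(size_types())
```

Omitted fields simply do not exist in the document, so there is no need to store a placeholder value for them.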

When your application receives the result, you don’t need to know the keys in advance or check whether a field such as size exists. You can simply parse the document and iterate over its key names and values, which is easier than checking for each field individually. Some databases offer a feature called schema validation, which lets you ensure that certain fields are present in your document.
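To make the idea of required fields concrete, here is a minimal application-side sketch. It is not the built-in schema validation of any database, and the rule set and function name are hypothetical.

```python
# Hypothetical rules: every product must have a name (string) and a price (number).
REQUIRED_FIELDS = {"name": str, "price": (int, float)}


def validate(doc):
    """Return a list of problems found in the document; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    return errors


# Extra fields such as "memory" are allowed; only the required ones are checked.
print(validate({"name": "Laptop", "price": 999.0, "memory": "16GB"}))
print(validate({"name": "T-shirt"}))
```

Databases that support schema validation apply comparable rules on the server side, so invalid documents are rejected at write time rather than discovered later by the application.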

Document Databases – Document Model and JSON Documents

Continuing to talk about the document model and the flexible schema, let’s explore a scenario where we want to create a customer entry in the document database. In this case, I am using the same field names as previously mentioned in the relational database – ID, full name, birthday, and created by.

However, in this example, I am saving the entire value of my created_by user as an embedded document. This example demonstrates one of several ways to design the schema for a document-oriented database. It is not the only correct way; it all depends on your data access pattern.

Querying for Joe would result in retrieving all the information without running a join or merging two different tables. Although this could lead to some data duplication, it is ok for a document-oriented database as sometimes we may duplicate data to gain performance.
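A customer entry with an embedded created_by document might look roughly like the sketch below. The field names follow the ones listed above, while the values, the in-memory list, and the lookup helper are assumptions for illustration only.

```python
# A single customer document; the creating user is embedded, not referenced.
customers = [
    {
        "id": 1,
        "full_name": "Joe",
        "birthday": "1990-01-01",
        "created_by": {"id": 10, "full_name": "Admin User"},
    },
]


def find_customer(full_name):
    """Return the first customer with the given name, or None if there is none."""
    for customer in customers:
        if customer["full_name"] == full_name:
            return customer
    return None


joe = find_customer("Joe")
# The creator's details come back with the customer; no second lookup is needed.
print(joe["created_by"]["full_name"])
```

The trade-off is visible here: if the same user creates many customers, that user's details are repeated in each customer document.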

The document schema will depend highly on your data access pattern. For instance, if your application always requires knowing who created the customer, it would be better to save the user along with the customer as an embedded document. On the other hand, sometimes opting for a reference rather than repeating the data is better. This decision depends on your query pattern for a specific collection.

We can have strings, numbers, booleans, arrays, or other objects as a value for fields in a document-oriented database. These fields are polymorphic, meaning you do not need to use the same value type for each field in the same collection. You can have one specific entry or document where the value of a field is an array, and for another document, it can be something different. However, it can be hard to parse such entries.
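One common way to cope with a polymorphic field is to normalize it when reading. The sketch below assumes a hypothetical `phone` field that may be a string, an array, or missing; the helper name `as_list` is also just for illustration.

```python
def as_list(value):
    """Normalize a field that may be missing, a single value, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


docs = [
    {"_id": 1, "phone": "555-0100"},                # single string
    {"_id": 2, "phone": ["555-0101", "555-0102"]},  # array of strings
    {"_id": 3},                                     # field omitted
]

for doc in docs:
    print(doc["_id"], as_list(doc.get("phone")))
```

The extra branch in `as_list` is the parsing cost mentioned above: every reader of the field has to handle each shape it can take.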

Let’s consider what orders will look like in a document-oriented database. Instead of having five tables, we can have one single document containing all the information required. This document can include the order ID, customer name, customer ID, order details, product ID, product name, price, and total.
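A single order document along these lines might look like the sketch below. The exact field names, the nesting of customer data, the `amount` field, and all values are assumptions chosen for illustration.

```python
# One order document holding what a normalized design spreads across tables.
order = {
    "order_id": 1001,
    "customer": {"customer_id": 1, "name": "Joe"},
    "details": [
        {"product_id": 1, "product_name": "laptop", "price": 900.0, "amount": 1},
        {"product_id": 2, "product_name": "t-shirt", "price": 20.0, "amount": 2},
    ],
}

# The total is stored with the order so readers do not have to recompute it.
order["total"] = sum(line["price"] * line["amount"] for line in order["details"])
print(order["total"])  # 940.0
```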

There are two main ways to have data together in a document-oriented database. We can either embed the data or use references. We can embed data by adding it to our collection, as shown in the example of embedding customer data. Alternatively, we can use references, allowing us to link documents together. With references, however, the referenced data is only available after we run an additional query for it.
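The difference between the two approaches can be sketched with two in-memory collections. The collection layout, the `created_by_id` field, and the `creator_name` helper are hypothetical and only serve to show where the extra lookup happens.

```python
# A separate "users" collection, keyed by _id for simplicity.
users = {10: {"_id": 10, "name": "web-app"}}

# Embedded: the creator's data is copied into the customer document.
customer_embedded = {
    "_id": 1,
    "full_name": "Joe",
    "created_by": {"_id": 10, "name": "web-app"},
}

# Referenced: only the creator's _id is stored.
customer_referenced = {"_id": 2, "full_name": "Ann", "created_by_id": 10}


def creator_name(customer, users):
    """Return the creator's name, doing a second lookup only when needed."""
    if "created_by" in customer:
        return customer["created_by"]["name"]          # already in the document
    return users[customer["created_by_id"]]["name"]    # extra query by reference


print(creator_name(customer_embedded, users))
print(creator_name(customer_referenced, users))
```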

In conclusion, the schema for a document-oriented database depends heavily on your data access pattern. It offers flexibility when it comes to a variety of value types, and we can have data together using either embedding or references.

Document Databases – Summarising

Embedded documents can improve query performance by providing all the required information in a single request and response, eliminating the need for joins or lookups. This makes the document human-readable and simplifies the process, as you can easily understand the structure and content of your order without going back and forth between different tables. However, this approach can result in data duplication.

On the other hand, references can help avoid duplication, especially when dealing with large documents. Using references can save document space and avoid storing the same data in several places, but it may also introduce complexity and reduce human-readability. Furthermore, queries that rely on lookups can be slower in document-oriented databases than the equivalent joins in relational databases, which have more advanced join algorithms.

When working with embedded documents, it’s crucial to be cautious when the embedded document is an array, as the document can grow indefinitely. For instance, if you’re developing a car tracking application that updates the car’s position every second, embedding this data in the car document could lead to the document reaching its maximum size (e.g., 16MB for MongoDB), which would result in poor query performance or unexpected behaviors.
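A back-of-the-envelope calculation shows how quickly the car tracking example could hit such a limit. The sketch below assumes a made-up position entry and measures its size as JSON text; real storage formats such as BSON have different overheads, so treat the result as an order-of-magnitude estimate only.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB limit cited in the text

# Hypothetical shape of one position update; real entries may be larger.
position = {"ts": "2023-05-01T12:00:00Z", "lat": -23.55052, "lon": -46.633308}
bytes_per_position = len(json.dumps(position).encode("utf-8"))

# With one update per second, how long until the array alone fills the limit?
positions_until_full = MAX_DOCUMENT_BYTES // bytes_per_position
hours_until_full = positions_until_full / 3600

print(bytes_per_position, positions_until_full, round(hours_until_full, 1))
```

Under these assumptions a single car would reach the limit within days rather than months, which is why unbounded arrays deserve attention early in the design.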

To optimize performance, avoid using multiple references in a document-oriented database, as this could lead to a relational database-like structure. Keep in mind that document-oriented databases do not perform joins in the same way as relational databases.

When it comes to querying document-oriented databases, each database uses its own query language. For example, MongoDB uses the MongoDB Query Language (MQL), which expresses queries as JSON-like documents, while other databases like Couchbase Server and RavenDB use their own query languages. Similarly, Cassandra, which is a wide-column store rather than a document database, has its own query language (CQL) with an SQL-like syntax.

In conclusion, understanding the differences and best practices for working with document-oriented databases is essential when transitioning from a relational database or developing a new application.

This is a three-part blog series. Part one is located here, and part three can be found here.

About the author

Adamo Tonete

MongoDB SME

With a decade of experience in NoSQL databases, Adamo has worked for different companies such as Percona, MongoDB, and PingCAP. Currently, Adamo works as a MongoDB SME for an Irish company, maintaining a 24×7 system.

The post Databases: Switching from relational to document models, Part 2 first appeared on Document Database Community.

]]>
Databases: Switching from relational to document models, Part 1 https://documentdatabase.org/blog/databases-switching-from-relational-to-document-models/?utm_source=rss&utm_medium=rss&utm_campaign=databases-switching-from-relational-to-document-models Wed, 26 Apr 2023 14:47:00 +0000 https://documentdatabase.org/?p=665 Relational Databases: review, normalization, SQL language and joins.

The post Databases: Switching from relational to document models, Part 1 first appeared on Document Database Community.

]]>
Databases: Switching from relational to document models, Part 1
Reading Time: 5 minutes

This is a three-part blog series. Part two is located here, and part three can be found here.

In this blog, we will review relational databases and compare the idea of normalization between relational and document-oriented databases, which is a common challenge for those migrating from a relational to a document model. We will also discuss the SQL language, storage engines, and joins. Later, we will compare the model for normalization and discuss different ways to organize your document-oriented database. We will also delve into the query language, although it’s important to note that depending on the specific database, it could vary.

Finally, we will discuss best practices and common mistakes when migrating from a relational database to a document-oriented database. This blog mainly focuses on MongoDB, FerretDB, CosmosDB, and other common document-oriented databases. If you have any remaining questions, feel free to ask them in the comments below.

Relational Databases – Review

Now I will discuss relational databases. This section serves as a review, and as an introduction for those who have never worked with a relational database before. It will give you an idea of what a relational database is, since they are still commonly used today. Some of the most popular relational databases currently in use are Oracle, PostgreSQL, MySQL, and Microsoft SQL Server.

In 1970, Edgar Frank Codd published the first paper on the relational model. This was an important milestone in the development of the first relational databases. Nine years later, Oracle released its first commercial relational database, which was followed by others such as DB2 and Informix.

When these databases were first designed, they were created to use as few resources as possible due to the high cost of storage and processing power at the time. They were designed to run on single hosts and provide good performance with these limited resources.

Despite being over 40 years old, relational databases are still widely used today and are essential to numerous systems. Many of these systems are able to run smoothly with the initial design of these databases.

Relational Databases – Normalization

Before discussing document-oriented databases and the workloads that suit them, it helps to look at normalization, which is a crucial step when designing a relational database. Normalization involves splitting tables to avoid duplicate data, save space, and ensure consistency. Relational databases use SQL to query these tables. There are several pros and cons associated with data normalization. On the positive side, normalization helps to avoid data duplication, saves space, and can improve performance by caching small tables. On the negative side, it creates a high dependency among tables, making it difficult to retrieve information if a key table is lost. Additionally, the schema can be strict, making it challenging to add fields or change the primary key.

As an example, consider a simple normalized system for orders that includes customer, user, orders, and order details tables. The customer table includes the customer’s full name, birthday, and creation date, while the user table tracks which application created the customer. The orders table contains minimum fields such as order ID, customer ID, and created date, while the order details table includes the product ID, name, amount, and final price. This is a typical layout for a standard relational database.
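The layout described above can be sketched with Python's built-in sqlite3 module. The table and column names, types, and constraints are assumptions derived from the description; a real schema would likely differ in detail.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL              -- application that created the customer
);
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    full_name  TEXT NOT NULL,
    birthday   TEXT,
    created_at TEXT,
    created_by INTEGER REFERENCES users(id)
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    created_at  TEXT
);
CREATE TABLE order_details (
    order_id    INTEGER NOT NULL REFERENCES orders(id),
    product_id  INTEGER NOT NULL,
    name        TEXT,
    amount      INTEGER,
    final_price REAL
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['customers', 'order_details', 'orders', 'users']
```

Each table holds one kind of fact, and the foreign keys are what tie an order back to its customer and the customer back to the user that created it.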

Relational Databases – SQL Language and Joins

Relational databases have different kinds of commands, such as data manipulation commands (e.g., select, insert, update, delete, often combined with joins) and data definition commands (e.g., create table, create index). Some relational databases have their own SQL functions for certain operations, like creating backups or adding external data sources.

There is a standard SQL that most relational databases use, but some may have specific commands that are unique to them. It’s important to know these differences because a migration from one relational database to another may fail when vendor-specific syntax is used, for example in data definition language (DDL) statements. Some operations may block other operations, but modern storage engines and databases try to avoid blocking as much as possible.

To take advantage of normalization, you need to tell the database what you need. For example, to find all the orders made by a customer called “Joe” you would need to use an inner join to fetch data from child tables. The query optimizer will find the fastest path to retrieve information using indexes, statistics, or other information. Once the query is completed, the relational database returns the results to the client, which could be an application or a user.
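The query for Joe's orders can be sketched with sqlite3 as well. The two tables are reduced to the columns the join needs, and the sample rows are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id)
);
INSERT INTO customers VALUES (1, 'Joe'), (2, 'Ann');
INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1);
""")

# The inner join matches each order with its customer row; the WHERE clause
# then keeps only the orders that belong to Joe.
rows = conn.execute(
    """
    SELECT o.id, c.full_name
    FROM orders AS o
    INNER JOIN customers AS c ON c.id = o.customer_id
    WHERE c.full_name = ?
    ORDER BY o.id
    """,
    ("Joe",),
).fetchall()

print(rows)  # [(100, 'Joe'), (102, 'Joe')]
```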

About the author

Adamo Tonete

MongoDB SME

With a decade of experience in NoSQL databases, Adamo has worked for different companies such as Percona, MongoDB, and PingCAP. Currently, Adamo works as a MongoDB SME for an Irish company, maintaining a 24×7 system.

]]>
Why a Document Database Community is Essential for Modern Application Development https://documentdatabase.org/blog/why-a-document-database-community-is-essential-for-modern-application-development/?utm_source=rss&utm_medium=rss&utm_campaign=why-a-document-database-community-is-essential-for-modern-application-development Thu, 13 Apr 2023 14:06:42 +0000 https://documentdatabase.org/?p=706 Joining a document database community can help you stay up-to-date with the latest trends and developments in this field, and enable you to become a better developer.

The post Why a Document Database Community is Essential for Modern Application Development first appeared on Document Database Community.

]]>
Why a Document Database Community is Essential for Modern Application Development
Reading Time: 4 minutes

In the world of software development, document databases have emerged as a powerful tool for managing semi-structured and unstructured data. Unlike traditional relational databases, document databases are designed to store and manage data in the form of documents, which makes them more flexible, scalable, and performant.

What are document databases?

Simply put, a document database is a type of NoSQL database that stores data in the form of documents. These documents can contain a wide range of data types, including text, images, and videos. The documents are self-contained, which means that they contain all the information related to a particular entity, making them ideal for applications that deal with complex data structures.

Unlike relational databases, which use a fixed schema to define the structure of the data, document databases are schema-less. This means that developers can add or remove fields from documents as needed, without having to make changes to the database schema. This flexibility allows developers to adapt quickly to changing data requirements and to create applications that are more agile and responsive.

Document databases also use a different data model than relational databases. Instead of storing data in tables with predefined relationships, document databases store data in collections of documents. This makes it easy to represent complex hierarchical relationships between data objects.
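As a small illustration of hierarchical data inside one document, the sketch below nests comments and replies under a post and walks the structure to list every leaf value. The document shape and the `walk` helper are assumptions made for this example.

```python
def walk(value, path=""):
    """Yield (dotted_path, leaf_value) pairs for a nested document."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from walk(child, f"{path}.{index}")
    else:
        yield path, value


# A post with nested comments and replies, all in a single document.
post = {
    "title": "Hello",
    "author": {"name": "Joe"},
    "comments": [
        {"text": "Nice", "replies": [{"text": "Thanks"}]},
    ],
}

paths = dict(walk(post))
for dotted_path, leaf in paths.items():
    print(dotted_path, "=", leaf)
```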

Why is a document database community needed?

Now, let’s talk about why a document database community is needed.

A document database community is a group of developers, users, and enthusiasts who share a common interest in document-oriented databases. This community is essential to promote the adoption of document databases and provide support to developers who are using or planning to use these databases in their applications.

Here are some reasons why a document database community is necessary:

Sharing Knowledge and Best Practices: A community is a place where developers can share their knowledge and best practices for using document databases effectively. This exchange of ideas and experiences can help developers overcome common challenges and avoid common pitfalls.

Promoting Adoption and Education: A strong community can help promote the adoption of document databases by educating developers about the benefits of using these databases and providing resources to help them get started. This can include documentation, tutorials, and sample code.

Providing Support: Developers using document databases may encounter issues or have questions about how to use specific features. A community can provide support by answering questions and providing guidance on how to solve problems.

Contributing to Open Source Projects: Many document databases are open source, which means that anyone can contribute to their development. A community can provide a platform for developers to collaborate on open source projects, making them more robust and feature-rich.

In conclusion, document databases have become an essential tool for modern application development due to their flexibility, scalability, and performance. A strong document database community is needed to promote their adoption, share knowledge, provide support, and contribute to their development.

If you’re interested in document databases, consider joining a community to learn more and get involved!

Learn more about our monthly meetups.

Join the community on Slack.

]]>