Today's topic is the benefits and use cases of several document database technologies. It's important to understand that no technology is inherently bad; each one has its place depending on your specific needs. After exploring these potential use cases, I'll compare the technologies against one another to provide a more nuanced understanding of their strengths and weaknesses. I'll also include benchmarks and estimated costs for different workloads, in terms of both hardware and operational expenses. Finally, I'll offer some conclusions and recommendations before opening up the floor to questions and answers.
AWS DocumentDB – Benefits and Use Cases
DocumentDB is a fully managed NoSQL database solution by AWS. Over time, it has seen several versions, with recent updates improving user authentication, something I found particularly exciting.
DocumentDB is a highly available system with automatic failover and backup capabilities. It offers compatibility with MongoDB APIs and drivers, albeit with a slight lag compared to the latest versions.
This solution is optimal for generic use cases that require strong query abilities, indexing capabilities on JSON data, ACID transactions at the document level, and a flexible schema design. It offers all the core features you look for in a document database.
One of the key advantages of DocumentDB is its integration with AWS. If you’re already operating within the AWS ecosystem, it provides a simple, streamlined experience, easily integrating with IAM and fitting well into your existing control and tagging infrastructure for cost management.
Overall, while there are no standout features that make it extraordinarily compelling, nor any significant downsides, DocumentDB shines for its simplicity and integration within the AWS framework.
MongoDB Atlas – Benefits and Use Case
MongoDB Atlas offers scalability and flexibility, with advantages in charting and high availability. It stays current with a wide variety of programming languages and frameworks. It also provides additional security features that might not be available elsewhere. However, whether you need these features depends on your individual security tolerances.
Atlas can be effectively used in various applications, thanks to its flexibility and low latency. However, it also comes with certain challenges, including potential cross-account bandwidth costs and data sovereignty concerns. Your security department will need to evaluate these considerations.
To compare AWS DocumentDB with Atlas briefly: in Atlas, MongoDB Inc. owns the storage containing your data. With DocumentDB, in contrast, AWS manages the infrastructure, but the data resides in your account and uses your KMS key. From a data sovereignty perspective, this means you can assure your security department that with DocumentDB the data remains within your sphere of control under the shared responsibility model. This difference might be a critical factor in some cases. More on this later.
Azure CosmosDB – Benefits and Use Case
Moving on to Azure CosmosDB, for those unfamiliar with it, CosmosDB is more than just a typical setup. It’s a multi-engine database technology capable of flipping between a document model and several other distinct models. It supports global distribution and offers a fresh perspective on database architecture. It’s similar to technologies such as TiDB, which lean more towards distributed setups than even standard MongoDB does. In essence, it’s heavily clustered and designed for distributed use.
CosmosDB provides high availability, automatic failover, and automated backups, which align with our requirements. It also supports multiple APIs and programming languages. However, like DocumentDB, its APIs may not be as current as those provided by MongoDB Inc’s Atlas. If you aim to stay on the bleeding edge with the latest versions of MongoDB, this is a factor you’ll need to consider.
Despite this, CosmosDB has numerous advantages. It delivers low-latency performance, global scalability, and operates as a multi-model database. Although the multi-model aspect is not directly relevant to our current discussion, it’s an important feature to consider if you’re in the market for such a database. Technologies like this, or Tigris, are certainly worth exploring in those scenarios.
FerretDB – Benefits and Use Case
Finally, I want to cover FerretDB. FerretDB has many benefits and use cases, but the big differentiator is that, while it is highly available and scalable, it is also more compatible with drivers and tools than DocumentDB and CosmosDB are. However, the release of FerretDB 1.0 may have changed this, and I would need to look into it further.
One important thing to note is that FerretDB is an open source database solution which means you can run it yourself. There are also managed Postgres systems available to assist in running FerretDB, but this is a topic for a separate discussion.
FerretDB's use cases range from needing a flexible yet advanced database system to simplifying the stack for a DBA team. It's important to note that FerretDB is not a managed service like some of the other options we discussed earlier.
Overall, FerretDB is a highly available and scalable solution with numerous benefits and use cases that make it a strong contender for database management needs.
Pros & Cons Between Each Technology
AWS DocumentDB vs Azure CosmosDB – Comparison
In this comparison of AWS DocumentDB and Azure CosmosDB, we examine the pros and cons of each. Firstly, CosmosDB is more expensive than most other solutions, including DocumentDB, which is a highly cost-effective option. Both offer strong indexing capabilities and high availability, but CosmosDB can be limiting in terms of querying and indexing when compared to its competitors. This is particularly apparent when performing more complex operations such as aggregations.
Both platforms have flexible schema designs and are compatible with multiple APIs and drivers, which makes migration easy. However, CosmosDB's API layer is a thin shim on top of its core, which can lead to more frequent issues. While Azure is not necessarily a bad choice, I have found AWS and Google Cloud to be more effective in my experience.
DocumentDB has slightly higher latency compared to some other tech, which can be managed by right-sizing the system, but there may be some underlying network differences between Azure and AWS that impact throughput. Overall, it’s important to analyze your specific use case to determine which platform is best suited to your needs.
AWS DocumentDB vs FerretDB – Comparison
In this comparison, we will be discussing the differences between AWS DocumentDB and FerretDB.
Firstly, FerretDB is an open-source solution, so there are no licensing or management overhead fees involved, making it a cost-effective option. However, when it comes to long-term cost-effectiveness, AWS DocumentDB tends to be more manageable, especially if no automations are in place for deploying or managing FerretDB. With DocumentDB, the process is straightforward and manageable. Additionally, using Terraform providers and modules for DocumentDB made the process easy for us; authentication and encryption were completed in a matter of minutes.
Regarding schema design and feature sets, FerretDB has limitations, specifically around querying. With DocumentDB, some things are unavailable because it runs older MongoDB-compatible flavors rather than 5.0 or 6.0. However, this shouldn't be a major concern if we are looking at this from a CRUD-type perspective.
Regarding scalability, both are evenly matched in terms of what they can do, and it comes down to what infrastructure and tools are used to build out that Postgres server to get high availability. As of yet, there aren’t many tools available in the market, but we’re keeping our eyes open.
Regarding compatibility, both are evenly matched regarding 3.4 driver semantics. However, FerretDB is more compatible with more modern drivers than DocumentDB.
Lastly, DocumentDB is a managed, self-service solution, which makes it easy for the developer community to build it and tear it down using Terraform. Managing FerretDB, by contrast, requires more effort from a DBA or DBRE team, but it is manageable. In conclusion, while there are pros and cons for both solutions, it ultimately comes down to what is best suited for the organization's needs and requirements.
Azure CosmosDB vs MongoDB Atlas – Comparison
When comparing Azure CosmosDB and MongoDB Atlas, there are a few differences to consider. For one, MongoDB Atlas is the newest offering and is quite expensive, but it comes with an Enterprise contract that could be beneficial for those with specific security needs. MongoDB Atlas is also likely to be the first to receive new features if they are for one of its core components. With something like FerretDB, however, it's uncertain whether it will always lag behind in security updates or new features.
Another difference to consider is the driver usage for MongoDB Atlas and FerretDB. If MongoDB Atlas drops a new feature without notice, it may take days, weeks, or even months for FerretDB to have the same feature available. The same can be said for DocumentDB. Azure CosmosDB and MongoDB Atlas, along with FerretDB, can all be globally distributed.
The sizing of MongoDB Atlas plans may not align with everyone’s needs. Additionally, while MongoDB Atlas is fully-managed and provides direct access to engineers, some have found the support team to be less than stellar in their responses to specific questions. It’s recommended that companies considering MongoDB Atlas look deeply at their support needs.
If comparing DocumentDB to Atlas, some have found AWS support to be more proactive in attempting to solve problems. MongoDB Atlas may put requests for feature improvements in a queue without any visible progress.
Overall, it’s important to consider specific needs and requirements when choosing between Azure CosmosDB and MongoDB Atlas.
Benchmarks
I'm not certain why the operation wrapped around, but I used YCSB data for my test, executing 10,000 requests per second. I averaged the results from multiple runs to arrive at the numbers shown here. Later, I'll discuss cost elements and how I extrapolated them.
When it comes to CosmosDB benchmarks, I observed an average create time of about 2.2 milliseconds and a read time of 0.6 milliseconds. As shown on the screen, reads are roughly twice as fast as any write. Meanwhile, updates require slightly more overhead than a standard write, but deletes were slower. I didn’t quite understand why deletes were significantly slower than updates. I would have expected delete and create operations to be similar from an operations-per-second perspective.
These results are based on a burn-in run followed by three iterations lasting four hours each, which ensured that I accounted for any bursting or similar phenomena. I didn’t perform any specific tuning of the payload with YCSB, so I want to make it clear that these results shouldn’t be considered a finely-tuned CosmosDB, DocumentDB, or FerretDB benchmark. They reflect generic benchmarks with default settings.
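The averaging step described above can be sketched roughly as follows. This is a minimal illustration, not the tooling used for the talk: the function name `summarize_runs`, the input layout (one dictionary of mean latencies per run), and the sample numbers are all assumptions made for the example.

```python
from statistics import mean


def summarize_runs(runs_ms, burn_in=1):
    """Average per-operation latency across measured runs.

    runs_ms: list of dicts, one per run, mapping operation name to the
             mean latency in milliseconds reported for that run.
    burn_in: number of leading runs to discard (assumed to be one here).
    """
    measured = runs_ms[burn_in:]
    if not measured:
        raise ValueError("no measured runs left after discarding burn-in")
    operations = measured[0].keys()
    return {op: mean(run[op] for run in measured) for op in operations}


# Placeholder numbers for illustration only; NOT measurements from the talk.
example_runs = [
    {"create": 5.0, "read": 2.0},  # burn-in run, discarded
    {"create": 2.0, "read": 1.0},  # iteration 1
    {"create": 3.0, "read": 1.0},  # iteration 2
    {"create": 4.0, "read": 1.0},  # iteration 3
]

summary = summarize_runs(example_runs)
print(summary)
```

Discarding the burn-in run keeps cache warm-up and any initial bursting out of the averages, which is the reason for the run structure described above.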
As for Atlas benchmarks, Atlas is slightly faster than CosmosDB at creating data, with an average of 1.8 milliseconds versus 2.2. Consequently, I experienced a bit more throughput with Atlas. The read speeds were comparable, but Atlas still managed to provide higher throughput. Interestingly, Atlas had consistent speeds of 1.8 milliseconds for all write-oriented requests, unlike CosmosDB, which exhibited variable response times. This discrepancy could be attributed to architectural differences, perhaps some kind of verification step in CosmosDB that slows down updates and deletes.
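To see why a lower create latency shows up as more throughput, here is a small sketch that converts mean latency into operations per second for a single client thread. It assumes a closed-loop client that sends one request at a time and waits for the reply, which is a simplification; the function name is hypothetical, and only the 1.8 ms and 2.2 ms figures come from the benchmarks above.

```python
def ops_per_second_per_thread(mean_latency_ms: float) -> float:
    """Throughput of one closed-loop client thread (one request in flight)."""
    if mean_latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / mean_latency_ms


# Mean create latencies quoted in the text, in milliseconds.
create_latency_ms = {"CosmosDB": 2.2, "Atlas": 1.8}

for name, latency in create_latency_ms.items():
    rate = ops_per_second_per_thread(latency)
    print(f"{name}: about {rate:.0f} creates per second per thread")

# Relative advantage under the closed-loop assumption.
ratio = ops_per_second_per_thread(1.8) / ops_per_second_per_thread(2.2)
print(f"Atlas/CosmosDB create throughput ratio: {ratio:.2f}")
```

Real throughput also depends on the number of client threads and on server-side limits, so treat this as a way to read the latency numbers rather than as a prediction.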
Regarding FerretDB, the create time during my tests averaged around three milliseconds. Read speeds were comparable to the other systems, ranging from 0.5 to 0.8 milliseconds. I observed that updates were rather costly, as were deletes. Perhaps in a subsequent conversation, we could discuss optimizing PostgreSQL and FerretDB to improve these speeds. Once again, these figures were derived from a default setup.
Cost Estimate Summary
This is a high-level summary, and you might notice that FerretDB seems exceptionally costly. However, it’s the only one where I included the cost of a part-time DBA working 20 hours per week. When I discuss FerretDB, I’ll specify exactly where this cost comes from.
In many cost estimates, they’ll state the cost of compute and storage, but seldom address the expenses related to managed services, especially when these are offered by third parties as opposed to being in-house. These estimates often overlook cross-company, cross-account, and cross-region costs from a bandwidth perspective. Hence, I’ve incorporated some estimates for these factors.
The calculations are based on an r5.large instance type with gp3 storage and 3,000 IOPS. The payload size was set at one kilobyte for network considerations. This is the framework I used to model these costs.
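As a rough illustration of why payload size matters for the network line items, the sketch below turns a request rate and payload size into a monthly transfer volume. The 10,000 requests per second and one-kilobyte payload come from the text; the 30-day month, the assumption that the rate is sustained around the clock, the decimal units, and the per-gigabyte rate are all placeholders of mine and not the pricing behind the estimates in this article.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_transfer_gb(requests_per_second: int, payload_bytes: int) -> float:
    """Data moved in one direction over a month at a sustained request rate."""
    total_bytes = requests_per_second * payload_bytes * SECONDS_PER_MONTH
    return total_bytes / 1_000_000_000  # decimal gigabytes


def monthly_transfer_cost(volume_gb: float, rate_per_gb: float) -> float:
    """Transfer cost for a given volume at a flat per-gigabyte rate."""
    return volume_gb * rate_per_gb


# Workload figures from the text: 10,000 requests/s, 1 KB payload.
volume_gb = monthly_transfer_gb(requests_per_second=10_000, payload_bytes=1_000)
print(f"{volume_gb:,.0f} GB per month in one direction")

# HYPOTHETICAL flat rate for illustration; substitute your provider's pricing.
example_rate_per_gb = 0.01
print(f"${monthly_transfer_cost(volume_gb, example_rate_per_gb):,.2f} per month")
```

Because the volume scales linearly with both request rate and payload size, even a small per-gigabyte difference between in-account and cross-account traffic becomes a large monthly figure.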
DocumentDB Cost Estimates
For DocumentDB, I ran these numbers for the us-east-1 region, if I recall correctly, and used the db.r5.large instance type. The compute cost came out to be $373. To accommodate one terabyte of data, I added storage at an additional cost of $1,200, which accounts for higher-than-standard IOPS and throughput. This cost could potentially be reduced by tuning, but for these estimates, I maintained this figure.
The networking cost, which includes cross-service traffic costs, is estimated at $250 per month. As for the DBA cost, I’ve listed it as zero. This doesn’t imply that there is no DBA cost, but since DocumentDB is a managed service, the initial investment in DBA time is minimized. Therefore, the total monthly cost for this DocumentDB example comes out to be $1,823.
MongoDB Atlas Cost Estimates
For Atlas, I used a similar setup with the M10 type of cluster and the same type of storage. Like in the previous example, the storage cost is fixed at $1,200. However, the compute cost for Atlas is slightly higher at $438 compared to $373 for DocumentDB. This increase could be attributed to the additional fee MongoDB charges for access to the Enterprise license and support.
What particularly concerned me with Atlas were the networking costs, primarily due to cross-account bandwidth charges. Since Atlas is not located in your account or your VPC, but in MongoDB’s account and their VPC, these cross-account charges for bandwidth apply. After factoring these in, I estimated the networking cost to be about $1,600 a month for one terabyte of storage and 10,000 requests per second.
Therefore, the total cost for this Atlas setup comes out to $3,238 per month, which is notably higher than that of DocumentDB.
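The two totals so far can be reproduced with a few lines of arithmetic. The sketch below uses only the monthly figures quoted above for DocumentDB and Atlas; the `MonthlyEstimate` class is a hypothetical helper introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyEstimate:
    """Monthly cost components in US dollars."""
    compute: int
    storage: int
    networking: int
    dba: int = 0

    @property
    def total(self) -> int:
        return self.compute + self.storage + self.networking + self.dba


# Figures quoted in the DocumentDB and Atlas estimates above.
documentdb = MonthlyEstimate(compute=373, storage=1200, networking=250)
atlas = MonthlyEstimate(compute=438, storage=1200, networking=1600)

gap = atlas.total - documentdb.total
networking_gap = atlas.networking - documentdb.networking
compute_gap = atlas.compute - documentdb.compute

print(f"DocumentDB: ${documentdb.total:,}  Atlas: ${atlas.total:,}")
print(f"Difference: ${gap:,} (networking ${networking_gap:,}, compute ${compute_gap:,})")
```

Broken down this way, almost all of the difference between the two estimates comes from the networking line rather than from compute.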
CosmosDB Cost Estimates
With CosmosDB, the pattern is similar, but as you can see, the compute cost is significantly higher. As mentioned earlier, CosmosDB has a more complex cost model because it’s based on virtual units of measure for factors like the 10,000 requests per second throughput and storage. This makes measuring network costs for ingress and egress between Azure regions a bit more challenging.
I’ve estimated the network costs at $250, although this may or may not apply to your setup. Therefore, the total monthly cost for a similar setup in CosmosDB could range between $2,250 and $2,500.
FerretDB Cost Estimates
Finally, for FerretDB, the cost estimates are very similar to DocumentDB, as both use the r5.large instance type. The storage cost is the same as well, using gp3.
For networking costs, I added $250 due to networking between services. The actual cost could be more if you are bridging between cloud providers. However, if you’re operating in the same availability zone, region, and account, network costs will be relatively low, albeit not free.
A significant difference in the FerretDB estimate is the inclusion of a DBA cost of $4,000 per month. This cost is optional and highly dependent on your existing infrastructure. Do you already have a DBA team? Have you already allocated personnel hours so that this $4,000 is something the team can absorb? Or is this an additional demand for your team, requiring extra development time?
If we remove the DBA cost for a moment, we’re looking at a monthly cost of $1,823 for both FerretDB and DocumentDB, making them comparable. The choice between DocumentDB and FerretDB then hinges on specific feature requirements or the desire to remain within the Postgres ecosystem.
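How much the DBA line matters depends on how much of it is genuinely new spend. The sketch below models that with a `dba_share` parameter, which is my own construct for illustration and not part of the original estimates: 0.0 means an existing team fully absorbs the work, 1.0 means the whole $4,000 is additional. The dollar figures are the ones quoted in the text.

```python
BASE_MONTHLY = 1823   # FerretDB (or DocumentDB) without DBA time, from the text
DBA_MONTHLY = 4000    # part-time DBA at 20 hours per week, from the text
ATLAS_MONTHLY = 3238  # Atlas total, from the text


def ferretdb_total(dba_share: float) -> float:
    """Monthly FerretDB cost when a fraction of the DBA cost is new spend.

    dba_share is a hypothetical knob: 0.0 = absorbed by an existing team,
    1.0 = the full DBA cost is added on top of infrastructure.
    """
    if not 0.0 <= dba_share <= 1.0:
        raise ValueError("dba_share must be between 0 and 1")
    return BASE_MONTHLY + DBA_MONTHLY * dba_share


# Share of the DBA cost at which FerretDB matches the Atlas estimate.
break_even_share = (ATLAS_MONTHLY - BASE_MONTHLY) / DBA_MONTHLY

print(f"Fully absorbed: ${ferretdb_total(0.0):,.0f}")
print(f"Fully additional: ${ferretdb_total(1.0):,.0f}")
print(f"Break-even with Atlas at {break_even_share:.0%} of the DBA cost")
```

Under these figures, FerretDB stays below the Atlas estimate as long as roughly a third or less of the DBA time is new spend, which is why the staffing question above matters more than the infrastructure numbers.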
In terms of performance, DocumentDB offers roughly 9.5K and 16K for read and write operations respectively, while FerretDB offers similar performance with around 9K and 15K. This highlights their similarities in performance as well.
I use FerretDB and DocumentDB as main examples not because other technologies are better or worse, but because they provide a more direct comparison. Both are using AWS instance types, there’s no additional cost layer for support, and they aren’t bridging between different clouds which could complicate an apples-to-apples comparison.
My personal view is that they’re very evenly matched, with FerretDB offering more features from a compatibility perspective. Based on cost and performance benchmarks, the choice could go either way depending on the specifics of your use case.
Conclusions and Recommendations
DocumentDB is a suitable solution for many users already on AWS who require a horizontally scalable setup. CosmosDB might be a better choice if you need global distribution, multiple APIs, or more flexible scaling operations that aren’t available in typical systems.
Atlas is very appropriate for those who need a fully managed, multi-cloud, multi-region MongoDB solution. However, keep in mind that this flexibility comes with additional cross-provider and cross-account charges that can significantly impact your budget. One of the largest sources of unintentional billing, apart from S3, comes from bandwidth costs, which are difficult to accurately predict and attribute.
FerretDB is likely the best solution if you require high performance, open-source compatibility, and prefer it to run on top of Postgres. This is especially true if you have an existing DBA and DB resources so that you don’t need the features of a managed service because you essentially have your own internal managed service. In the long term, FerretDB is probably the most cost-effective choice.
But if you lack these resources, “Database as a Service” offerings will help you reach the market faster and provide more self-service options for your development community.
Final thoughts: It’s crucial to evaluate each application’s requirements and use cases for these technologies. While this presentation provides a high-level comparison, certain details such as bandwidth or DBA costs can have a substantial impact. I encourage further testing and optimized testing for each technology to make more informed decisions.
Also, stay up to date with developments and updates across these technologies. For instance, FerretDB, being new to the market, is evolving rapidly. Similarly, DocumentDB has improved its IAM-based internal users’ features, which I found out recently. As an AWS proponent, I appreciate the ease that IAM authentication brings to credential rotation and short-term credential management.
Contact David if you would love deeper benchmarks on a specific comparison as a followup.
Database, Cybersecurity, and Operations specialist at Udemy
David Murphy is the Principal DBRE Engineer at Udemy. With over two decades of experience in the technology industry, he has worked with some of the biggest names in the business, including Object Rocket, Percona, and EA. David’s expertise lies in database architecture, performance optimization, data security, and cloud computing. He is passionate about helping organizations achieve their goals by providing scalable, reliable, and efficient solutions for their technology stack.