Bridging Two Worlds: An Introduction to Document-Relational Databases

March 2, 2024

Reading Time: 20 minutes


Author: Kirk Kirkconnell, a Principal Database Advocate at Fauna

Today we’re diving into the world of document-relational databases. My journey in NoSQL began around 2011, traversing through MongoDB, Couchbase, and more. Let’s unravel the evolution from traditional relational databases to the realm of document-relational databases.

Transitioning from Relational Databases

Modern applications demand scalability beyond what most traditional relational databases (RDBMS) can offer. Think about the world of mobile apps, large web applications, IoT, edge computing, and e-commerce platforms. The need for scale drove us towards alternatives. But why exactly did relational databases start to fall short?

The Challenge of Scale and Complexity

Relational databases, while reliable, often struggle with the scale required by modern applications. Handling JSON within these databases adds another layer of difficulty. Object-Relational Mapping (ORM) tools tried to bridge this gap, but often at the price of obscuring performance and cost issues, a drawback that is particularly noticeable in serverless architectures.

Agility and Flexibility Needs

The fast-paced nature of today’s technology landscape requires agility in development and data management. Relational databases, known for their rigidity, are not always up to the task, especially now that rapid iteration and flexibility have become the norm.

Reliability and Resilience in Distributed Systems

High reliability and resilience are crucial, especially in distributed systems spanning multiple availability zones and regions. Despite advancements, achieving strong consistency and operational efficiency across multiple regions remains a challenge with traditional RDBMS and even most NoSQL databases.

The Emergence of Document-Relational Databases

Enter document-relational databases, a blend of the NoSQL document model’s flexibility with the structured approach of relational databases. They address the challenges of scalability, complexity, agility, and distributed system management.

Why Document-Relational Databases?

Scalability: These databases are engineered for scalability, handling the demands of modern, large-scale applications without compromising on performance.
Handling Complex Data: They offer an efficient way to manage complex JSON data structures, simplifying the development process compared to traditional relational databases.
Agility and Rapid Iteration: The flexibility of document-relational databases supports rapid changes and development cycles, a critical aspect of modern software engineering.
Reliability and Resilience: Designed for distributed environments, they provide much-needed reliability and resilience, overcoming the limitations of traditional relational databases in this area.

Navigating the Cost and Complexity of NoSQL Databases

As we delve deeper into the world of document-relational databases, let’s address a significant factor that often catalyzes the shift from traditional RDBMS to NoSQL: cost efficiency. During my tenure at Nokia, we faced exactly this situation. We were using only a fraction of our relational database’s capabilities but paying for every CPU core. This prompted a transition to a NoSQL database, which cut our annual expenditure from around a million dollars to approximately $80,000. This substantial reduction illustrates the economic advantage of adopting NoSQL databases in certain scenarios. Prices have changed since then, but the economics still hold up, and DBaaS and serverless databases now give us even more options.

The Challenges in NoSQL Adoption

Despite the apparent benefits, adopting NoSQL databases comes with its own set of challenges. One of the most daunting tasks is mastering data modeling and schema design. While NoSQL provides a lot of flexibility, it also demands a deeper understanding of which data modeling decisions to make, why to make them, and when.

Data Modeling: A Crucial Aspect

Data modeling in NoSQL is crucial. It’s where the most significant impact on performance, scalability, and cost is observed. The challenge lies in choosing the right schema. Often, there’s confusion between schema design and data modeling, with terms used interchangeably, though they are distinct concepts.

The Anxiety of Getting It Right

Post-deployment, many developers and database administrators experience anxiety about whether they’ve made the right choices in their data model. Questions like “Did I get it right?” or “Do I need to make changes?” are common. This uncertainty stems from the need to constantly adapt and optimize the data model based on application performance and evolving requirements. In most NoSQL databases, you are the optimizer. You make the choice of which index to use and when.

Denormalization vs. Normalization

A key decision in NoSQL data modeling is choosing between denormalization and normalization. Denormalization involves embedding data within a single document, often for the sake of query efficiency. However, this approach can complicate data updates and increase redundancy.

On the other hand, normalization – splitting data into separate documents or collections – can make data management more systematic but may lead to more complex queries and multiple database calls.
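To make the trade-off concrete, here is a minimal sketch using plain Python dictionaries rather than any particular database’s API. The collections, field names, and helper functions are illustrative assumptions. It shows that renaming a product touches every denormalized order but only one normalized product document.

```python
# Denormalized: product details are copied into every order document.
denormalized_orders = [
    {"order_id": 1, "product": {"sku": "A1", "name": "Widget", "price": 10}},
    {"order_id": 2, "product": {"sku": "A1", "name": "Widget", "price": 10}},
]

# Normalized: orders hold only a reference (the SKU) to a separate product document.
products = {"A1": {"name": "Widget", "price": 10}}
normalized_orders = [
    {"order_id": 1, "product_sku": "A1"},
    {"order_id": 2, "product_sku": "A1"},
]


def rename_denormalized(orders, sku, new_name):
    """Update every embedded copy; return how many documents were written."""
    writes = 0
    for order in orders:
        if order["product"]["sku"] == sku:
            order["product"]["name"] = new_name
            writes += 1
    return writes


def rename_normalized(product_docs, sku, new_name):
    """Update the single product document; return how many documents were written."""
    product_docs[sku]["name"] = new_name
    return 1


denormalized_writes = rename_denormalized(denormalized_orders, "A1", "Widget Pro")
normalized_writes = rename_normalized(products, "A1", "Widget Pro")
```

The flip side is on the read path: the denormalized order already contains the product name, while the normalized order needs a second lookup to get it.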

Managing Relationships in NoSQL

Handling relationships between documents presents another challenge. In a NoSQL environment, unlike relational databases, there are no built-in mechanisms like foreign keys. The decision to embed related data or reference it in separate documents depends on the specific use case and access patterns. This is often a delicate balance, especially for those new to NoSQL, as it requires a nuanced understanding of the trade-offs involved.

So, while the shift to NoSQL and document-relational databases can offer significant benefits in terms of scalability and cost, it also brings to the forefront the complexities of data modeling. Understanding these nuances is crucial for developers and database administrators to make informed decisions that align with their application’s needs and performance goals.

Addressing Query Complexity and Balancing Trade-offs in NoSQL

As we continue to explore the landscape of document-relational databases, let’s delve into the complexities and challenges that arise, particularly in query management and the inevitable trade-offs between performance, consistency, and scalability.

The Conundrum of Query Complexity

In NoSQL systems, as you structure your data into documents and define relationships, you may encounter a level of query complexity that wasn’t apparent at first. The decision to embed data or reference it across documents can lead to intricate querying scenarios. These scenarios might require multiple round trips to the database, which could be less than ideal for your application’s performance goals.
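The round-trip cost is easy to underestimate, so here is a small sketch that counts it. `CountingStore` is a hypothetical in-memory stand-in for a document store, not a real client library, and the document IDs are made up for illustration.

```python
class CountingStore:
    """In-memory stand-in for a document store that counts round trips."""

    def __init__(self, documents):
        self._documents = documents
        self.round_trips = 0

    def get(self, doc_id):
        # Each call models one network round trip to the database.
        self.round_trips += 1
        return self._documents[doc_id]


store = CountingStore({
    "order:1": {"customer": "customer:7", "items": ["product:1", "product:2"]},
    "customer:7": {"name": "Ada"},
    "product:1": {"name": "Widget"},
    "product:2": {"name": "Gadget"},
})


def load_order_with_references(store, order_id):
    """Fetch an order, then make one extra call per referenced document."""
    order = store.get(order_id)
    customer = store.get(order["customer"])
    items = [store.get(item_id) for item_id in order["items"]]
    return {"customer": customer, "items": items}


order_view = load_order_with_references(store, "order:1")
```

One logical read of a two-item order becomes four calls here, and the count grows with the number of referenced documents.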

Balancing Act: Performance, Consistency, Scalability

In NoSQL databases, every decision is a balance between these three crucial aspects:

Performance: Sometimes, the need for speed is paramount. You might prioritize performance for certain operations where response time is critical.
Consistency: In other instances, consistency takes precedence. This is especially true for applications that require synchronized data across various regions or components.
Scalability: Ensuring that your database can grow with your application’s needs is essential. However, scalability often comes with trade-offs in performance or consistency.

It’s about understanding and deciding what you need, when you need it, and how much of it is necessary for each specific call or operation.

Multi-Region Deployment Challenges

Another significant challenge in NoSQL and document-relational databases is managing multi-region deployments. While databases like MongoDB and DynamoDB can replicate data across regions, that replication is usually eventually consistent. This might not suffice for scenarios requiring strong consistency across different regions.

Navigating Application Complexity

As we shift more functionalities from traditional databases to the application layer in NoSQL environments, we encounter a new set of complexities. Managing application logic within the CI/CD pipeline or in serverless architectures like AWS Lambda or Google Cloud Functions introduces its own challenges. While this approach enhances agility, it may not always be the most effective for certain types of database logic.

Adapting to Change in Data Models

One of the most dynamic aspects of working with NoSQL is adapting to change. When new access patterns emerge, or features are added, how do we adjust our data model? Tailoring data models to specific access patterns is crucial, but it also requires flexibility to adapt as those patterns evolve. Understanding the ramifications of these changes is key to maintaining an efficient and effective database structure.

Navigating Data Modeling and Scalability in Document-Relational Databases

As we continue exploring the landscape of document-relational databases, let’s address some common hurdles faced in NoSQL environments, particularly around data modeling and scalability, and how document-relational databases are innovatively tackling these challenges.

The Paralysis in NoSQL Data Modeling

A frequent issue in NoSQL is the increase in costs and decrease in performance, often stemming from incorrect data modeling. Unlike relational databases, which have clear guidelines like the third normal form, NoSQL lacks such standardized rules. This freedom in NoSQL, while empowering, also demands a higher level of responsibility and understanding. As Uncle Ben in Spider-Man wisely said, “With great power comes great responsibility.” This sentiment rings true in the world of NoSQL, where the liberty to model data as desired can lead to complex challenges.

Document-Relational: A Hybrid Approach

Document-relational databases emerge as a solution to these challenges. At their core, they combine the flexibility of a JSON document model with the robustness of relational capabilities. This hybrid approach aims to address key issues such as when to denormalize data for efficiency or when to maintain normalization for integrity and consistency.

The Role of Distributed Transaction Engines (DTE)

Central to the functionality of document-relational databases is the distributed transaction engine. This engine is pivotal in managing data across various locations while maintaining strong consistency. Where traditional NoSQL databases might offer only eventual consistency, document-relational databases strive to provide strong consistency, which is crucial for applications migrating from a relational model.

Serverless APIs and Scalability

A notable feature of document-relational databases is their serverless API, which aligns perfectly with the needs of modern applications. The lack of concern about connection pooling and other traditional database management aspects is a significant relief. This serverless nature ensures scalability and flexibility, allowing developers to focus on application development rather than database management intricacies.

Diverse API Connectivity

These databases offer various types of API connectivity, enhancing their versatility. Whether it’s connecting through an HTTP API from services like Cloudflare or using a standard application driver within the app, document-relational databases provide multiple integration options. This flexibility is crucial in today’s diverse and ever-evolving tech environment.

Server-Side Functions: Bridging Functionality and Efficiency

Server-side functions, akin to stored procedures in relational databases, offer a streamlined way to implement complex logic directly within the database. This capability enables more efficient data processing by reducing the need to shuttle data back and forth between the database and application servers. It simplifies the architecture for certain operations, ensuring that business logic is consistently applied across all instances of the application. This approach not only enhances performance but also centralizes logic implementation, making maintenance and updates more manageable.

Data Models & Schema

One of the standout qualities of document-relational databases is their inherent design to support evolving data models and schemas. Traditional NoSQL solutions, while flexible, often introduce rigidity when adapting to new access patterns or integrating new functionalities. Document-relational databases like Fauna are engineered to allow changes to the data model with minimal disruption. Whether it’s adding new JSON documents, adjusting existing ones, or enforcing schema requirements, these databases accommodate growth and change, ensuring that developers can adapt to new requirements without extensive database restructuring.

Distributed Transaction Engine: A Core of Consistency

At the core of document-relational databases lies the distributed transaction engine, a technology that resolves the dilemma of achieving data consistency across multiple regions. Unlike the eventual consistency model prevalent in many NoSQL databases, the distributed transaction engine ensures real-time consistency. This is vital for applications that operate on a global scale, requiring immediate data synchronization across different geographical locations. By providing a mechanism for strong consistency, document-relational databases eliminate the need for developers to implement complex workarounds in the application layer, simplifying global deployment strategies.

Together, server-side functions and the distributed transaction engine represent a leap forward in database technology, combining the flexibility and scalability of NoSQL with the structured approach and reliability of relational databases. These features address the critical needs of modern applications, offering developers a powerful toolkit for building scalable, efficient, and consistent systems across diverse environments.

Multi-Region Transactions

The distributed transaction engine (DTE) integral to Fauna represents a monumental shift in how document-relational databases manage data consistency, particularly in multi-region deployments. This engine ensures that operations and transactions are uniformly executed and committed across all geographical locations, a feature pivotal for applications requiring global data availability and integrity.

The Calvin Protocol Inspiration

DTE’s design, inspired by the Calvin protocol, offers a glimpse into the future of database architecture, emphasizing efficiency, consistency, and scalability. While not an exact replica of Calvin, the inspiration drawn from this protocol underscores Fauna’s commitment to advancing database technology. The Calvin protocol itself is renowned for its approach to simplifying distributed transactions, ensuring that data remains consistent across multiple nodes without the complexities traditionally associated with such operations.
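The idea most often associated with Calvin is to agree on a single global order of transactions first, and then have every replica apply that order deterministically. The toy below illustrates only that idea. It is not Fauna’s DTE or the actual Calvin protocol, and `apply_log`, `deposit`, and `transfer` are hypothetical names chosen for this sketch.

```python
def apply_log(ordered_log, initial_state):
    """Apply transactions in log order; each one is a pure function of the state."""
    state = dict(initial_state)
    for transaction in ordered_log:
        state = transaction(state)
    return state


def deposit(account, amount):
    def transaction(state):
        new_state = dict(state)
        new_state[account] = new_state.get(account, 0) + amount
        return new_state
    return transaction


def transfer(source, target, amount):
    def transaction(state):
        # Deterministic rule: skip the transfer if funds are insufficient.
        if state.get(source, 0) < amount:
            return state
        new_state = dict(state)
        new_state[source] -= amount
        new_state[target] = new_state.get(target, 0) + amount
        return new_state
    return transaction


# One agreed order, shared by every replica before execution starts.
global_log = [
    deposit("alice", 100),
    transfer("alice", "bob", 30),
    transfer("bob", "carol", 50),  # skipped: bob only has 30
]

replica_us = apply_log(global_log, {})
replica_eu = apply_log(global_log, {})
```

Because both replicas see the same log and every transaction is deterministic, they end in the same state without coordinating during execution.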

Bridging NoSQL and Relational Capabilities

Document-relational databases, especially with the advent of Fauna’s DTE, bring back the relational capabilities that many felt were lost with the shift to NoSQL. This includes the ability to perform joins and establish relationships beyond mere primary keys, moving towards a model where references and foreign keys can play a role in data organization and retrieval.

Addressing Data Repetition and Scalability

A significant advantage of employing a document-relational approach is the reduction in data repetition. In traditional NoSQL systems, scalability often comes at the cost of duplicating data across documents to ensure availability and performance. However, with document-relational databases, the ability to reference data across documents without unnecessary repetition allows for more efficient data storage and access patterns. This efficiency is particularly beneficial for managing dynamic data, such as shopping cart contents, which can fluctuate frequently.

Tailoring Data Modeling for Dynamic and Static Needs

The flexibility to reference other documents and traverse these references enables a more nuanced approach to data modeling. It allows developers to segment data into dynamic and static components, optimizing how each piece of information is stored and accessed. For instance, while a user’s physical address (static data) might seldom change, the items in a shopping cart (dynamic data) can change with every transaction. This distinction guides the data modeling process, ensuring that the database structure aligns with the application’s real-world needs and access patterns.

In essence, the distributed transaction engine and the relational capabilities reintroduced in document-relational databases like Fauna mark a significant evolution in database technology. They combine the scalability and flexibility of NoSQL with the structured, relationship-oriented approach of relational databases. This hybrid model addresses key challenges in data management, offering developers a powerful tool for building complex, scalable, and consistent applications across the global digital landscape.

When approaching data modeling in NoSQL databases, a critical aspect to consider is the application’s access patterns. These patterns dictate not just the structure of the data but also how it’s queried, updated, and managed within the database. To navigate this landscape effectively, there are three pivotal questions every developer should ponder:

What Data Does the Application Need?

Identifying the specific pieces of data required for each operation is crucial. For instance, authenticating a user demands their hashed password and username but not necessarily their contact details or shipping address. This selective data retrieval ensures efficiency and speed in database operations.

When Does the Application Need That Data?

The context in which data is needed plays a significant role in how you design your data model. Data required for immediate display or processing needs to be readily accessible, possibly influencing how it’s indexed or cached. Understanding the application’s workflow helps in structuring the data model to align with these needs.

At What Frequency Does the App Need That Data?

The frequency of access impacts decisions around scalability and performance optimization. High-frequency operations, like user authentication, may require a different architectural approach compared to less frequently accessed data, such as updating a user’s physical address. This consideration is key in planning for scale, ensuring the database can handle the load efficiently without compromising on performance.

By answering these questions, developers can tailor their data models to meet the specific needs of their applications. For example, user authentication data might be stored and indexed in a manner that allows for quick retrieval, given its high frequency and critical nature. Conversely, user profile information, like shipping addresses, which changes less frequently and is not always needed, can be structured differently.

This approach not only enhances the performance and scalability of the database but also ensures that the application remains responsive and efficient, providing a seamless user experience. It underscores the importance of a thoughtful, pattern-driven approach to data modeling in NoSQL databases, highlighting the need for a flexible, adaptable strategy that can evolve with the application’s requirements.
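One way to apply the three questions is to write the access patterns down as data and let them suggest which fields are hot. The sketch below assumes a hypothetical `AccessPattern` record and made-up call rates and threshold; it is a planning aid, not a feature of any database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str               # which operation this is
    fields: tuple           # what data it needs
    calls_per_second: int   # how often it runs (illustrative numbers)


def split_hot_and_cold(patterns, hot_threshold):
    """Return (hot_fields, cold_fields); a field is hot if any busy pattern uses it."""
    hot, seen = set(), set()
    for pattern in patterns:
        seen.update(pattern.fields)
        if pattern.calls_per_second >= hot_threshold:
            hot.update(pattern.fields)
    return hot, seen - hot


patterns = [
    AccessPattern("authenticate", ("username", "password_hash", "last_login"), 50_000),
    AccessPattern("update_address", ("username", "shipping_address"), 1),
]
hot_fields, cold_fields = split_hot_and_cold(patterns, hot_threshold=1_000)
```

Here the authentication fields come out hot and the shipping address comes out cold, which matches the intuition that they may deserve different storage and indexing choices.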

High-Frequency Operations: Authentication as a Case Study

Consider the process of user authentication, a high-frequency operation that necessitates rapid access to usernames and passwords, along with the capability to update the last login timestamp in real-time. Achieving this at a scale of 50,000 operations per second underscores the need for a data model that supports swift read and write operations without compromising on performance. This scenario exemplifies why certain data, due to its dynamic nature and frequent access requirement, is best kept readily accessible, potentially within the same document or a closely linked structure to minimize retrieval times.
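As a rough illustration, the sketch below keeps everything authentication needs in one small document, so a login is one read and one write against the same record. The document shape and function names are assumptions, the fixed salt is for demonstration only, and the hashing uses Python’s standard `hashlib.pbkdf2_hmac`.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def hash_password(password, salt, iterations=100_000):
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def authenticate(auth_doc, password, now=None):
    """Check the password and, on success, stamp last_login on the same document."""
    candidate = hash_password(password, auth_doc["salt"])
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(candidate, auth_doc["password_hash"]):
        return False
    auth_doc["last_login"] = (now or datetime.now(timezone.utc)).isoformat()
    return True


salt = b"demo-salt-not-for-production"  # a real system would use a random per-user salt
auth_doc = {
    "username": "ada",
    "salt": salt,
    "password_hash": hash_password("correct horse", salt),
    "last_login": None,
}

ok = authenticate(auth_doc, "correct horse")
rejected = authenticate(auth_doc, "wrong password")
```

Contact details and shipping addresses are deliberately absent from `auth_doc`, since the login path never needs them.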

Handling Static vs. Dynamic Data

The distinction between static and dynamic data plays a pivotal role in these considerations. Static data, such as a user’s physical address or credit card information, changes infrequently and can be embedded directly within a primary document. This approach reduces the need for additional queries or traversals to fetch related data, thereby enhancing performance for operations that access these pieces of information.

Conversely, dynamic data, which might include items in a shopping cart or available inventory quantities, is subject to frequent changes and might therefore be normalized across multiple documents or collections. This normalization facilitates updates and modifications without impacting the integrity or performance of accessing static data.

The Role of Foreign Key Relationships and Distributed Transaction Engines

Incorporating foreign key-like relationships in NoSQL databases introduces a layer of flexibility not traditionally associated with NoSQL models. These relationships enable developers to design data models that can reference related pieces of information across documents, combining the benefits of relational database structures with the scalability and flexibility of NoSQL.

Central to enabling this relational-like functionality within a NoSQL environment is the distributed transaction engine. This engine ensures that despite the distribution of nodes across the globe, data remains consistent across all locations. It allows for the seamless integration of dynamic and static data models, ensuring that applications can rely on up-to-date information regardless of the complexity of their operations or the geographical distribution of their data.

Purposeful Data Modeling: Embedding vs. Relating

When deciding between embedding data directly within a document or linking it through relationships, the guiding principle should be the specific requirements of your application’s functionality. For example, considering the dynamic nature of a shopping cart, one must evaluate whether product names and prices should be embedded within an order document or referenced via foreign keys. The decision hinges on factors like how often this data changes and the impact on query performance and data integrity.

The rationale behind each choice should be clear and justified. Embedding might be favored for efficiency in data retrieval and minimizing database calls, particularly for closely related data that is accessed together frequently. On the other hand, relating data through references may be preferred for maintaining data normalization, facilitating updates, and ensuring consistency, especially for information that changes independently.
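The shopping cart case can be sketched as follows, again with plain dictionaries and hypothetical names. The cart references products so it always sees the current price, while a placed order embeds a snapshot of name and price so that later price changes do not rewrite history.

```python
products = {"p1": {"name": "Widget", "price_cents": 1000}}
cart = {"items": [{"product_id": "p1", "quantity": 2}]}


def cart_total(cart, product_docs):
    """Carts reference products, so the total reflects current prices."""
    return sum(
        product_docs[item["product_id"]]["price_cents"] * item["quantity"]
        for item in cart["items"]
    )


def place_order(cart, product_docs):
    """Orders embed a snapshot of each product's name and price at purchase time."""
    lines = []
    for item in cart["items"]:
        product = product_docs[item["product_id"]]
        lines.append({
            "product_id": item["product_id"],
            "name": product["name"],
            "price_cents": product["price_cents"],
            "quantity": item["quantity"],
        })
    return {"lines": lines}


def order_total(order):
    return sum(line["price_cents"] * line["quantity"] for line in order["lines"])


order = place_order(cart, products)
products["p1"]["price_cents"] = 1200  # the price changes after the order is placed

cart_total_after = cart_total(cart, products)
order_total_after = order_total(order)
```

The same product data is referenced in one place and embedded in another, and each choice is justified by how the data is expected to change.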

Dynamic Data Model Adjustments

The flexibility to adapt data models in response to evolving application requirements is a significant advantage of document-relational databases. An illustrative scenario is adjusting how sensitive information, like credit card details, is handled to enhance security. By setting a different time to live (TTL) for credit card information compared to the order information, developers can ensure that sensitive data is automatically purged from the system after a predetermined period, mitigating potential security risks.

This adaptability underscores the importance of not being rigidly tied to a single data modeling strategy but instead being prepared to evolve the model as application requirements and external conditions change.
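A minimal sketch of the idea, assuming each document carries an optional `ttl` timestamp and that something like the hypothetical `purge_expired` below runs on the database side. The retention periods are invented for the example.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(collection, now):
    """Delete documents whose ttl has passed; return the ids that were removed."""
    expired = [
        doc_id
        for doc_id, doc in collection.items()
        if doc.get("ttl") is not None and doc["ttl"] <= now
    ]
    for doc_id in expired:
        del collection[doc_id]
    return expired


created = datetime(2024, 3, 2, tzinfo=timezone.utc)

# The order references the card instead of embedding it, so each can expire separately.
orders = {
    "order:1": {"total_cents": 2000, "payment": "card:1", "ttl": created + timedelta(days=365)},
}
cards = {
    "card:1": {"last4": "4242", "ttl": created + timedelta(days=30)},
}

now = created + timedelta(days=31)
removed_cards = purge_expired(cards, now)
removed_orders = purge_expired(orders, now)
```

After 31 days the card document is gone while the order remains, which is only possible because the card was split out of the order document.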

Developer Autonomy in Data Modeling

Fauna, and similar document-relational databases, offer developers the autonomy to choose the most appropriate data modeling approach for their specific use case. Rather than prescribing a one-size-fits-all solution, these platforms provide the tools and capabilities for developers to make informed decisions based on their application’s unique requirements and the trade-offs between different data modeling strategies.

The efficiency of traversing relationships in document-relational databases, such as Fauna, illustrates a significant advantage over traditional NoSQL approaches. The ability to fetch related documents in a single call without necessitating multiple database round trips is a testament to the advanced capabilities of these systems. This efficiency is not just about reducing the number of operations but also about simplifying data retrieval, making it more intuitive and aligned with how developers conceptualize data relationships.

Enhancing Data Security with TTL and Hashing

Implementing Time To Live (TTL) on sensitive data like credit card information represents a proactive approach to data security within document-relational databases. The flexibility to apply TTL at a granular level—specifically to credit card data rather than the entire document or collection—underscores the sophisticated data management capabilities of these platforms. Furthermore, the move to hash credit card information elevates the security measures, ensuring that sensitive data is stored responsibly and in compliance with best practices.

Performance Considerations in Data Traversal

A common query regarding the use of foreign keys or references in NoSQL databases concerns the potential performance implications. Traversing these relationships, especially in a document-relational context, might intuitively seem to introduce overhead compared to direct embedding. However, databases like Fauna are designed to efficiently manage these traversals, ensuring that the performance impact is minimized.

The key lies in the database’s underlying architecture, such as distributed transaction engines, which optimize query execution across distributed data sets. While there might be scenarios where traversing relationships incurs slight overhead compared to accessing embedded data, the benefits in terms of flexibility, data integrity, and maintainability often outweigh these considerations.

Dynamic Data Model Adaptation

The scenario of adjusting data models, such as segregating credit card information with distinct TTL settings, highlights the dynamic nature of document-relational databases. The fact that queries remain unchanged even after such modifications demonstrates the agility of these databases. Developers can adapt data models to evolving application requirements or external compliance demands without overhauling the entire application logic. This flexibility is crucial for maintaining the longevity and relevance of applications in a fast-paced technological landscape.
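To see why a query can stay the same, consider a reader that follows references transparently. The `Ref` class and `read_path` function below are hypothetical and only imitate the behavior described here; they are not how Fauna implements it.

```python
class Ref:
    """Marker for a reference to another document (hypothetical, for illustration)."""

    def __init__(self, doc_id):
        self.doc_id = doc_id


def read_path(document, path, store):
    """Walk a dotted path, resolving any Ref encountered along the way."""
    value = document
    for key in path.split("."):
        value = value[key]
        if isinstance(value, Ref):
            value = store[value.doc_id]
    return value


store = {"card:1": {"last4": "4242"}}

# Before the change: card details are embedded in the order.
order_v1 = {"total_cents": 2000, "card": {"last4": "4242"}}
# After the change: card details live in their own document with their own TTL.
order_v2 = {"total_cents": 2000, "card": Ref("card:1")}

same_query = "card.last4"
v1_result = read_path(order_v1, same_query, store)
v2_result = read_path(order_v2, same_query, store)
```

The path `card.last4` returns the same value against both versions of the order, so application code that uses it does not need to change.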

Streamlining Application Development

By abstracting the complexity of data relationships and management into more straightforward, efficient operations, document-relational databases empower developers to focus on building feature-rich, user-centric applications. The underlying database architecture handles the intricacies of data consistency, security, and relationship traversal, freeing developers from these concerns. This streamlined approach accelerates development cycles and reduces the cognitive load on developers, enabling them to deliver solutions that are both robust and secure.

The Future of Data Modeling

As document-relational databases continue to evolve, the paradigm of data storage and retrieval is shifting towards a model that combines the best of both worlds: the scalability and flexibility of NoSQL with the relational integrity and ease of use traditionally associated with SQL databases. This evolution promises to redefine how applications are developed, deployed, and scaled, offering a glimpse into the future of database technology where efficiency, security, and adaptability are paramount.

Serverless and Scalable Architecture

The document-relational database model excels in its serverless architecture, eliminating traditional database management hassles. This model automates scaling, allowing usage-based consumption without the overhead of managing servers or specifying cluster sizes. It stands out for its dynamic adaptability to workload variations, ensuring cost-effectiveness and optimal resource utilization.

Global Data Distribution Simplified

These databases shine in global data distribution, removing the complexity typically associated with making applications region-aware. By ensuring consistent data across all regions, document-relational systems enable seamless application deployment in multiple locations, mitigating concerns around data inconsistency or duplication.

Leveraging Server-Side Functions

Similar to stored procedures in relational databases, server-side functions in document-relational databases facilitate complex logic execution within the database layer. This capability, exemplified by Fauna’s use of the Fauna Query Language (FQL), enhances data processing efficiency by minimizing external data handling needs.

Enhanced Security and Access Control

Particularly useful in scenarios like e-commerce, server-side functions offer a secure way to encapsulate critical processes. By limiting direct access to sensitive collections and documents, these functions help maintain stringent security standards and ensure that operations conform to business logic.
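The pattern can be sketched with a toy in-memory database in which callers never touch collections directly and may only invoke functions their role has been granted. `TinyDatabase`, its methods, and the role names are all hypothetical; real server-side functions would be written in the database’s own language, such as FQL.

```python
class TinyDatabase:
    """Toy database: callers invoke granted functions and never see collections."""

    def __init__(self):
        self._collections = {"inventory": {"p1": 3}, "orders": []}
        self._functions = {}
        self._grants = {}  # role name -> set of callable function names

    def define_function(self, name, body):
        self._functions[name] = body

    def grant(self, role, function_name):
        self._grants.setdefault(role, set()).add(function_name)

    def call(self, role, function_name, *args):
        if function_name not in self._grants.get(role, set()):
            raise PermissionError(f"{role} may not call {function_name}")
        return self._functions[function_name](self._collections, *args)


def submit_order(collections, product_id, quantity):
    """Business rule lives next to the data: never sell more than is in stock."""
    stock = collections["inventory"].get(product_id, 0)
    if quantity <= 0 or quantity > stock:
        raise ValueError("insufficient stock")
    collections["inventory"][product_id] = stock - quantity
    collections["orders"].append({"product_id": product_id, "quantity": quantity})
    return len(collections["orders"])


def stock_level(collections, product_id):
    return collections["inventory"].get(product_id, 0)


db = TinyDatabase()
db.define_function("submit_order", submit_order)
db.define_function("stock_level", stock_level)
db.grant("customer", "submit_order")
db.grant("admin", "stock_level")

order_count = db.call("customer", "submit_order", "p1", 2)
remaining = db.call("admin", "stock_level", "p1")
```

A customer can place an order through `submit_order` but cannot read inventory directly, and the stock check is applied no matter which client makes the call.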

Integration and Type-Checking

The integration of schema management into the CI/CD pipeline represents a forward leap in database schema evolution. With the ability to manage and deploy schema changes alongside application updates, document-relational databases align with modern development practices, ensuring agility and consistency.

Functions within these databases undergo type checking upon creation and updates, boosting reliability. This process streamlines operations, particularly for high-frequency tasks, by reducing the overhead associated with query parsing and transmission.

Gradual Schema Adoption

The approach of starting without a rigid schema allows for an organic evolution of the data model. This adaptability is crucial in the early stages of development, supporting rapid iteration without the constraints of a fixed schema. Over time, as the application’s data requirements become clearer, introducing structured schema elements enhances performance and maintains cost efficiency.
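Gradual adoption can be pictured as a validator whose schema starts empty and gains rules over time. The `validate` function and the schema format (field name mapped to a Python type) are assumptions made for this sketch, not an actual schema language.

```python
def validate(document, schema):
    """Return a list of problems; an empty schema accepts any document."""
    problems = []
    for field, expected_type in schema.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


user = {"username": "ada", "nickname": 42}

# Early in development: no schema, so anything goes.
early_schema = {}
# Later: the fields the application relies on are enforced; others stay free-form.
later_schema = {"username": str, "email": str}

early_problems = validate(user, early_schema)
later_problems = validate(user, later_schema)
```

Fields not named in the schema, such as `nickname`, remain unconstrained, so enforcement can be introduced one field at a time.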

Upcoming Features and Consistency Mechanisms

With features like computed fields and field-level schema enforcement on the horizon, document-relational databases like Fauna are poised to offer even greater control over data integrity and structure, catering to the evolving needs of modern applications.

The Distributed Transaction Engine (DTE) plays a pivotal role in achieving global consistency across regions, a task managed with precision in Fauna’s architecture. This consistency ensures that every transaction is uniformly applied, demonstrating the advanced capabilities of document-relational databases in managing distributed data.

Conclusion

In wrapping up our exploration of document-relational databases, we’ve navigated through a transformative approach that merges NoSQL’s scalability with the relational integrity of traditional databases. This hybrid model, as demonstrated by platforms like Fauna, marks a significant step forward in database technology, tailored for the complexities of modern, distributed applications.

The advancements we’ve discussed, from serverless architectures to the nuanced flexibility in data modeling, are paving the way for a new era in data management. These developments promise to enhance the efficiency, security, and adaptability of databases, addressing the evolving needs of developers and businesses alike.

Looking ahead, the continuous evolution of document-relational databases, with upcoming features like computed fields and schema enforcement, promises even greater possibilities. This innovative approach to databases is set to redefine application development, enabling more robust, scalable, and dynamic solutions.

Thank you for joining this exploration. The future of document-relational databases holds exciting potential, and I eagerly anticipate the innovative applications and solutions we will build on this foundational technology. For more insights or queries, feel free to connect with me, Kirk Kirkconnell, via Community Slack.

Questions & Answers:

Q1: With global support in Fauna, how much latency can be expected?

Latency in Fauna with global support varies. Fauna publishes these latency figures in near real time on its website, and the exact URL is available through Fauna’s support resources.

Q2: Can you specify the level of consistency for each operation in Fauna?

In Fauna, the operations are strongly consistent by default. Unlike many other NoSQL databases where you can specify the level of consistency for each call, in Fauna, this is not an option. You will always get strong consistency for every operation.

Q3: How do partition keys work in searches? Can you share some insights on this part?

In most cases, searches in Fauna (or any database) are not conducted directly by IDs, as users typically don’t know these IDs. Instead, you would use an index to search through data. For instance, to look up a user by name, you would use an index that includes the name. Once you have located the desired data, you can navigate through its relationships, which will involve IDs. But these IDs, similar to the `_id` in MongoDB, are not usually the primary means of searching within Fauna.
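The flow described in this answer, search by an indexed field first and then follow IDs, can be sketched with dictionaries. `build_index` and the collections below are hypothetical stand-ins, not Fauna’s index API.

```python
users = {
    "u1": {"name": "Ada", "orders": ["o1"]},
    "u2": {"name": "Grace", "orders": []},
}
orders = {"o1": {"total_cents": 2000}}


def build_index(collection, field):
    """Map each value of `field` to the ids of the documents that hold it."""
    index = {}
    for doc_id, doc in collection.items():
        index.setdefault(doc[field], []).append(doc_id)
    return index


users_by_name = build_index(users, "name")

# Step 1: search by something the caller actually knows (the name).
ada_ids = users_by_name.get("Ada", [])
# Step 2: traverse relationships, which is where the ids come in.
ada_orders = [orders[order_id] for user_id in ada_ids for order_id in users[user_id]["orders"]]
```

The caller never needs to know `u1` or `o1` up front; the IDs only appear once the index has located the starting document.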

Q4: Are UUIDs in Fauna only hashed, or are they decrypted as well?

In Fauna, the ID in a document (which can be considered a UUID) is hashed, but there’s nothing to decrypt. It is simply the unique identifier of that document and is not meant to be decrypted as it doesn’t contain encrypted data.