In the ever-evolving landscape of database technologies, understanding the performance nuances of JSON data handling in PostgreSQL and MongoDB is crucial for developers and database administrators. A recent webinar shed light on this topic, offering valuable insights into how these popular databases manage JSON data. This article distills the key points from the webinar, presenting a clear, concise comparison of JSON performance in PostgreSQL and MongoDB.
Introduction to the Webinar Speakers
The webinar featured Umair Shahid, a seasoned expert in PostgreSQL and Founder at Stormatics with over two decades of experience, and Semab Tariq, a skilled database developer at Stormatics, tasked with the crucial role of conducting the benchmarks. Their comprehensive analysis provided a foundation for understanding the complexities of JSON data performance across MongoDB and PostgreSQL.
The Rising Popularity of PostgreSQL and MongoDB
Drawing on data from DB-Engines and annual surveys by Stack Overflow, Umair presented a five-year trend chart showcasing the shifting landscape of database popularity among developers. Surveys highlight PostgreSQL’s consistent ascent in popularity, signaling its growing favor among users. Despite starting behind MySQL in 2019, PostgreSQL has steadily gained ground since then. Notably, MongoDB also experienced a rise in popularity, albeit with a minor decline between 2022 and 2023, indicating dynamic shifts in the database landscape.
MongoDB: Breaking the Traditional SQL Mold
MongoDB’s inception in February 2009 marked a significant departure from traditional SQL databases. Designed as a document-oriented database, MongoDB excels in managing large volumes of unstructured data, offering scalability through built-in sharding technology. Its ease of use in application development, coupled with the BSON (binary JSON) data format, positions MongoDB as a flexible choice for developers handling rapidly evolving data needs without the constraints of rigid schema structures.
PostgreSQL: Early Innovations in Unstructured Data Handling
Interestingly, PostgreSQL’s foray into unstructured data handling predates MongoDB, with the introduction of the HSTORE data type for key-value pairs without schema requirements. The subsequent releases of JSON and JSONB data types in PostgreSQL versions 9.2 and 9.4, respectively, further cemented its capabilities in managing unstructured data alongside relational data. These developments highlight PostgreSQL’s adaptability and its dual strengths in handling both relational and unstructured data within a single technological framework.
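To make these three data types concrete, here is a brief sketch, not taken from the webinar, showing HSTORE, JSON, and JSONB side by side; the table and column names are purely illustrative:

```sql
-- Illustrative sketch: PostgreSQL's unstructured data types side by side.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE unstructured_demo (
    kv hstore,  -- schemaless key-value pairs
    j  json,    -- JSON stored as text, validated on input (since 9.2)
    jb jsonb    -- JSON stored in a binary, indexable format (since 9.4)
);

INSERT INTO unstructured_demo VALUES (
    'color => red',
    '{"color": "red"}',
    '{"color": "red"}'
);

-- JSONB supports containment operators and GIN indexing:
SELECT jb @> '{"color": "red"}' AS contains_red FROM unstructured_demo;
```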
Understanding JSON: The Backbone of Data Interchange
The versatility and accessibility of JSON (JavaScript Object Notation) make it an indispensable part of modern software development. In the realm of database technologies, its importance cannot be overstated: JSON serves as a bridge between various platforms, languages, and technologies. This section aims to demystify JSON, providing clarity on its structure, popularity, and standardization, which underpin its extensive use in PostgreSQL and MongoDB alike.
JSON’s Structure: Simplicity and Clarity
JSON’s design philosophy prioritizes readability and ease of use. Its structure, composed of key-value pairs, mirrors the simplicity of plain English, making it both human-readable and machine-parsable. This dual accessibility ensures that JSON serves as an effective medium for data interchange across diverse technological landscapes. A typical JSON object, illustrating a student profile, showcases this intuitive structure, highlighting the format’s utility in representing complex data in an easily understandable format.
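The webinar's exact slide is not reproduced here, but a representative student profile (hypothetical values) conveys the same structure:

```json
{
  "name": "Jane Doe",
  "student_id": 1024,
  "gpa": 3.7,
  "enrolled": true,
  "courses": ["Databases", "Algorithms", "Networks"]
}
```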
The Driving Forces Behind JSON’s Widespread Acceptance
Several factors contribute to JSON's widespread adoption:
- Cross-platform Compatibility: JSON's text-based nature and clear syntax enable seamless data interchange across different technologies and languages, from Python to Ansible, facilitating communication between disparate systems.
- Human and Machine Readability: The format's straightforward key-value pairing is easily interpretable by humans and effortlessly parsed by machines, ensuring efficient data processing and interpretation.
- Language Independence: Its independence from any specific programming language enhances JSON's versatility, supported by built-in functions in many programming environments.
- Diverse Data Type Support: JSON accommodates a wide array of data types, including strings, numbers, arrays, and booleans, making it adaptable to various data representation needs.
JSON Standards
Standardization: The JSON standard, defined by RFC 8259 (Internet Engineering Task Force) and ECMA-404 (European Computer Manufacturers Association), specifies syntax rules, data types, and Unicode encoding, ensuring uniformity across implementations.
Delving Into JSON Syntax
The webinar further explored JSON's syntax through a detailed example, highlighting its structural elements (an illustrative sample follows the list):
- Objects: Enclosed in curly braces, objects can contain nested structures, enhancing the format's ability to represent complex data hierarchies.
- Arrays: Encapsulated within square brackets, arrays support the inclusion of lists within JSON objects.
- Value Representation: JSON supports a variety of data types for values, including strings (enclosed in double quotes), numbers, and booleans, with each key-value pair separated by a colon.
- Syntax Nuances: Commas separate elements, while whitespace, though not required for machine parsing, improves readability for human users.
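Putting these elements together, a small hypothetical object (all values illustrative) demonstrates a nested object, an array, and mixed value types:

```json
{
  "name": "Wireless Mouse",
  "price": 24.99,
  "in_stock": true,
  "tags": ["electronics", "accessories"],
  "manufacturer": {
    "name": "ACME",
    "country": "US"
  }
}
```

Curly braces delimit the outer object and the nested manufacturer object, square brackets enclose the tags array, colons bind each key to its value, and commas separate the elements.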
Benchmarking JSON Performance in PostgreSQL vs MongoDB: A Technical Overview
This section of the article distills Semab Tariq's presentation, offering a clear, informative look into the technical setup and execution of the PostgreSQL vs MongoDB benchmarks that aimed to objectively compare these two leading database technologies.
Benchmarking Environment Setup
The benchmarking process was meticulously planned, with a keen eye on creating a fair, reproducible environment:
- Hardware: The tests were conducted on an AWS EC2 instance, specifically a t3.xlarge model, equipped with 4 CPU cores, 16 GB of RAM, and 500 GB of Elastic Block Store (EBS) storage.
- Operating System: Ubuntu 22.04 served as the operating system, providing a stable and reliable environment for both databases.
- Database Configuration: PostgreSQL 16.1 and MongoDB 7.0.3 were set up to run concurrently on this system. The parallel setup was established to ensure consistent test conditions for both databases.
- Benchmarking Tool: The tests used pg_nosql_benchmark, a tool designed specifically to assess and compare the performance of NoSQL features across databases.
About the Benchmarking Tool
The pg_nosql_benchmark tool, originally developed by EnterpriseDB, was chosen for its relevance despite its age. Modifications were necessary to update it for compatibility with the latest database versions, including code adjustments, updates to deprecated functions, and query rewrites to optimize for index performance in PostgreSQL.
Designing the Benchmark Data Model
The benchmark’s data model was crafted to simulate a real-world scenario: an inventory catalog of electronic products.
- Table and Collection Setup: In both PostgreSQL and MongoDB, we set up a 'json_table'. In PostgreSQL, this table contained a single column named 'data' of the JSONB data type; in MongoDB, we used a collection with the same name.
- Data Generation and Structure: Each row in PostgreSQL and each document in MongoDB included keys such as 'name', 'brand', 'type', 'price', 'warranty_years', 'availability' (a boolean), and 'description' (a string).
Sample Data Structure
This structure is designed to test the databases’ ability to handle various data types within JSON objects, from strings and floats to booleans. A sample entry in this model provides a tangible example of the data used in the benchmarks, illustrating the complexity and richness of the information being processed.
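The exact DDL was not published alongside the webinar; a minimal PostgreSQL sketch consistent with the description above (sample values are hypothetical) might look like this, with MongoDB holding the equivalent document in a collection of the same name:

```sql
-- A minimal sketch of the benchmark schema on the PostgreSQL side.
CREATE TABLE json_table (
    data jsonb
);

-- A hypothetical sample entry using the keys described above.
INSERT INTO json_table (data) VALUES ('{
    "name": "Wireless Keyboard",
    "brand": "ACME",
    "type": "keyboard",
    "price": 49.99,
    "warranty_years": 2,
    "availability": true,
    "description": "Compact wireless keyboard with a USB receiver"
}');
```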
Querying Strategy and Index Use
We also designed SELECT queries to evaluate retrieval performance. The use of btree and GIN indexes in PostgreSQL and single-field indexes in MongoDB was strategically chosen to optimize query execution. An example query retrieving data where the brand name is ‘ACME’ in a database with 0.5 million rows could potentially return about 45K rows.
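The benchmark's index definitions were not shown verbatim; a sketch consistent with the strategy described above (index names are illustrative) could be:

```sql
-- btree index on the extracted brand value, for equality lookups.
CREATE INDEX idx_json_brand ON json_table ((data->>'brand'));

-- GIN index over the whole JSONB document, for containment queries.
CREATE INDEX idx_json_gin ON json_table USING gin (data);

-- Retrieval by brand; against ~0.5 million rows, this kind of query
-- returned roughly 45K rows in the benchmark.
SELECT data FROM json_table WHERE data->>'brand' = 'ACME';

-- Containment form, which can make use of the GIN index.
SELECT data FROM json_table WHERE data @> '{"brand": "ACME"}';
```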
Analyzing Benchmark Results: PostgreSQL vs MongoDB in JSON Performance
The rigorous benchmarking process, as detailed by Semab Tariq, culminates in an insightful analysis of JSON performance between PostgreSQL and MongoDB. The benchmarks were conducted with a commitment to fairness, using the databases as they are provided “out of the box” without any parameter tuning. This approach ensures that the performance insights derived from the study accurately reflect the default capabilities of PostgreSQL and MongoDB, offering valuable guidance for developers working with these databases in typical scenarios.
Bulk Loading Performance
The first set of results pertains to bulk loading, a scenario that simulates the initial population of a database with a large volume of data. The performance of both databases was measured in increments of 0.25 million rows, up to a total of 1.75 million rows. The findings reveal a linear increase in the time required to ingest data for both databases, a result that aligns with expectations given the nature of bulk loading operations.
Performance Analysis: The differences in performance were marginal, making it difficult to declare a definitive winner. However, PostgreSQL exhibited a slight edge in ingesting large amounts of data in bulk compared to MongoDB.
Single Insert Operations: Surprising Efficiency of PostgreSQL
The second set of results focuses on single insert operations, reflecting more common application behaviors where data is inserted or updated incrementally. Contrary to expectations, PostgreSQL demonstrated significantly better performance in processing single inserts of JSON data compared to MongoDB. This result is particularly striking, given the logarithmic scale of the graph used to present the findings, indicating that the actual performance gap is even more pronounced than it appears.
This superior performance of PostgreSQL in single insert scenarios suggests that its optimization for JSON data, combined with its robust indexing and storage mechanisms, allows it to outperform MongoDB in tasks that closely mimic real-world application data handling.
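To make the two workload shapes concrete, here is a minimal sketch (hypothetical values) contrasting a multi-row bulk insert with the one-row-at-a-time pattern measured here:

```sql
-- Bulk loading: many rows per statement amortizes per-statement and
-- per-transaction overhead (PostgreSQL's COPY goes further still).
INSERT INTO json_table (data) VALUES
    ('{"name": "Item 1", "brand": "ACME", "price": 10.0}'),
    ('{"name": "Item 2", "brand": "ACME", "price": 12.5}'),
    ('{"name": "Item 3", "brand": "ACME", "price": 15.0}');

-- Single insert: one row per statement, committed individually.
-- This is the pattern where PostgreSQL outperformed MongoDB here.
INSERT INTO json_table (data) VALUES
    ('{"name": "Item 4", "brand": "ACME", "price": 9.99}');
```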
Disk Space Utilization Comparison
The third key test in the benchmark series focused on evaluating how PostgreSQL and MongoDB manage disk space following the ingestion of JSON data. The findings reveal a notable difference in the efficiency of data storage between the two databases:
- Data Growth: We observed that data growth on disk was linear in PostgreSQL. In contrast, MongoDB's on-disk footprint grew more slowly, benefiting from its compression algorithms.
- Compression Analysis: MongoDB's compression techniques effectively reduce the on-disk data size, giving it an advantage in terms of storage efficiency (see the sizing query below).
- PostgreSQL's Efficiency: In terms of data ingestion speed, PostgreSQL was more efficient, particularly in bulk data loading scenarios.
- MongoDB's Storage Optimization: MongoDB excelled at storing data compactly, thanks to its effective compression algorithms, making it the more suitable option where disk space utilization is a priority.
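For readers who want to reproduce the storage comparison, PostgreSQL exposes the on-disk footprint directly; the table name matches the benchmark schema sketched earlier:

```sql
-- Total on-disk size of the benchmark table, including indexes
-- and TOAST storage.
SELECT pg_size_pretty(pg_total_relation_size('json_table'));
```

On the MongoDB side, the equivalent collection statistics, including the effect of WiredTiger's block compression (Snappy by default), are available through the collStats command.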
Select Performance Analysis: PostgreSQL vs MongoDB
As the results progressed, it became clear that data ingestion was only one part of the story. The speakers also delved into data selection, which offered intriguing insights into the performance of PostgreSQL and MongoDB. The focus centered on a select query expected to return approximately 45,000 rows from a dataset of half a million rows. The resulting graph painted an interesting picture of how the two databases handle data retrieval.
Understanding Select Query Performance
Performance Insights:
- PostgreSQL's Performance: In scenarios with lower data volumes, PostgreSQL demonstrated commendable efficiency. However, as the dataset size increased, we observed a noticeable decline in its performance. This became particularly evident once the data exceeded half a million rows.
- MongoDB's Consistency: By contrast, MongoDB showcased consistent select performance irrespective of the dataset's size. Whether dealing with a few rows or millions, MongoDB's ability to maintain steady performance was remarkable.
Index Performance in PostgreSQL
Further investigation into PostgreSQL's select query performance involved an analysis of index types and their effectiveness over varying dataset sizes (a sketch for comparing these access paths follows the list):
- btree Index: The standard indexing method in PostgreSQL showed a predictable performance pattern, gradually declining as data volume increased.
- Parallel Seq Scan: Representing a baseline for full-table scans, the performance of parallel sequential scans remained relatively stable, offering a point of comparison for the index types.
- GIN (Generalized Inverted Index): Specifically designed for document data types and expected to excel with JSON data, GIN indexes initially performed well. Surprisingly, their efficiency plummeted after the 1.25 million-row mark, underperforming even sequential scans. This anomaly suggests that GIN indexes, despite their intended advantage for JSON data, encounter scalability or optimization challenges at higher data volumes.
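The plan comparison behind these observations can be reproduced with EXPLAIN; a hedged sketch, reusing the illustrative indexes defined earlier:

```sql
-- How the planner uses the btree expression index.
EXPLAIN ANALYZE
SELECT data FROM json_table WHERE data->>'brand' = 'ACME';

-- How the planner uses the GIN index via containment.
EXPLAIN ANALYZE
SELECT data FROM json_table WHERE data @> '{"brand": "ACME"}';

-- Baseline: disable index access paths to observe the (parallel)
-- sequential scan the indexes are competing against.
SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN ANALYZE
SELECT data FROM json_table WHERE data @> '{"brand": "ACME"}';
```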
Next Steps and Future Research Directions
This aspect of our study, focusing on selection performance, raised several questions and provided us with avenues for further investigation. Our aim moving forward is to delve deeper into these findings, particularly the behavior of GIN indexes in PostgreSQL under large data loads. These insights point towards potential areas for enhancement in PostgreSQL’s handling of JSON data.
Conclusion
Choosing between PostgreSQL and MongoDB for JSON data handling should be informed by specific application requirements, including data ingestion rates, storage efficiency, and select query performance. The webinar underscored the need for ongoing research and optimization to address the identified challenges, particularly in scaling PostgreSQL’s select query performance. As database technologies continue to advance, empirical analyses like this webinar provide critical insights that guide the development and optimization of database systems for modern applications.
Watch the Full Webinar: Click Here
Read the Whitepaper “Data Ingestion Benchmark Comparison between PostgreSQL and MongoDB” by Stormatics
We extend our heartfelt gratitude to the speakers, Umair Shahid and Semab Tariq, for their insightful contributions to this discussion. Join the Document Database Community on Slack and share your comments below.