NoSQL and SQL databases

NoSQL or Not Only SQL started as a catchy Twitter hashtag to promote a meetup on non-relational databases. But when the non-relational database is preferable over a relational database? In this post, I’ll cover the document databases and leave the graph databases (neo4j) for another time.

Document database

Document database offers more schema flexibility as it follows a tree structure similar to JSON, avoiding the need to translate between the application data model and database (sometimes called impedance mismatch). Faster reading due to locality as a one-to-many relationship is stored in the same document.

Relational database

The relational database offers better support for joins, many-to-one, and many-to-many relationships as the data is laid out in the open: A relation(table) that contains a collection of tuples (rows), there’s no nested structure.

Schema flexibility

Document databases are also called schemaless, which can be misleading, as it could be compared to dynamic (runtime) type checking vs. static (compile) type checking.

This means the application would validate the data model or ensure that old fields are read when the document schema is updated. This is known as schema-on-read.

Schema-on-read means the struct of the data is implicit and only interpreted when the data is read.

In contrast, updating a table on a relational database would require running an ALTER migration and has a bad reputation of being slow and requiring downtime, known as schema-on-write. You would need to fix all compiler issues before building the application binary.

Schema-on-write means is the schema is explicit, and the database ensures all written data conform to it.

Normalization and Denormalization

Denormalization is the duplication of data across multiple records.

A good example is the nationality of a user. Storing the name, let’s say Brazilian, means the document is recording human meaningful information. That means as new users registers, the document size will become larger because of duplicated information.

Storing this information in a different document is known as normalization, and the nationality field contains only the reference document. The advantages are that it provides consistent style and spelling across profiles and is easy to update as the human meaningful information is recorded in one place.

The application can even keep this information in memory to avoid joins.

Relational databases use a foreign key to represent many-to-one and many-to-many relationships. A unique identifier references the related item.

Which one to pick?

Document databases works better with one-to-many relationships;
Relational databases works better with one-to-many and many-to-one. The many-to-many is acceptable;
Graph databases is recommended for many-to-many relationships;

References

Designing Data-Intensive Applications, chapter 02