Relational vs NoSQL databases


Relational databases

As a data engineer, this is the most common type of databases you will work with.

A relational database is a type of database that organizes data into structured tables (also called relations) with rows and columns. It's based on the relational model of data, which uses a structured approach to store, manage, and retrieve information.

Key Attributes of Relational Databases

1. Tabular Structure

  • Data is organized in tables with predefined rows and columns

  • Each table represents a specific entity (e.g., customers, orders, products)

  • Rows represent individual records

  • Columns represent attributes or fields

2. Predefined Schema

  • Requires a fixed schema defined before data entry

  • Schema specifies table structure, data types, and constraints

  • Changes to schema require careful planning and migration

3. Relationships Between Tables

  • Tables can be linked through relationships using keys

  • Primary keys uniquely identify each record in a table

  • Foreign keys create connections between tables

  • Supports one-to-one, one-to-many, and many-to-many relationships

4. SQL (Structured Query Language)

  • Uses SQL as the standard query language

  • Provides consistent syntax across different relational database systems

  • Enables complex queries, joins, and data manipulation

5. ACID Compliance

  • Atomicity: Transactions are all-or-nothing

  • Consistency: Data remains valid according to defined rules

  • Isolation: Concurrent transactions don't interfere with each other

  • Durability: Completed transactions are permanently saved

6. Strong Consistency

  • Ensures all users see the same data at the same time

  • Read operations always return the most recent write

  • Prioritizes accuracy over availability

7. Vertical Scaling

  • Traditionally scales by increasing server capacity (more CPU, RAM, storage)

  • Horizontal scaling is possible but more complex than in NoSQL

8. Data Integrity and Constraints

  • Enforces rules through constraints (NOT NULL, UNIQUE, CHECK)

  • Maintains referential integrity between related tables

  • Validates data before insertion

9. Normalization

  • Data is typically normalized to reduce redundancy

  • Minimizes data duplication across tables

  • Improves data consistency and storage efficiency

Common Examples

Popular relational database management systems (RDBMS) include:

  • PostgreSQL

  • MySQL

  • Oracle Database

  • Microsoft SQL Server

  • SQLite


NoSQL Databases: A Comprehensive Overview

What is NoSQL?

NoSQL stands for "Not Only SQL". It represents a category of databases that diverge from the traditional relational framework. Despite the name, some non-relational databases still support SQL or SQL-like query languages.

Data Structure

NoSQL databases utilize non-tabular structures and can accommodate various data formats:

  • Key-value

  • Document

  • Wide-column

  • Graph

  • Other specialized formats

Schema Flexibility

Unlike relational databases, NoSQL databases don't require predefined schemas, providing greater flexibility in how you structure and store your data.

Horizontal Scaling

NoSQL databases excel at horizontal scaling, automatically distributing data and workloads across multiple servers to handle increased traffic demands efficiently.

Data Consistency in Distributed Systems

How Write Operations Work

When writing data to a distributed NoSQL database:

  1. The write operation is first performed on a single node (one location where a database instance is running)

  2. A slight delay occurs before changes propagate to all other nodes in the system

Consistency Modes

Consistency Type
Implementation
Data Availability
Best Use Cases

Eventual Consistency (NoSQL)

Allows reading from nodes that haven't received the latest updates; guarantees consistency given sufficient time

Immediate read access, may return slightly outdated data

Social media platforms, content distribution networks, applications prioritizing speed and availability

Strong Consistency (Relational)

Blocks read operations until all nodes are updated with the latest write

Delayed read access, always returns current data

Applications requiring real-time accuracy and data integrity

Key Advantage of Eventual Consistency: Prioritizes speed and system availability over real-time consistency, making it ideal for applications where these factors outweigh the need for immediate data accuracy.

ACID Compliance

Not all NoSQL databases guarantee ACID compliance, though some do provide it (e.g., MongoDB).

Important consideration: The lack of ACID compliance means you may need to implement additional steps to ensure data integrity when sourcing data from NoSQL databases.

Query Languages

NoSQL databases typically use specialized query languages tailored to their specific data models. While these are often different from SQL, they're not always entirely distinct—some NoSQL databases support SQL-like syntax.


Last updated