Relational vs NoSQL databases
Relational databases
As a data engineer, this is the most common type of databases you will work with.
A relational database is a type of database that organizes data into structured tables (also called relations) with rows and columns. It's based on the relational model of data, which uses a structured approach to store, manage, and retrieve information.
Key Attributes of Relational Databases
1. Tabular Structure
Data is organized in tables with predefined rows and columns
Each table represents a specific entity (e.g., customers, orders, products)
Rows represent individual records
Columns represent attributes or fields
2. Predefined Schema
Requires a fixed schema defined before data entry
Schema specifies table structure, data types, and constraints
Changes to schema require careful planning and migration
3. Relationships Between Tables
Tables can be linked through relationships using keys
Primary keys uniquely identify each record in a table
Foreign keys create connections between tables
Supports one-to-one, one-to-many, and many-to-many relationships
4. SQL (Structured Query Language)
Uses SQL as the standard query language
Provides consistent syntax across different relational database systems
Enables complex queries, joins, and data manipulation
5. ACID Compliance
Atomicity: Transactions are all-or-nothing
Consistency: Data remains valid according to defined rules
Isolation: Concurrent transactions don't interfere with each other
Durability: Completed transactions are permanently saved
6. Strong Consistency
Ensures all users see the same data at the same time
Read operations always return the most recent write
Prioritizes accuracy over availability
7. Vertical Scaling
Traditionally scales by increasing server capacity (more CPU, RAM, storage)
Horizontal scaling is possible but more complex than in NoSQL
8. Data Integrity and Constraints
Enforces rules through constraints (NOT NULL, UNIQUE, CHECK)
Maintains referential integrity between related tables
Validates data before insertion
9. Normalization
Data is typically normalized to reduce redundancy
Minimizes data duplication across tables
Improves data consistency and storage efficiency
Common Examples
Popular relational database management systems (RDBMS) include:
PostgreSQL
MySQL
Oracle Database
Microsoft SQL Server
SQLite
NoSQL Databases: A Comprehensive Overview
What is NoSQL?
NoSQL stands for "Not Only SQL". It represents a category of databases that diverge from the traditional relational framework. Despite the name, some non-relational databases still support SQL or SQL-like query languages.
Data Structure
NoSQL databases utilize non-tabular structures and can accommodate various data formats:
Key-value
Document
Wide-column
Graph
Other specialized formats
Schema Flexibility
Unlike relational databases, NoSQL databases don't require predefined schemas, providing greater flexibility in how you structure and store your data.
Horizontal Scaling
NoSQL databases excel at horizontal scaling, automatically distributing data and workloads across multiple servers to handle increased traffic demands efficiently.
Data Consistency in Distributed Systems
How Write Operations Work
When writing data to a distributed NoSQL database:
The write operation is first performed on a single node (one location where a database instance is running)
A slight delay occurs before changes propagate to all other nodes in the system
Consistency Modes
Eventual Consistency (NoSQL)
Allows reading from nodes that haven't received the latest updates; guarantees consistency given sufficient time
Immediate read access, may return slightly outdated data
Social media platforms, content distribution networks, applications prioritizing speed and availability
Strong Consistency (Relational)
Blocks read operations until all nodes are updated with the latest write
Delayed read access, always returns current data
Applications requiring real-time accuracy and data integrity
Key Advantage of Eventual Consistency: Prioritizes speed and system availability over real-time consistency, making it ideal for applications where these factors outweigh the need for immediate data accuracy.
ACID Compliance
Not all NoSQL databases guarantee ACID compliance, though some do provide it (e.g., MongoDB).
Important consideration: The lack of ACID compliance means you may need to implement additional steps to ensure data integrity when sourcing data from NoSQL databases.
Query Languages
NoSQL databases typically use specialized query languages tailored to their specific data models. While these are often different from SQL, they're not always entirely distinct—some NoSQL databases support SQL-like syntax.
Last updated