Schema-on-Read and Schema-on-Write

Schema-on-Read vs Schema-on-Write

Schema-on-Write (Traditional relational databases):

Schema is enforced when data is written to the database
The database validates and rejects data that doesn't match the schema
All data must conform to a single, predefined structure
"Schema first" - you must know the structure before storing data

Schema-on-Read (Document/NoSQL databases):

Schema is interpreted when data is read from the database
The database stores data without validating its structure by default
Data can have varying structures; the application interprets it
"Schemaless" or "flexible schema" - structure determined at query time

Visual Comparison

Schema-on-Write (SQL):
User → Application → [Schema Validation] → Database
                           ↑ (rejects invalid data)
                     Structure enforced here

Schema-on-Read (NoSQL):
User → Application → Database (stores anything)
                ↑
         Structure interpreted here when reading
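
The two flows above can be sketched in a few lines of plain Python. This is a toy illustration of where the check happens, not how any real database is implemented; `WriteValidatedStore`, `ReadInterpretedStore`, and `REQUIRED_FIELDS` are hypothetical names introduced only for this sketch.

```python
REQUIRED_FIELDS = {"name": str, "email": str}  # hypothetical schema for illustration


class WriteValidatedStore:
    """Schema-on-write: validate before storing, reject on mismatch."""

    def __init__(self):
        self.rows = []

    def insert(self, record):
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(record.get(field), expected_type):
                raise ValueError(f"invalid or missing field: {field}")
        self.rows.append(dict(record))


class ReadInterpretedStore:
    """Schema-on-read: store anything, interpret when reading."""

    def __init__(self):
        self.docs = []

    def insert(self, document):
        self.docs.append(dict(document))  # no validation at write time

    def emails(self):
        # Structure is imposed here, at read time.
        return [doc.get("email", "N/A") for doc in self.docs]


strict = WriteValidatedStore()
strict.insert({"name": "Alice", "email": "alice@example.com"})
try:
    strict.insert({"name": "Bob"})  # missing email
    rejected = False
except ValueError:
    rejected = True

flexible = ReadInterpretedStore()
flexible.insert({"name": "Alice", "email": "alice@example.com"})
flexible.insert({"name": "Bob"})  # accepted as-is

print(rejected)           # True
print(flexible.emails())  # ['alice@example.com', 'N/A']
```

The same incomplete record is rejected by the first store and accepted by the second; the second store only notices the gap when `emails()` runs.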

Schema-on-Write

Strict Structure Enforcement

-- Define the schema upfront
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL,
    age INTEGER CHECK (age >= 0),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- This INSERT succeeds
INSERT INTO users (name, email, age) 
VALUES ('Alice', 'alice@example.com', 30);

-- This INSERT fails - violates NOT NULL constraint
INSERT INTO users (name) 
VALUES ('Bob');
-- ERROR: null value in column "email" violates not-null constraint

-- This INSERT fails - violates CHECK constraint
INSERT INTO users (name, email, age) 
VALUES ('Charlie', 'charlie@example.com', -5);
-- ERROR: new row violates check constraint "users_age_check"

-- This INSERT fails - unknown column
INSERT INTO users (name, email, age, phone) 
VALUES ('David', 'david@example.com', 25, '555-0100');
-- ERROR: column "phone" of relation "users" does not exist

import psycopg2

def create_user_schema_on_write(conn, user_data):
    """Schema-on-write: Database enforces structure"""
    cursor = conn.cursor()
    
    try:
        # Database validates this data against the schema
        cursor.execute("""
            INSERT INTO users (name, email, age)
            VALUES (%(name)s, %(email)s, %(age)s)
            RETURNING id
        """, user_data)
        
        user_id = cursor.fetchone()[0]
        conn.commit()
        return user_id
        
    except psycopg2.IntegrityError as e:
        # Database rejected the data
        conn.rollback()
        print(f"Schema violation: {e}")
        return None

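# Example usage (the connection string is a placeholder)
conn = psycopg2.connect("dbname=mydb")

user1 = {'name': 'Alice', 'email': 'alice@example.com', 'age': 30}
# email is None, so the database receives NULL and rejects the row.
# Omitting the 'email' key entirely would raise KeyError in psycopg2
# before the query ever reaches the database.
user2 = {'name': 'Bob', 'email': None, 'age': 25}

create_user_schema_on_write(conn, user1)  # ✓ Success
create_user_schema_on_write(conn, user2)  # ✗ Fails at write time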
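
For readers without a PostgreSQL server at hand, the same write-time rejection can be reproduced with Python's built-in `sqlite3` module. This is a stand-in sketch, not the PostgreSQL setup above: SQLite's column typing is looser than PostgreSQL's, but `NOT NULL` and `CHECK` constraints are enforced in the same way. The helper name `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        age INTEGER CHECK (age >= 0)
    )
""")


def try_insert(conn, user_data):
    """Return the new row id, or None if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            cursor = conn.execute(
                "INSERT INTO users (name, email, age) "
                "VALUES (:name, :email, :age)",
                user_data,
            )
        return cursor.lastrowid
    except sqlite3.IntegrityError as exc:
        print(f"Rejected at write time: {exc}")
        return None


ok = try_insert(conn, {"name": "Alice", "email": "alice@example.com", "age": 30})
missing_email = try_insert(conn, {"name": "Bob", "email": None, "age": 25})
negative_age = try_insert(conn, {"name": "Charlie", "email": "c@example.com", "age": -5})
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(ok, missing_email, negative_age, row_count)  # 1 None None 1
```

Only the valid row is stored; both invalid rows are refused before they can reach the table.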

Characteristics

Structure first: You define tables, columns, types, and constraints upfront
Validation at write time: Invalid data is rejected immediately
Uniform data: All records have the same structure
Schema changes are expensive: Requires migrations that affect all existing data

Advantages

Data integrity: Guarantees that stored data satisfies the declared types and constraints
Catches errors early: Invalid data is rejected immediately
Query optimization: Database knows exact structure, can optimize queries
Documentation: Schema serves as explicit documentation
Type safety: Enforces data types strictly

Disadvantages

Rigid: Changing schema requires migrations
Downtime risk: Schema changes can lock tables
Slower iteration: Must plan schema changes carefully
Handles variability poorly: Difficult to store heterogeneous data

Common in

Relational databases (PostgreSQL, MySQL, Oracle)
Traditional SQL databases
Systems prioritizing consistency and integrity

Schema-on-Read

Flexible Structure Storage

from pymongo import MongoClient

def create_user_schema_on_read(collection, user_data):
    """Schema-on-read: Database stores anything"""
    # MongoDB accepts any structure without validation
    result = collection.insert_one(user_data)
    return result.inserted_id

# Setup
client = MongoClient('mongodb://localhost:27017/')
db = client['myapp']
users = db['users']

# All of these succeed - different structures!
user1 = {
    'name': 'Alice',
    'email': 'alice@example.com',
    'age': 30
}

user2 = {
    'name': 'Bob',
    'age': 25
    # Missing email - but that's fine!
}

user3 = {
    'name': 'Charlie',
    'email': 'charlie@example.com',
    'age': 28,
    'phone': '555-0100',  # Extra field - also fine!
    'address': {
        'street': '123 Main St',
        'city': 'Boston'
    }
}

user4 = {
    'username': 'dave123',  # Different field name
    'contact': 'dave@example.com',
    'profile': {
        'bio': 'Software developer',
        'interests': ['coding', 'music']
    }
}

# All inserts succeed - schema enforced at READ time
create_user_schema_on_read(users, user1)  # ✓
create_user_schema_on_read(users, user2)  # ✓
create_user_schema_on_read(users, user3)  # ✓
create_user_schema_on_read(users, user4)  # ✓
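
Accepting any structure is MongoDB's default behaviour rather than a hard limit: a collection can opt into validation through a `$jsonSchema` validator. The sketch below assumes `db` is a PyMongo `Database` object; the field rules, the collection name `validated_users`, and both helper names are hypothetical choices for illustration.

```python
def build_user_validator():
    """Return a $jsonSchema validator requiring name and email (illustrative rules)."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name", "email"],
            "properties": {
                "name": {"bsonType": "string"},
                "email": {"bsonType": "string"},
                "age": {"bsonType": "int", "minimum": 0},
            },
        }
    }


def create_validated_users(db, name="validated_users"):
    """Create a collection whose writes are checked against the validator.

    `db` is assumed to be a pymongo.database.Database; the collection
    name is a hypothetical example.
    """
    return db.create_collection(name, validator=build_user_validator())
```

With a validator like this in place, a document such as `user2` (no email) would be expected to be rejected at write time, while extra fields such as `phone` remain allowed because the schema does not forbid additional properties.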

Reading with Schema Interpretation

def get_user_email_schema_on_read(collection, user_id):
    """Application interprets schema when reading"""
    user = collection.find_one({'_id': user_id})
    
    if not user:
        return None
    
    # Application code handles different structures
    # This is where "schema-on-read" happens
    
    if 'email' in user:
        return user['email']
    elif 'contact' in user:
        return user['contact']
    elif 'profile' in user and 'email' in user['profile']:
        return user['profile']['email']
    else:
        return 'no-email@example.com'  # Default

def get_all_users_with_schema_handling(collection):
    """Handle mixed schemas in the same collection"""
    users = []
    
    for doc in collection.find():
        # Normalize different structures
        user = {
            'id': str(doc['_id']),
            'name': doc.get('name') or doc.get('username', 'Unknown'),
            'email': doc.get('email') or doc.get('contact', 'N/A'),
            'age': doc.get('age'),
            'phone': doc.get('phone'),
        }
        users.append(user)
    
    return users

Schema Evolution Over Time

# Version 1: Initial user structure (January 2024)
user_v1 = {
    'name': 'Alice',
    'email': 'alice@example.com'
}
users.insert_one(user_v1)

# Version 2: Added age field (March 2024)
user_v2 = {
    'name': 'Bob',
    'email': 'bob@example.com',
    'age': 25
}
users.insert_one(user_v2)

# Version 3: Added nested address (June 2024)
user_v3 = {
    'name': 'Charlie',
    'email': 'charlie@example.com',
    'age': 28,
    'address': {
        'city': 'Boston',
        'country': 'USA'
    }
}
users.insert_one(user_v3)

# Version 4: Restructured contact info (September 2024)
user_v4 = {
    'name': 'David',
    'contact': {
        'email': 'david@example.com',
        'phone': '555-0100'
    },
    'age': 32,
    'address': {
        'city': 'Seattle',
        'country': 'USA'
    }
}
users.insert_one(user_v4)

# All versions coexist in the same collection!
# The database doesn't enforce consistency

def get_user_safely(collection, name):
    """Application handles all schema versions"""
    user = collection.find_one({'name': name})
    if user is None:
        return None  # No document with that name

    # Schema interpretation logic
    if 'contact' in user and isinstance(user['contact'], dict):
        email = user['contact'].get('email', 'N/A')
    else:
        email = user.get('email', 'N/A')
    
    return {
        'name': user['name'],
        'email': email,
        'age': user.get('age', 'Unknown'),
        'city': user.get('address', {}).get('city', 'Unknown')
    }
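
One way to keep this interpretation logic in a single place is to upgrade every document to the newest shape as it is read, so the rest of the application only ever sees one structure. The sketch below is an assumption about how that could look, not part of the original example; `upgrade_user` is a hypothetical helper and it covers only the four versions shown above.

```python
def upgrade_user(doc):
    """Return a copy of doc in the latest (version 4) shape.

    Versions are detected by shape, because the documents above carry no
    explicit version field.
    """
    upgraded = dict(doc)  # shallow copy; the stored document is not modified
    if not isinstance(upgraded.get("contact"), dict):
        # Versions 1-3: move the top-level email into a contact sub-document.
        upgraded["contact"] = {"email": upgraded.pop("email", None)}
    upgraded.setdefault("age", None)      # absent in version 1
    upgraded.setdefault("address", {})    # absent in versions 1-2
    return upgraded


v1 = {"name": "Alice", "email": "alice@example.com"}
v4 = {
    "name": "David",
    "contact": {"email": "david@example.com", "phone": "555-0100"},
    "age": 32,
    "address": {"city": "Seattle", "country": "USA"},
}

print(upgrade_user(v1)["contact"]["email"])  # alice@example.com
print(upgrade_user(v4)["contact"]["email"])  # david@example.com
```

After upgrading, callers can read `doc["contact"]["email"]` without branching on which version they received.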

Characteristics

Write flexibility: Can store any structure without predefinition
Validation at read time: Application handles interpretation and validation
Heterogeneous data: Different records can have different structures
Schema evolution is easy: Just start writing new formats

Advantages

Flexibility: Easy to add new fields or change structure
Rapid iteration: No migrations needed for schema changes
Handles evolution: Old and new data formats coexist naturally
Semi-structured data: Great for varying or unpredictable structures
No downtime: Schema changes don't require database operations

Disadvantages

No guarantees: Data might be invalid, incomplete, or inconsistent
Errors appear late: Problems discovered during reads, not writes
Application complexity: Validation logic scattered in application code
Harder to optimize: Database can't assume structure for query optimization
Documentation burden: Schema exists implicitly in code
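
The "errors appear late" point can be made concrete with a read-time check. In this sketch the documents were all accepted when written, and the problems only surface when something inspects them. The rules and the name `find_problems` are hypothetical, chosen to mirror the `users` fields used earlier.

```python
def find_problems(doc):
    """Return a list of human-readable problems found in one stored document."""
    problems = []
    name = doc.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing or non-string name")
    email = doc.get("email")
    if not isinstance(email, str) or "@" not in email:
        problems.append("missing or malformed email")
    age = doc.get("age")
    if age is not None and (
        not isinstance(age, int) or isinstance(age, bool) or age < 0
    ):
        problems.append("age must be a non-negative integer")
    return problems


# All three documents would have been stored without complaint.
stored = [
    {"name": "Alice", "email": "alice@example.com", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Eve", "email": "eve", "age": "unknown"},
]
report = {doc["name"]: find_problems(doc) for doc in stored}

print(report["Alice"])  # []
print(report["Bob"])    # ['missing or malformed email']
```

Keeping such checks in one function also gives the implicit schema a single, readable home in the codebase.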

Common in

Document databases (MongoDB, CouchDB)
Key-value stores (e.g. Redis storing JSON values)
Wide-column stores (e.g. HBase; Cassandra's CQL tables, by contrast, declare their columns up front)
Data lakes and big data systems

Golang Example: Comparing Both Approaches

Schema-on-Write (PostgreSQL)

package main

import (
    "database/sql"
    "fmt"
    _ "github.com/lib/pq"
)

type User struct {
    ID    int
    Name  string
    Email string
    Age   int
}

func createUserSchemaOnWrite(db *sql.DB, user User) error {
    // Database enforces schema constraints
    query := `
        INSERT INTO users (name, email, age)
        VALUES ($1, $2, $3)
        RETURNING id
    `
    
    err := db.QueryRow(query, user.Name, user.Email, user.Age).Scan(&user.ID)
    if err != nil {
        // Schema violation caught at write time
        return fmt.Errorf("schema validation failed: %w", err)
    }
    
    fmt.Printf("User created with ID: %d\n", user.ID)
    return nil
}

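func main() {
    db, err := sql.Open("postgres", "postgres://localhost/mydb?sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()
    
    // This works
    user1 := User{Name: "Alice", Email: "alice@example.com", Age: 30}
    if err := createUserSchemaOnWrite(db, user1); err != nil {
        fmt.Println("Rejected:", err)
    }
    
    // This fails - age violates the CHECK (age >= 0) constraint.
    // Note: leaving Email unset would send "" (Go's zero value), which is
    // not NULL and therefore passes the NOT NULL constraint.
    user2 := User{Name: "Bob", Email: "bob@example.com", Age: -5}
    err = createUserSchemaOnWrite(db, user2)
    if err != nil {
        fmt.Println("Rejected:", err)
    }
}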

Schema-on-Read (MongoDB)

package main

import (
    "context"
    "fmt"
    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func createUserSchemaOnRead(collection *mongo.Collection, userData bson.M) error {
    // MongoDB accepts any structure
    _, err := collection.InsertOne(context.Background(), userData)
    return err
}

func getUserEmailSchemaOnRead(collection *mongo.Collection, name string) string {
    var result bson.M
    err := collection.FindOne(
        context.Background(),
        bson.M{"name": name},
    ).Decode(&result)
    
    if err != nil {
        return "N/A"
    }
    
    // Schema interpretation at read time
    if email, ok := result["email"].(string); ok {
        return email
    }
    
    if contact, ok := result["contact"].(bson.M); ok {
        if email, ok := contact["email"].(string); ok {
            return email
        }
    }
    
    return "no-email@example.com"
}

func main() {
    client, _ := mongo.Connect(
        context.Background(),
        options.Client().ApplyURI("mongodb://localhost:27017"),
    )
    defer client.Disconnect(context.Background())
    
    collection := client.Database("myapp").Collection("users")
    
    // All different structures - all accepted!
    user1 := bson.M{"name": "Alice", "email": "alice@example.com", "age": 30}
    user2 := bson.M{"name": "Bob", "age": 25} // Missing email
    user3 := bson.M{
        "name": "Charlie",
        "contact": bson.M{"email": "charlie@example.com"},
    }
    
    createUserSchemaOnRead(collection, user1)
    createUserSchemaOnRead(collection, user2)
    createUserSchemaOnRead(collection, user3)
    
    // Application handles different schemas when reading
    fmt.Println(getUserEmailSchemaOnRead(collection, "Alice"))   // alice@example.com
    fmt.Println(getUserEmailSchemaOnRead(collection, "Bob"))     // no-email@example.com
    fmt.Println(getUserEmailSchemaOnRead(collection, "Charlie")) // charlie@example.com
}

Handling Schema Changes

Schema-on-Write: Requires Migration

-- Must migrate all existing data
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- All rows now have the phone column (NULL or default value)
-- Database maintains consistency

def add_phone_schema_on_write(conn):
    """All data must conform to new schema"""
    cursor = conn.cursor()
    
    # Migration script
    cursor.execute("ALTER TABLE users ADD COLUMN phone VARCHAR(20)")
    
    # Optionally backfill data
    cursor.execute("UPDATE users SET phone = 'Unknown' WHERE phone IS NULL")
    
    conn.commit()
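
The effect of the migration on existing rows can be observed without a PostgreSQL server by using Python's built-in `sqlite3` module as a stand-in. This is a minimal sketch with a simplified `users` table (an assumption for brevity), showing that every existing row gains the new column and that the backfill then replaces the NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")

# The migration: every existing row gains the column, initially NULL.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
before_backfill = conn.execute("SELECT name, phone FROM users").fetchall()

# Optional backfill, mirroring the UPDATE in the migration script.
conn.execute("UPDATE users SET phone = 'Unknown' WHERE phone IS NULL")
conn.commit()
after_backfill = conn.execute("SELECT name, phone FROM users").fetchall()

print(before_backfill)  # [('Alice', None)]
print(after_backfill)   # [('Alice', 'Unknown')]
```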

Schema-on-Read: No Migration Needed

def add_phone_schema_on_read(collection):
    """No migration needed - just update application code"""
    # Old documents don't have phone
    # New documents will have phone
    # Both coexist happily
    pass

def get_user_with_phone(collection, user_id):
    """Application handles both old and new formats"""
    user = collection.find_one({'_id': user_id})
    if user is None:
        return None  # No such document

    return {
        'name': user.get('name', 'Unknown'),
        'email': user.get('email', 'N/A'),
        'phone': user.get('phone', 'Not provided')  # Handles missing field
    }

Key Differences Summary

Aspect             | Schema-on-Write             | Schema-on-Read
-------------------|-----------------------------|-------------------------------
When validated     | At write time               | At read time
Where schema lives | In database                 | In application code
Data uniformity    | All data matches schema     | Data can vary in structure
Schema changes     | Requires migrations         | Just update application code
Error detection    | Immediate (write fails)     | Delayed (read time)
Data integrity     | Guaranteed by database      | Responsibility of application
Best for           | Structured, consistent data | Semi-structured, evolving data

Hybrid Approach

What's happening here:

This uses a relational database (PostgreSQL) but adds a JSON column for flexible data. The table structure is strict, but one column can hold any JSON.

Complete SQL Example

-- Create table with strict columns AND flexible JSON column
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL,           -- Schema-on-write: must be integer
    total DECIMAL(10,2) NOT NULL,       -- Schema-on-write: must be decimal
    created_at TIMESTAMP DEFAULT NOW(), -- Schema-on-write: must be timestamp
    metadata JSONB                      -- Schema-on-read: can be anything!
);

-- Add indexes on structured columns (fast!)
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Can also index inside JSON (PostgreSQL feature)
CREATE INDEX idx_orders_gift_wrap ON orders((metadata->>'gift_wrap'));

Full Golang Example

package main

import (
    "database/sql"
    "encoding/json"
    "fmt"
    "time"
    
    _ "github.com/lib/pq"
)

type Order struct {
    ID        int
    UserID    int
    Total     float64
    CreatedAt time.Time
    Metadata  map[string]interface{}  // Flexible JSON data
}

func createOrderHybrid(db *sql.DB, order Order) error {
    // Convert metadata to JSON
    metadataJSON, err := json.Marshal(order.Metadata)
    if err != nil {
        return fmt.Errorf("failed to marshal metadata: %w", err)
    }
    
    // Insert with strict validation on core fields
    // but flexible metadata
    query := `
        INSERT INTO orders (user_id, total, metadata)
        VALUES ($1, $2, $3)
        RETURNING id, created_at
    `
    
    err = db.QueryRow(
        query,
        order.UserID,    // Must be valid integer (schema-on-write)
        order.Total,     // Must be valid decimal (schema-on-write)
        metadataJSON,    // Can be any JSON (schema-on-read)
    ).Scan(&order.ID, &order.CreatedAt)
    
    if err != nil {
        return fmt.Errorf("failed to create order: %w", err)
    }
    
    fmt.Printf("Created order ID: %d\n", order.ID)
    return nil
}

func main() {
    db, err := sql.Open("postgres", 
        "postgres://user:pass@localhost/mydb?sslmode=disable")
    if err != nil {
        panic(err)
    }
    defer db.Close()
    
    // Order 1: Basic metadata
    order1 := Order{
        UserID: 123,
        Total:  99.99,
        Metadata: map[string]interface{}{
            "gift_wrap": true,
            "gift_message": "Happy Birthday!",
        },
    }
    createOrderHybrid(db, order1)
    
    // Order 2: Different metadata structure
    order2 := Order{
        UserID: 456,
        Total:  149.50,
        Metadata: map[string]interface{}{
            "delivery_notes": "Leave at door",
            "delivery_time": "evening",
            "contact_phone": "555-0100",
        },
    }
    createOrderHybrid(db, order2)
    
    // Order 3: Complex nested metadata
    order3 := Order{
        UserID: 789,
        Total:  249.99,
        Metadata: map[string]interface{}{
            "gift_wrap": true,
            "delivery": map[string]interface{}{
                "instructions": "Ring doorbell twice",
                "preferred_time": "morning",
                "access_code": "1234",
            },
            "items": []map[string]interface{}{
                {"sku": "ABC123", "customization": "engraved"},
                {"sku": "DEF456", "customization": "none"},
            },
        },
    }
    createOrderHybrid(db, order3)
    
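    // This FAILS - total does not fit DECIMAL(10,2) (schema-on-write enforcement).
    // Note that UserID: 0 alone would be accepted: 0 is a valid INTEGER and the
    // table declares no CHECK or foreign key on user_id.
    invalidOrder := Order{
        UserID: 0,
        Total:  123456789.00, // 9 integer digits; DECIMAL(10,2) allows at most 8
        Metadata: map[string]interface{}{
            "notes": "This will fail",
        },
    }
    err = createOrderHybrid(db, invalidOrder)
    if err != nil {
        fmt.Printf("Failed as expected: %v\n", err)
    }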
}
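
The Go example only writes orders; reading the flexible column back is where the schema-on-read half of the hybrid shows up. The sketch below uses Python's built-in `sqlite3` with the JSON stored as text, as a stand-in for PostgreSQL's JSONB column (an assumption made so the example runs without a server). `order_summary` is a hypothetical helper.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        total REAL NOT NULL,
        metadata TEXT            -- JSON stored as text in this stand-in
    )
""")

orders = [
    (123, 99.99, {"gift_wrap": True, "gift_message": "Happy Birthday!"}),
    (456, 149.50, {"delivery_notes": "Leave at door"}),
]
for user_id, total, metadata in orders:
    conn.execute(
        "INSERT INTO orders (user_id, total, metadata) VALUES (?, ?, ?)",
        (user_id, total, json.dumps(metadata)),
    )


def order_summary(row):
    """Combine strict columns with read-time interpretation of metadata."""
    user_id, total, raw_metadata = row
    metadata = json.loads(raw_metadata) if raw_metadata else {}
    return {
        "user_id": user_id,  # NOT NULL enforced by the table
        "total": total,      # NOT NULL enforced by the table
        # Interpreted on read: absent in some orders, so default to False.
        "gift_wrap": bool(metadata.get("gift_wrap", False)),
    }


rows = conn.execute(
    "SELECT user_id, total, metadata FROM orders ORDER BY id"
).fetchall()
summaries = [order_summary(row) for row in rows]

print([s["gift_wrap"] for s in summaries])  # [True, False]
```

The core fields can be used without defensive checks, while every access into `metadata` needs a default, which is the trade-off the hybrid design accepts.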

The choice fundamentally reflects a trade-off between flexibility and safety—schema-on-write prioritizes correctness and consistency, while schema-on-read prioritizes adaptability and speed of change.
