CREATE TABLE


In Data Engineering, the CREATE TABLE statement is more than just naming columns; it is about defining the physical storage and rules that protect your data's integrity.

In PostgreSQL, a professional CREATE TABLE statement usually falls into one of two categories: Standard or Partitioned.


Anatomy of a Robust CREATE TABLE

When building production tables, we combine column definitions, constraints, and storage parameters.

CREATE TABLE users (
    -- 1. Identity Columns
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- 2. Data Columns with Constraints
    username VARCHAR(50) NOT NULL UNIQUE,
    email TEXT NOT NULL CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    age INT DEFAULT 18,
    
    -- 3. Metadata for Data Engineering
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
) 
WITH (fillfactor = 90); -- Storage parameter to leave room for UPDATES

Advanced Creation Methods

As a Data Engineer, you’ll often create tables based on existing data or logic rather than starting from scratch.

A. CREATE TABLE AS (CTAS)

This creates a table and populates it with the results of a query in one step.

  • Note: This does not copy indexes or constraints from the source.

B. CREATE TABLE ... (LIKE ...)

This is perfect for creating "Staging" tables. It copies the structure of an existing table exactly.


Temporary Tables

If you only need a table for the duration of a specific ETL session or script, use TEMP.

  • These are automatically dropped when the database session ends.

  • They are private to the user who created them.


Column Properties & Identities

PostgreSQL offers specific ways to handle auto-incrementing IDs. While the legacy SERIAL type is common, the modern SQL-standard is IDENTITY.

Feature

Legacy: SERIAL

Modern: GENERATED ALWAYS AS IDENTITY

Standard

Postgres specific

SQL Standard (Portable)

Safety

User can easily override values

Harder to accidentally duplicate IDs

Control

Tied to a sequence object

Integrated directly into the table


Summary Checklist

When writing your CREATE TABLE scripts, verify these four things:

  1. Timezones: Did you use TIMESTAMPTZ for event logs?

  2. Nullability: Are columns that should be required marked NOT NULL?

  3. Primary Key: Does every table have a unique identifier?

  4. Naming: Are you following a consistent naming convention (e.g., snake_case)?


Demonstration: Table Structure

This diagram shows how the CREATE TABLE definition translates into the logical structure of a database entity.

spinner

Partitioned Tables (Declarative Partitioning)

A partitioned table is a Logical Parent. It does not hold data itself; it defines a "template" and a "partition key" (like a date or region). The data actually lives in Partitions (which are just regular tables linked to the parent).

  • When to use: For "Big Data" (hundreds of millions or billions of rows). It is essential for time-series data (logs, transactions).

  • The Power of Pruning: If you query for January 2026, the engine "prunes" (ignores) every other partition, scanning only the relevant file.

  • Data Lifecycle: To delete old data (e.g., data from 2020), you don't run a slow DELETE. You simply DETACH or DROP the entire partition. This is an instantaneous metadata change.

The Syntax Difference

You must use the PARTITION BY clause during creation.

spinner

Last updated