CAST


In SQL, the CAST statement is your primary tool for Data Type Conversion.

As a Data Engineer, you'll use CAST constantly. Raw data often arrives in "dirty" formats (like everything being a string/text), and you must convert it into the proper format (like integers or dates) before you can perform math or logic on it.


Purpose of CAST

The CAST function converts an expression of one data type into another. This is essential for:

  • Performing Calculations: You can't multiply a "Price" if it's stored as TEXT.

  • Joining Tables: You cannot join two tables if the ID is a STRING in one and an INTEGER in the other.

  • Data Cleaning: Ensuring dates are actual DATE objects so you can filter by month or year.


Standard SQL Syntax

The syntax is consistent across almost all SQL flavors (Postgres, BigQuery, Snowflake, SQL Server):

CAST(expression AS target_data_type)

Common Examples:

-- Convert a string to a decimal for math
SELECT CAST('15.50' AS DECIMAL(10,2)) * 1.1 AS price_with_tax;

-- Convert a timestamp to a simple date
SELECT CAST(tpep_pickup_datetime AS DATE) AS pickup_day;

-- Convert an integer to text (useful for concatenation)
SELECT 'Order ID: ' || CAST(order_id AS TEXT);

The Shorthand Notation ::

Many modern data warehouses (Postgres, Snowflake, Redshift) support a shorthand using double colons. This makes your code much cleaner, though it is not part of the ANSI standard SQL.

Standard SQL

Shorthand Equivalent

CAST(amount AS INT)

amount::INT

CAST(created_at AS DATE)

created_at::DATE


Usage in CTEs (The "Staging" Pattern)

In a dbt or standard DE pipeline, we typically use CAST in the very first CTE (often called the "staging" or "import" layer) to ensure the rest of the pipeline has "clean" types.


What happens if a CAST fails?

If you try to CAST('abc' AS INT), the query will usually crash with an error.

  • Pro Tip: Most modern warehouses offer a "safe" version, like SAFE_CAST (BigQuery) or TRY_CAST (Snowflake/SQL Server). These return NULL instead of failing the whole job if the data is messy.


Last updated