Snowflake CDC Explained: Streams, Use Cases, and Troubleshooting Tips

Data teams are always under pressure to deliver fresher insights without increasing complexity or cost. As organizations rely more heavily on cloud analytics platforms like Snowflake, traditional batch-based data pipelines often fall short, introducing latency, duplication, and unnecessary processing. 

Change data capture (CDC) solves this by giving teams a way to track and process only what has changed: inserts, updates, and deletes. Instead of reloading full tables, CDC applies incremental updates, keeping analytics environments accurate, efficient, and up to date.

Inside Snowflake, CDC workflows typically rely on Streams and Tasks. Streams track row-level changes to a table. Tasks automate how those changes are applied to target tables. Together, they form the backbone of incremental data pipelines in Snowflake.

In this blog, we’ll break down how Snowflake CDC works, the role of Streams, common use cases, setup best practices, and troubleshooting tips to help you build reliable, scalable pipelines that support near real-time analytics and modern data architectures.

What is CDC?

Change data capture, or CDC, is a data integration technique used to identify and track changes made to data in a source system over time. Instead of reloading entire data sets whenever an update occurs, CDC captures only what has changed, such as new records, updates to existing rows, or deletions. This approach makes data pipelines more efficient and significantly reduces processing time and system load. Because CDC enables real-time data integration, it is widely regarded as the gold standard for practical data synchronization.

CDC plays a critical role in keeping downstream systems up to date. By continuously monitoring data changes, organizations can deliver real-time data to analytics platforms, applications, and reporting environments. This ensures that insights are based on the most current information rather than delayed batch updates.

For analytics and reporting teams, CDC enables faster and more accurate insights. When integrated with BI tools, change data capture allows dashboards and reports to reflect the latest business activity, supporting timely decision-making. Sales metrics, operational performance, and customer behavior can be analyzed as they happen, rather than hours or days later.

Overall, CDC is a foundational capability for modern data architectures. It supports agile analytics, reduces data latency, and enables organizations to respond quickly to change. As data volumes grow and the demand for timely insight increases, change data capture has become an essential technique for maintaining data freshness and reliability.

What is Snowflake?

Snowflake is a cloud-native data warehouse designed to help organizations store, manage, and analyze large volumes of data with speed and flexibility. Unlike traditional on-premises systems, Snowflake was built specifically for the cloud, allowing it to scale compute and storage independently. 

Snowflake provides a central place for data storage and analytics, supporting structured and semi-structured data. Organizations can load data from a wide range of sources and make it available to users through a single, governed environment.

Snowflake’s separation of storage and compute enables multiple teams to run queries simultaneously without competing for resources. Snowflake also offers built-in features for data sharing, access control, and encryption, helping organizations manage sensitive information while maintaining compliance.

Because Snowflake operates across major cloud providers, it gives businesses the flexibility to choose the environment that best fits their needs. With a 4.6/5 rating on G2, it’s a popular and well-loved platform. 

What is Snowflake CDC?

Snowflake CDC refers to the process of capturing and processing data changes from source systems and applying those changes efficiently within Snowflake. Rather than relying on full data reloads, Snowflake CDC focuses on identifying inserts, updates, and deletes and reflecting those changes in Snowflake tables as they occur. This approach helps organizations keep analytics environments accurate, current, and performant.

Snowflake itself doesn’t include native CDC extraction from source systems. Instead, it serves as the destination where captured changes are stored, processed, and analyzed. CDC data is typically delivered into Snowflake through integration tools, event-based pipelines, or log-based capture mechanisms that monitor source databases. Once the changes arrive, Snowflake’s scalable architecture makes it easy to process, merge, and query updated records without disrupting ongoing workloads.

A key advantage of Snowflake CDC is its ability to support modern, near-real-time architectures. By pairing Snowflake with data streaming technologies, organizations can continuously ingest change events and make fresh data available for analytics, dashboards, and downstream applications. This reduces latency and ensures business users are working with timely information.

Snowflake CDC is especially valuable for use cases that depend on current operational data, such as monitoring transactions, tracking customer behavior, or powering real-time analytics. By minimizing data movement and processing overhead, Snowflake CDC enables more efficient pipelines while maintaining data accuracy and consistency across the organization.

What is a Snowflake Stream?

A Snowflake Stream is a built-in object that tracks changes made to a table or view over time. It records inserts, updates, and deletes as they occur, allowing downstream processes to consume only the data that has changed since the last time it was queried. In the context of change data capture, Snowflake Streams provide a native way to identify and process incremental changes without reprocessing entire data sets.

Snowflake Streams are commonly used to support CDC workflows, incremental transformations, and near-real-time analytics. They work especially well with tasks and scheduled jobs, enabling teams to build efficient pipelines that continuously apply changes to target tables, data marts, or analytics layers. Because Streams operate within Snowflake, they avoid the overhead of external change tracking and integrate cleanly with existing SQL-based workflows.

Streams maintain an offset that advances when the Stream’s contents are consumed by a DML statement, ensuring each set of changes is processed exactly once. This makes them reliable for building repeatable, auditable data pipelines that stay in sync with source tables.
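
Here’s a minimal, self-contained sketch of a Stream picking up a change (the table and Stream names are hypothetical):

CREATE OR REPLACE TABLE orders_demo (id NUMBER, status STRING);
CREATE OR REPLACE STREAM orders_demo_stream ON TABLE orders_demo;

INSERT INTO orders_demo VALUES (1, 'new');

-- Returns the pending row plus metadata columns such as
-- METADATA$ACTION ('INSERT') and METADATA$ISUPDATE (FALSE)
SELECT * FROM orders_demo_stream;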

Snowflake supports several types of Streams, each designed for different use cases:

  • Standard table Streams, which track row-level changes on a single table
  • Append-only Streams, which track only inserted rows and are useful for immutable or event-style data
  • View-based Streams, which track changes to the underlying tables that affect a view’s results

Together, these Stream types give teams flexibility in how they capture and process data changes, making Snowflake Streams a core building block for modern CDC and incremental data pipelines.
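
Syntactically, the difference between these types is a single clause; for example (object names are hypothetical):

-- Standard: tracks inserts, updates, and deletes
CREATE OR REPLACE STREAM orders_std_stream ON TABLE orders;

-- Append-only: tracks inserted rows only
CREATE OR REPLACE STREAM orders_append_stream ON TABLE orders APPEND_ONLY = TRUE;

-- On a view: tracks changes to the view's underlying tables
CREATE OR REPLACE STREAM orders_view_stream ON VIEW orders_v;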

Use cases for Snowflake CDC

Snowflake CDC enables organizations to move from slow, batch-based data pipelines to more responsive and efficient data architectures. By capturing and applying only what changes, teams can support a wide range of operational and analytical use cases without unnecessary data movement or processing overhead.

Near-real-time analytics and dashboards

Snowflake CDC is commonly used to keep analytics environments continuously updated. By applying incremental changes as they occur, organizations can support timely data reporting for metrics such as sales performance, customer activity, or operational KPIs. This allows business users to work with fresher data instead of waiting for overnight batch jobs.

Incremental ETL pipelines

CDC plays a critical role in modern ETL workflows by eliminating the need for full table reloads. Instead of reprocessing entire data sets, pipelines can apply only inserts, updates, and deletes. This approach reduces compute costs, improves pipeline reliability, and accelerates downstream analytics.

Continuous data transformation

With CDC feeding Snowflake Streams and Tasks, teams can perform ongoing data transformation as changes arrive. This supports use cases such as maintaining dimensional models, updating aggregates, or syncing curated analytics tables. Incremental processing makes transformations faster and easier to scale.
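
As a sketch of that pattern (object names are hypothetical, and the source table is assumed to be append-only), a Task can drain a Stream to keep a daily aggregate current:

CREATE OR REPLACE TASK refresh_daily_sales
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('SALES_STREAM')
AS
MERGE INTO daily_sales d
USING (
  -- Only new rows since the last consumed offset
  SELECT sale_date, SUM(amount) AS delta
  FROM sales_stream
  WHERE METADATA$ACTION = 'INSERT'
  GROUP BY sale_date
) s
ON d.sale_date = s.sale_date
WHEN MATCHED THEN UPDATE SET total = d.total + s.delta
WHEN NOT MATCHED THEN INSERT (sale_date, total) VALUES (s.sale_date, s.delta);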

Operational system synchronization

Snowflake CDC is often used to keep multiple systems aligned through cloud integration. Changes from transactional databases can be reflected in Snowflake and then propagated to other platforms, ensuring consistent data across analytics, applications, and downstream services.

Advanced analytics and modeling

CDC supports ETL data transformation pipelines that feed machine learning models, forecasting tools, and advanced analytics. By keeping training and inference data up to date, organizations can improve model accuracy and responsiveness without rebuilding data sets from scratch.

How to set up Snowflake change data capture with Streams

Below is a practical, step-by-step setup you can use to implement CDC inside Snowflake using Streams (to track changes) and Tasks (to process them on a schedule). The examples use Snowflake SQL.

Step 1: Create a source table (the table you want to track)

CREATE OR REPLACE TABLE raw_orders (
  order_id      NUMBER,
  customer_id   NUMBER,
  status        STRING,
  amount        NUMBER(10,2),
  updated_at    TIMESTAMP_NTZ
);

Step 2: Create a target table (the curated table you keep in sync)

CREATE OR REPLACE TABLE curated_orders (
  order_id      NUMBER PRIMARY KEY,  -- informational only; Snowflake doesn't enforce it
  customer_id   NUMBER,
  status        STRING,
  amount        NUMBER(10,2),
  updated_at    TIMESTAMP_NTZ
);

Step 3: Create a Stream on the source table

This Stream will track inserts, updates, and deletes on raw_orders.

CREATE OR REPLACE STREAM raw_orders_cdc_stream
ON TABLE raw_orders;

Snowflake Streams expose metadata columns you’ll use during processing:

  • METADATA$ACTION: INSERT or DELETE
  • METADATA$ISUPDATE: TRUE when a row is part of an update (a standard Stream represents each update as a DELETE/INSERT pair)
  • METADATA$ROW_ID: a unique, immutable ID for tracking changes to a specific row over time
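
A plain SELECT is a safe way to inspect these columns, since only DML statements that read a Stream advance its offset:

SELECT
  order_id,
  METADATA$ACTION,
  METADATA$ISUPDATE
FROM raw_orders_cdc_stream;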

Step 4: Apply changes to the target using a MERGE

This pattern is common: treat update events as the INSERT rows in the Stream (the ones with is_update = TRUE), and process true deletes separately in Step 5. One caveat: consuming the Stream advances its offset when the transaction commits, so Steps 4 and 5 must run in the same transaction (shown in Step 5).

MERGE INTO curated_orders t
USING (
  SELECT
    order_id,
    customer_id,
    status,
    amount,
    updated_at,
    METADATA$ACTION     AS action,
    METADATA$ISUPDATE   AS is_update
  FROM raw_orders_cdc_stream
  -- Keep only INSERT rows; the DELETE half of each update pair is handled in Step 5
  WHERE METADATA$ACTION = 'INSERT'
) s
ON t.order_id = s.order_id

-- Updates come through as INSERT rows with is_update = TRUE
WHEN MATCHED AND s.action = 'INSERT' AND s.is_update = TRUE THEN
  UPDATE SET
    customer_id = s.customer_id,
    status      = s.status,
    amount      = s.amount,
    updated_at  = s.updated_at

-- True inserts
WHEN NOT MATCHED AND s.action = 'INSERT' THEN
  INSERT (order_id, customer_id, status, amount, updated_at)
  VALUES (s.order_id, s.customer_id, s.status, s.amount, s.updated_at);

Step 5: Process deletes from the Stream

True deletes arrive with METADATA$ACTION = 'DELETE' and METADATA$ISUPDATE = FALSE; update events also emit a DELETE row, so filter those out. Because the MERGE in Step 4 consumes the Stream once its transaction commits, wrap both statements in a single explicit transaction so the DELETE still sees the change set:

BEGIN TRANSACTION;

-- (run the Step 4 MERGE here, inside the same transaction)

DELETE FROM curated_orders t
USING (
  SELECT order_id
  FROM raw_orders_cdc_stream
  WHERE METADATA$ACTION = 'DELETE'
    AND METADATA$ISUPDATE = FALSE  -- exclude the DELETE half of update pairs
) d
WHERE t.order_id = d.order_id;

COMMIT;

Step 6: Automate it with a Task

Tasks let you run the CDC apply logic repeatedly. First, ensure you have a warehouse:

CREATE OR REPLACE WAREHOUSE cdc_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;

Now create a Task that runs every minute, skips runs when the Stream has no new data, and wraps the MERGE and DELETE in one transaction:

CREATE OR REPLACE TASK apply_raw_orders_cdc
  WAREHOUSE = cdc_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_CDC_STREAM')
AS
BEGIN
  -- One transaction so both statements read the same Stream contents;
  -- the offset advances once, at COMMIT
  BEGIN TRANSACTION;

  MERGE INTO curated_orders t
  USING (
    SELECT
      order_id, customer_id, status, amount, updated_at,
      METADATA$ACTION AS action,
      METADATA$ISUPDATE AS is_update
    FROM raw_orders_cdc_stream
    WHERE METADATA$ACTION = 'INSERT'
  ) s
  ON t.order_id = s.order_id
  WHEN MATCHED AND s.is_update = TRUE THEN
    UPDATE SET
      customer_id = s.customer_id,
      status      = s.status,
      amount      = s.amount,
      updated_at  = s.updated_at
  WHEN NOT MATCHED THEN
    INSERT (order_id, customer_id, status, amount, updated_at)
    VALUES (s.order_id, s.customer_id, s.status, s.amount, s.updated_at);

  DELETE FROM curated_orders t
  USING (
    SELECT order_id
    FROM raw_orders_cdc_stream
    WHERE METADATA$ACTION = 'DELETE'
      AND METADATA$ISUPDATE = FALSE
  ) d
  WHERE t.order_id = d.order_id;

  COMMIT;
END;

Start the Task:

ALTER TASK apply_raw_orders_cdc RESUME;

Step 7: Verify the Stream and results

See what’s currently pending in the Stream (a plain SELECT doesn’t advance the offset; only DML does):

SELECT * FROM raw_orders_cdc_stream;
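
For a lightweight check that skips scanning the change set, SYSTEM$STREAM_HAS_DATA reports whether the Stream has unconsumed changes:

SELECT SYSTEM$STREAM_HAS_DATA('RAW_ORDERS_CDC_STREAM');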

Check the target table:

SELECT * FROM curated_orders ORDER BY updated_at DESC;

Common challenges with Snowflake CDC and how to troubleshoot them

While Snowflake CDC using Streams is powerful, teams often encounter a few recurring challenges when implementing and maintaining these pipelines. Understanding where issues commonly arise makes it easier to troubleshoot problems and build more reliable change processing workflows.

Stream lag or missing changes

One common issue occurs when a Stream appears empty or stops returning new rows. This usually happens because the Stream has already been consumed or because downstream Tasks aren’t running as expected. To troubleshoot, verify that your Task is active and scheduled correctly, and confirm that no other process is consuming the same Stream. Querying the Stream metadata and reviewing Task history can help identify whether changes are being processed or skipped.
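
A quick status check, using the Task from the setup above (the "state" column shows whether it is started or suspended):

SHOW TASKS LIKE 'apply_raw_orders_cdc';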

Unexpected duplicate or missing records

Duplicate records can appear if merge logic doesn’t properly handle update events, which are represented as delete and insert pairs in Streams. Missing records can occur if merge conditions are too restrictive. Reviewing how METADATA$ACTION and METADATA$ISUPDATE are handled in your SQL logic is critical. Ensuring idempotent merge statements helps prevent both duplication and data loss.

Performance and compute costs

CDC pipelines that run too frequently or process large backlogs can drive up compute usage. If performance degrades, check warehouse sizing and Task frequency. Incremental processing, efficient filters, and right-sizing warehouses can significantly reduce costs while maintaining timely updates.

Stream retention and data loss

Streams rely on the retention of the underlying table’s change data. If a Stream isn’t consumed within that window, it goes stale and its pending changes can no longer be read. Monitoring Stream staleness and ensuring Tasks run consistently helps avoid gaps in change capture.
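
SHOW STREAMS reports staleness directly, and the source table’s MAX_DATA_EXTENSION_TIME_IN_DAYS parameter (14 days by default) controls how long Snowflake retains change data for unconsumed Streams:

-- Check the "stale" and "stale_after" columns
SHOW STREAMS LIKE 'raw_orders_cdc_stream';

-- Give unconsumed Streams more headroom (in days)
ALTER TABLE raw_orders SET MAX_DATA_EXTENSION_TIME_IN_DAYS = 30;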

Debugging Task failures

Task failures are often caused by SQL errors, permission issues, or warehouse availability. Reviewing Task history and error messages in Snowflake’s UI is the fastest way to diagnose failures. Running Task SQL manually can also help isolate logic errors.
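
Both checks are available in SQL: the INFORMATION_SCHEMA.TASK_HISTORY table function exposes per-run error details, and EXECUTE TASK triggers a one-off run on demand:

-- Most recent failures with their error messages
SELECT name, state, error_code, error_message, scheduled_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(TASK_NAME => 'APPLY_RAW_ORDERS_CDC'))
WHERE state = 'FAILED'
ORDER BY scheduled_time DESC;

-- Run the Task once, outside its schedule
EXECUTE TASK apply_raw_orders_cdc;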

Turning Snowflake CDC into business impact

Snowflake CDC is a powerful approach for keeping your analytics data accurate, timely, and efficient. But its true value is realized when those changes are easy to consume and act on. Domo helps bridge that gap by connecting directly to Snowflake, making change-driven pipelines simple and turning CDC data into insights that are clear and ready for business. 

With Domo’s low-code data integration, automated transformations, and real-time dashboards, your teams can confidently make decisions based on continuous data changes, without added complexity. 

Ready to get the most out of Snowflake CDC? Explore how Domo can help or watch a demo to see it in action.
