Change data capture (CDC) detects row-level data changes in CockroachDB and sends each change as a message to a configurable sink for downstream processing. While CockroachDB is an excellent system of record, it also needs to coexist with other systems.
For example, you might want to:
- Stream messages to Kafka to trigger notifications in an application.
- Keep your data mirrored in full-text indexes, analytics engines, or big data pipelines.
- Export a snapshot of tables to backfill new applications.
- Send updates to data stores for machine learning models.
The main feature of CockroachDB CDC is the changefeed, which targets an allowlist of tables, or "watched rows".
Stream row-level changes with changefeeds
Changefeeds are customizable jobs that track row-level changes and send data in real time, in a preferred format, to your specified destination, known as a sink. Every row change is emitted at least once, and the first emission of every event for the same key is ordered by timestamp.
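As a minimal sketch of the statement form, the following creates an Enterprise changefeed that streams changes to a Kafka sink. The table name `orders` and the broker address are hypothetical placeholders. The `updated` option adds the commit timestamp to each message and `resolved` emits periodic resolved-timestamp messages, which a consumer can use to reason about ordering and duplicate deliveries.

```sql
-- Assumption: a table named orders exists and a Kafka broker is reachable
-- at localhost:9092 (both are placeholders for illustration).
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH updated, resolved = '10s';
```

Because delivery is at least once, a downstream consumer would typically need to tolerate repeated messages for the same key.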
CockroachDB has two implementations of changefeeds:
Feature | Basic changefeeds | Enterprise changefeeds |
---|---|---|
Use case | Useful for prototyping or quick testing. | Recommended for production use. |
Product availability | All products | CockroachDB Basic, Standard, Advanced, or with an Enterprise license in CockroachDB self-hosted. |
Message delivery | Streams indefinitely until underlying SQL connection is closed. | Maintains connection to configured sink: Kafka, Google Cloud Pub/Sub, Amazon S3, Google Cloud Storage, Azure Storage, HTTP, Webhook. |
SQL statement | Create with EXPERIMENTAL CHANGEFEED FOR | Create with CREATE CHANGEFEED |
Targets | Watches one or multiple tables in a comma-separated list. | Watches one or multiple tables in a comma-separated list. |
Filter change data | Not supported | Use CDC queries to define the emitted change data. |
Schedule changefeeds | Not supported | Create a scheduled changefeed with CREATE SCHEDULE FOR CHANGEFEED. |
Job execution locality | Not supported | Use execution_locality to determine the node locality for changefeed job execution. |
Message format | Emits every change to a "watched" row as a record to the current SQL session. | Emits every change to a "watched" row as a record in a configurable format: JSON, CSV, Avro, Parquet. |
Management | Create the changefeed and cancel by closing the connection. | Manage changefeed with CREATE, PAUSE, RESUME, ALTER, and CANCEL. |
Monitoring | Not supported | Metrics available to monitor in the DB Console and Prometheus. Job observability with SHOW CHANGEFEED JOBS. |
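The difference between the two implementations can be sketched as follows. The table names `orders` and `customers` and the job ID are hypothetical. The cluster setting shown is an assumption about a self-hosted cluster where rangefeeds are not yet enabled.

```sql
-- Basic changefeed: streams changes for the hypothetical table orders to the
-- current SQL session until the connection is closed.
-- Assumption: rangefeeds must be enabled first on a self-hosted cluster.
SET CLUSTER SETTING kv.rangefeed.enabled = true;
EXPERIMENTAL CHANGEFEED FOR orders;

-- Enterprise changefeed lifecycle, run from a separate session.
-- 1234567890 is a placeholder for the job ID returned by CREATE CHANGEFEED.
PAUSE JOB 1234567890;
-- A changefeed job is paused before it is altered.
ALTER CHANGEFEED 1234567890 ADD customers;
RESUME JOB 1234567890;
CANCEL JOB 1234567890;
```

The basic changefeed occupies its session for as long as it runs, which is why the job management statements are issued from another connection.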
Get started with changefeeds
To get started with changefeeds in CockroachDB, refer to:
- Create and Configure Changefeeds: Learn about the fundamentals of using SQL statements to create and manage Enterprise and basic changefeeds.
- Changefeed Sinks: The downstream system to which the changefeed emits changes. Learn about the supported sinks and configuration capabilities.
- Changefeed Messages: The change events that emit from the changefeed to your sink. Learn about how messages are ordered at your sink and the options to configure and format messages.
- Changefeed Examples: Step-by-step examples for connecting to each changefeed sink.
Authenticate to your changefeed sink
To send changefeed messages to a sink, you must provide the CREATE CHANGEFEED statement with authentication credentials.
The following pages detail the supported authentication:
Sink | Authentication page |
---|---|
Cloud Storage | Refer to Cloud Storage Authentication for detail on setting up:
|
Kafka | Refer to:
|
Webhook | Refer to:
|
Google Cloud Pub/Sub | Refer to:
|
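As a hedged illustration of passing credentials in the sink URI, the sketch below targets an Amazon S3 bucket. The bucket name, path, connection name, and the `{...}` credential placeholders are all hypothetical, and the parameters your sink requires depend on the authentication method described on the pages above.

```sql
-- Option 1: credentials supplied directly in the sink URI (placeholders only).
CREATE CHANGEFEED FOR TABLE orders
  INTO 's3://example-bucket/changefeeds?AWS_ACCESS_KEY_ID={access key}&AWS_SECRET_ACCESS_KEY={secret key}'
  WITH updated;

-- Option 2: store the URI once as an external connection, then refer to it
-- by name so that credentials do not appear in each changefeed statement.
CREATE EXTERNAL CONNECTION orders_sink
  AS 's3://example-bucket/changefeeds?AWS_ACCESS_KEY_ID={access key}&AWS_SECRET_ACCESS_KEY={secret key}';
CREATE CHANGEFEED FOR TABLE orders
  INTO 'external://orders_sink'
  WITH updated;
```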
Monitor your changefeed job
It is a best practice to monitor your changefeed jobs for behavior such as failures and retries.
You can use the following tools for monitoring:
- The Changefeed Dashboard on the DB Console
- The SHOW CHANGEFEED JOBS statement
- Changefeed metrics labels
Refer to the Monitor and Debug Changefeeds page for recommendations on metrics to track.
For detail on how protected timestamps and garbage collection interact with changefeeds, refer to Protect Changefeed Data from Garbage Collection.
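For the SQL-based monitoring option listed above, a minimal sketch might look like the following. The filter on `paused` is only an example of narrowing the output. The job ID is a placeholder.

```sql
-- List all changefeed jobs in the cluster.
SHOW CHANGEFEED JOBS;

-- Narrow the output by wrapping the statement in a common table expression.
WITH x AS (SHOW CHANGEFEED JOBS)
SELECT * FROM x WHERE status = 'paused';

-- Inspect a single job (1234567890 is a hypothetical job ID).
SHOW CHANGEFEED JOB 1234567890;
```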
Optimize a changefeed for your workload
Filter your change data with CDC queries
Change data capture queries allow you to define and filter the change data emitted to your sink when you create an Enterprise changefeed.
For example, you can use CDC queries to:
- Filter out rows and columns from changefeed messages to decrease the load on your downstream sink.
- Modify data before it emits to reduce the time and operational burden of filtering or transforming data downstream.
- Stabilize or customize the schema of your changefeed messages for increased compatibility with external systems.
Refer to the Change Data Capture Queries page for more example use cases.
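A CDC query replaces the table target with a `SELECT` statement. The sketch below assumes a hypothetical `orders` table with `id`, `status`, and `total` columns and a placeholder Kafka address. It emits only three columns, and only for rows whose status is `shipped`.

```sql
-- Assumption: orders(id, status, total) is an illustrative schema.
CREATE CHANGEFEED INTO 'kafka://localhost:9092'
  WITH updated
  AS SELECT id, status, total
     FROM orders
     WHERE status = 'shipped';
```

Filtering at the changefeed in this way means the sink never receives the rows and columns that the query excludes.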
Use changefeeds to export a table
Changefeeds can export a single table scan to your sink. The benefits of using changefeeds for exports include job management, observability, and sink configurability. You can also schedule changefeeds to export tables, which may be useful to avoid table scans during peak periods.
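An export of this kind can be sketched with the `initial_scan = 'only'` option, which makes the changefeed complete after scanning the table instead of continuing to watch it. The table name, the external connection `export_sink`, the schedule label, and the schedule time are hypothetical.

```sql
-- One-time export of the hypothetical table orders as CSV.
-- Assumption: an external connection named export_sink already exists.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'external://export_sink'
  WITH initial_scan = 'only', format = csv;

-- Scheduled export, here at 02:00 each day to stay clear of peak periods.
CREATE SCHEDULE nightly_orders_export
  FOR CHANGEFEED orders
  INTO 'external://export_sink'
  WITH format = csv
  RECURRING '0 2 * * *';
```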
For examples and more detail, refer to:
Determine the nodes running a changefeed by locality
CockroachDB supports an option to set locality filter requirements that nodes must meet in order to take part in a changefeed job. This is helpful in multi-region clusters to ensure the nodes that are physically closest to the sink emit changefeed messages. For syntax and further technical detail, refer to Run a changefeed job by locality.
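As a rough sketch, the locality filter is passed with the `execution_locality` option. The table name, sink address, and locality value `region=us-east1` are placeholders. The value would need to match the locality tiers that your nodes were started with.

```sql
-- Assumption: some nodes in the cluster were started with the locality
-- region=us-east1 and the Kafka sink is deployed close to that region.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://kafka.us-east1.example.com:9092'
  WITH execution_locality = 'region=us-east1';
```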