This page describes best practices to consider when starting changefeeds on a CockroachDB cluster. We recommend referring to this information while planning your cluster's changefeeds and following the links in each of the sections for more details on a topic.
Plan the number of watched tables for a single changefeed
When creating a changefeed, it's important to consider the number of changefeeds versus the number of tables to include in a single changefeed:
- Changefeeds each have their own memory overhead, so every running changefeed will increase total memory usage.
- Creating a single changefeed that watches hundreds of tables can degrade performance by introducing coupling: the performance of any target table affects the performance of the changefeed watching it. For example, a schema change on any one of the tables will affect the entire changefeed's performance.
To watch multiple tables, we recommend creating a changefeed with a comma-separated list of tables. However, we do not recommend creating a single changefeed for watching hundreds of tables.
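For example, a changefeed watching a small, logically related group of tables might look like the following (the table names and Kafka sink URI are placeholders):

```sql
CREATE CHANGEFEED FOR TABLE orders, order_items, customers
  INTO 'kafka://localhost:9092';
```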
Cockroach Labs recommends monitoring your changefeeds to track retryable errors and protected timestamp usage. Refer to the Monitor and Debug Changefeeds page for more information.
Maintain system resources and running changefeeds
When you are running more than 10 changefeeds on a cluster, it is important to monitor CPU usage. A larger cluster can run more changefeeds concurrently than a smaller cluster with more limited resources.
We recommend limiting the number of changefeeds per cluster to 80.
To maintain a high number of changefeeds in your cluster:
- Connect to a different node to create each changefeed. The node on which you start a changefeed becomes the coordinator node for that changefeed job. The coordinator node acts as an administrator, tracking the other nodes during job execution and the changefeed work as it completes. As a result, the coordinator node uses more resources for the changefeed job. For more detail, refer to How does an Enterprise changefeed work?.
- Consider logically grouping the target tables into one changefeed. When a changefeed pauses, it will stop emitting messages for the target tables. Grouping tables of related data into a single changefeed may make sense for your workload. However, we do not recommend watching hundreds of tables in a single changefeed. For more detail on protecting data from garbage collection when a changefeed is paused, refer to Garbage collection and changefeeds.
Monitor changefeeds
We recommend starting the changefeed with the `metrics_label` option, which allows you to measure metrics per changefeed. Metrics label information is sent with time-series metrics to the Prometheus endpoint.
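For example, you might attach a label when creating the changefeed (the table name, sink URI, and label value are placeholders):

```sql
CREATE CHANGEFEED FOR TABLE rides
  INTO 'kafka://localhost:9092'
  WITH metrics_label = 'rides_feed';
```

The changefeed's metrics are then exported with the `rides_feed` label, so you can chart or alert on each changefeed separately.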
The key areas to monitor when running changefeeds are:
- Retryable errors: `changefeed.error_retries`
- Failures: `changefeed.failures`
- CPU usage when running more than 10 changefeeds: Overload Dashboard
- Protected timestamps and garbage collection: `jobs.changefeed.protected_age_sec`, `jobs.changefeed.currently_paused`, `jobs.changefeed.expired_pts_records`, `jobs.changefeed.protected_record_count`
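In addition to time-series metrics, you can check for paused changefeeds directly in SQL, since paused changefeeds hold protected timestamps that delay garbage collection. A quick check might look like:

```sql
SELECT job_id, description, status
  FROM [SHOW CHANGEFEED JOBS]
 WHERE status = 'paused';
```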
Manage changefeeds and schema changes
When a schema change is issued that causes a column backfill, it can result in a changefeed emitting duplicate messages for an event. We recommend issuing schema changes outside of explicit transactions to make use of the declarative schema changer, which does not perform column backfill for the schema changes it supports. For more details on schema changes and column backfill generally, refer to the Online Schema Changes page.
You can also use the `schema_change_events` and `schema_change_policy` options to define a schema change type and an associated policy that modifies how the changefeed behaves under that schema change.
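For example, the following changefeed stops with an error when a column change occurs on the watched table, rather than backfilling, so you can decide how to proceed (the table name and sink URI are placeholders):

```sql
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH schema_change_events = 'column_changes',
       schema_change_policy = 'stop';
```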
Lock the schema on changefeed watched tables
Use the `schema_locked` storage parameter to disallow schema changes on a watched table. This allows the changefeed to take a fast path that avoids checking whether there are schema changes that could require synchronization between changefeed aggregators, which decreases the latency between a write committing to a table and it emitting to the changefeed's sink.

Enable `schema_locked` on the watched table with the `ALTER TABLE` statement:

```sql
ALTER TABLE watched_table SET (schema_locked = true);
```
While `schema_locked` is enabled on a table, attempted schema changes on the table will be rejected and an error returned. If you need to run a schema change on the locked table, unlock the table with `schema_locked = false`, complete the schema change, and then lock the table again with `schema_locked = true`. The changefeed runs as normal while `schema_locked = false`, but it does not benefit from the performance optimization.

```sql
ALTER TABLE watched_table SET (schema_locked = false);
```
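Putting the steps together, a schema change on a locked table might look like the following sequence (the column addition is a hypothetical example):

```sql
-- Unlock the table so the schema change is allowed:
ALTER TABLE watched_table SET (schema_locked = false);

-- Run the schema change (hypothetical example):
ALTER TABLE watched_table ADD COLUMN note STRING;

-- Re-lock the table to restore the changefeed's fast path:
ALTER TABLE watched_table SET (schema_locked = true);
```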
See also
For details on tuning changefeeds for throughput, durability, and improving latency, refer to the Advanced Changefeed Configuration page.