Set Up Logical Data Replication

Note:

This feature is in preview and subject to change. To share feedback and/or issues, contact Support.

Logical data replication is only supported in CockroachDB self-hosted clusters.

In this tutorial, you will set up logical data replication (LDR) to stream data from a source table to a destination table between two CockroachDB clusters. Both clusters are active and can serve traffic. You can apply the outlined steps to set up one of the following:

  • Unidirectional LDR from a source table to a destination table (cluster A to cluster B) in one LDR job.
  • Bidirectional LDR for the same table from cluster A to cluster B and from cluster B to cluster A. In a bidirectional setup, each cluster operates as both a source and a destination in separate LDR jobs.

New in v25.1: Create the new table on the destination cluster automatically and conduct a fast, offline initial scan with the CREATE LOGICALLY REPLICATED syntax. CREATE LOGICALLY REPLICATED accepts unidirectional or bidirectional on as an option to create either setup automatically. Step 3 outlines when to use the CREATE LOGICALLY REPLICATED or the CREATE LOGICAL REPLICATION STREAM syntax to start LDR.

In the following diagram, LDR stream 1 creates a unidirectional LDR setup; introducing LDR stream 2 extends the setup to bidirectional LDR.

Diagram showing bidirectional LDR from cluster A to B and back again from cluster B to A.

For more details on use cases, refer to the Logical Data Replication Overview.

Tutorial overview

If you're setting up bidirectional LDR, both clusters will act as a source and a destination in the respective LDR jobs. The high-level steps for setting up bidirectional or unidirectional LDR are:

  1. Prepare the clusters with the required settings, users, and privileges according to the LDR setup.
  2. Set up external connection(s) on the destination to hold the connection URI for the source.
  3. Start LDR from the destination cluster with your required modes and syntax.
  4. Check the status of the LDR job in the DB Console.

Before you begin

You'll need:

  • Two separate v25.1 CockroachDB self-hosted clusters with connectivity between every node in both clusters. That is, all nodes in cluster A must be able to contact each node in cluster B and vice versa. The SQL advertised address should be the cluster node advertise address so that the LDR job can plan node-to-node connections between clusters for maximum performance.
  • LDR replicates at the table level, which means clusters can contain other tables that are not part of the LDR job. (For CREATE LOGICAL REPLICATION STREAM only): If both clusters are empty, create the tables that you need to replicate with identical schema definitions (excluding indexes) on both clusters, as in the example after this list. If one cluster already has an existing table that you'll replicate, ensure the other cluster's table definition matches. For more details on the supported schemas, refer to Schema Validation.
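
For example, a minimal sketch of a table created identically on both clusters before starting LDR with CREATE LOGICAL REPLICATION STREAM (the table and column names here are hypothetical):

CREATE TABLE {database}.public.orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    status STRING NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);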

Schema validation

Before you start LDR, you must ensure that all column names, types, constraints, and unique indexes on the destination table match the source table.

You cannot use LDR on a table with a schema that contains certain features, and the CREATE LOGICALLY REPLICATED syntax has additional schema restrictions. For the complete lists, refer to the LDR Known limitations.

When you run LDR in immediate mode, you cannot replicate a table with foreign key constraints. In validated mode, foreign key constraints must match. All constraints are enforced at the time of SQL/application write.
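
For example, a minimal sketch for validated mode: run the same DDL on both clusters so the foreign key constraints match before starting LDR (the table and column names here are hypothetical):

CREATE TABLE {database}.public.users (
    id UUID PRIMARY KEY,
    name STRING
);

CREATE TABLE {database}.public.rides (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES {database}.public.users (id)
);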

Step 1. Prepare the cluster

In this step, you'll configure the required settings, users, and privileges for LDR.

Note:

If you are setting up bidirectional LDR, you must run this step on both clusters.

  1. Enter the SQL shell for both clusters in separate terminal windows:

    cockroach sql --url "postgresql://root@{node IP or hostname}:26257?sslmode=verify-full" --certs-dir=certs
    
  2. Enable the kv.rangefeed.enabled cluster setting on the source cluster:

    SET CLUSTER SETTING kv.rangefeed.enabled = true;
    
  3. On the destination, create a user with the REPLICATION system privilege who will start the LDR job:

    CREATE USER {your username} WITH PASSWORD '{your password}';
    
    GRANT SYSTEM REPLICATION TO {your username};
    

    To change the password later, refer to ALTER USER.
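
    For example, a one-line sketch ({new password} is a placeholder):

    ALTER USER {your username} WITH PASSWORD '{new password}';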

Step 2. Connect from the destination to the source

In this step, you'll set up external connection(s) to store the connection string for one or both clusters. Depending on how you manage certificates, you must ensure that all nodes in each cluster have access to the other cluster's certificate.

You can use the cockroach encode-uri command to generate a connection string containing a cluster's certificate.

  1. On the source cluster in a new terminal window, generate a connection string by passing the replication user, node IP, and port, along with the path to the source cluster's CA certificate:

    cockroach encode-uri {user}:{password}@{node IP}:26257 --ca-cert {path to CA certificate} --inline
    

    The connection string output contains the source cluster's certificate:

    {user}:{password}@{node IP}:26257?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded certificate}-----END+CERTIFICATE-----%0A
    
  2. In the SQL shell on the destination cluster, create an external connection using the source cluster's connection string. Prefix the postgresql:// scheme to the connection string and replace {source} with your external connection name:

    CREATE EXTERNAL CONNECTION {source} AS 'postgresql://{user}:{password}@{node IP}:26257?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded certificate}-----END+CERTIFICATE-----%0A';
    

(Optional) Bidirectional: Create the connection for LDR stream 2

(Optional) For bidirectional LDR, repeat generating the connection string and creating the external connection for the opposite cluster, because both clusters will act as a source and a destination. At this point, you've created an external connection for LDR stream 1, from cluster A (source) to cluster B (destination). Now, create the same for LDR stream 2, from cluster B (source) to cluster A (destination).

  1. On cluster B, run:

    cockroach encode-uri {user}:{password}@{node IP}:26257 --ca-cert {path to CA certificate} --inline
    

    The connection string output contains the source cluster's certificate:

    {user}:{password}@{node IP}:26257?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded certificate}-----END+CERTIFICATE-----%0A
    
  2. On cluster A, create an external connection using cluster B's connection string (source in LDR stream 2). Prefix the postgresql:// scheme to the connection string and replace {source} with your external connection name:

    CREATE EXTERNAL CONNECTION {source} AS 'postgresql://{user}:{password}@{node IP}:26257?options=-ccluster%3Dsystem&sslinline=true&sslmode=verify-full&sslrootcert=-----BEGIN+CERTIFICATE-----{encoded certificate}-----END+CERTIFICATE-----%0A';
    

Step 3. Start LDR

In this step, you'll start the LDR stream(s) from the destination cluster. You can replicate one or multiple tables in a single LDR job. You cannot replicate system tables in LDR, which means that you must manually apply configurations and cluster settings, such as row-level TTL and user permissions on the destination cluster.
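
For example, a sketch of manually applying such configurations on the destination cluster (the TTL value here is a placeholder):

ALTER TABLE {database.public.destination_table_name} SET (ttl_expire_after = '30 days');
GRANT SELECT ON TABLE {database.public.destination_table_name} TO {your username};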

Modes determine how LDR replicates the data to the destination cluster. There are two modes:

  • immediate (default): Attempts to replicate the changed row directly into the destination table, without re-running constraint validations. It does not support writing into tables with foreign key constraints.
  • validated: Attempts to apply the write in a similar way to a user-run query, which would re-run all constraint validations relevant to the destination table(s). If the change violates foreign key dependencies, unique constraints, or other constraints, the row will be put in the dead letter queue (DLQ) instead. Like the SQL layer, validated mode does not recognize deletion tombstones. As a result, an update to the same key from cluster A will successfully apply on cluster B, even if that key was deleted on cluster B before the LDR job streamed the cluster A update to the key.

Syntax

LDR streams can be started using one of the following SQL statements, depending on your requirements:

  • New in v25.1: CREATE LOGICALLY REPLICATED: Creates the new table on the destination cluster automatically, and conducts a fast, offline initial scan. CREATE LOGICALLY REPLICATED accepts unidirectional or bidirectional on as an option to create either setup automatically. The table cannot contain user-defined types or foreign key dependencies. Follow these steps for setup instructions.
  • CREATE LOGICAL REPLICATION STREAM: Starts the LDR stream after you've created the matching table on the destination cluster. If the table contains user-defined types or foreign key dependencies, you must use this syntax. Allows for manual creation of unidirectional or bidirectional LDR. Follow these steps for setup instructions.

Also, for both SQL statements, note:

  • You must use the fully qualified table names for the source and destination tables in the statement.
  • There are some tradeoffs between replicating one table per LDR job versus multiple tables in one LDR job. Multiple tables in one LDR job can be easier to operate. For example, if you pause and resume the single job, LDR will stop and resume for all the tables. However, the most granular level of observability will be at the job level. One table per LDR job allows for table-level observability.

CREATE LOGICALLY REPLICATED

Use CREATE LOGICALLY REPLICATED to create either a unidirectional or bidirectional LDR stream automatically:

  • Unidirectional LDR: Run the following from the destination cluster:

    CREATE LOGICALLY REPLICATED TABLE {database.public.destination_table_name} FROM TABLE {database.public.source_table_name} ON 'external://source' WITH unidirectional;
    
  • Bidirectional LDR: This statement first creates the LDR job for the first stream. You must run it from the destination cluster, that is, the cluster that does not yet contain the table. Once the offline initial scan completes, the reverse stream will be initialized so that the original destination cluster can send changes to the original source.

    Run the following from the destination cluster (the cluster that does not currently have the table):

    CREATE LOGICALLY REPLICATED TABLE {database.public.destination_table_name} FROM TABLE {database.public.source_table_name} ON 'external://source' WITH bidirectional ON 'external://destination';
    

You can include multiple tables in the LDR stream for unidirectional or bidirectional setups. Ensure that the table names in the source table list and destination table list are in the same order so that the tables correctly map between the source and destination for replication:

CREATE LOGICALLY REPLICATED TABLE ({database.public.destination_table_name_1}, {database.public.destination_table_name_2}) FROM TABLE ({database.public.source_table_name_1}, {database.public.source_table_name_2}) ON 'external://source' WITH bidirectional ON 'external://destination', label=track_job;

With the LDR streams created, move to Step 4 to manage and monitor the jobs.

CREATE LOGICAL REPLICATION STREAM

Ensure you've created the table on the destination cluster with a schema definition that matches the source cluster's table. From the destination cluster, start LDR. Use the fully qualified table names for the source and destination tables:

CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.source_table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.destination_table_name};

You can change the default mode using the WITH mode = validated syntax.
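
For example:

CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.source_table_name} ON 'external://{source_external_connection}' INTO TABLE {database.public.destination_table_name} WITH mode = validated;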

If you would like to add multiple tables to the LDR job, ensure that the table names in the source table list and destination table list are in the same order:

CREATE LOGICAL REPLICATION STREAM FROM TABLES ({database.public.source_table_name_1},{database.public.source_table_name_2},...) ON 'external://{source_external_connection}' INTO TABLES ({database.public.destination_table_name_1},{database.public.destination_table_name_2},...);

(Optional) At this point, you've set up one LDR stream with cluster A as the source and cluster B as the destination. To set up LDR streaming in the opposite direction using CREATE LOGICAL REPLICATION STREAM, run the statement again, this time with cluster B as the source and cluster A as the destination.
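
For example, run the following from cluster A, using the external connection you created for LDR stream 2 (the connection name here is a placeholder):

CREATE LOGICAL REPLICATION STREAM FROM TABLE {database.public.source_table_name} ON 'external://{cluster_b_external_connection}' INTO TABLE {database.public.destination_table_name};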

Step 4. Manage and monitor the LDR jobs

Once LDR has started, an LDR job will run on the destination cluster. You can pause, resume, or cancel the LDR job with the job ID. Use SHOW LOGICAL REPLICATION JOBS to display the LDR job IDs:

SHOW LOGICAL REPLICATION JOBS;
        job_id        | status  |          tables           | replicated_time
----------------------+---------+---------------------------+------------------
1012877040439033857   | running | {database.public.table}   | NULL
(1 row)
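
For example, to pause, resume, or cancel the job using the job ID from the output above:

PAUSE JOB 1012877040439033857;
RESUME JOB 1012877040439033857;
CANCEL JOB 1012877040439033857;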

If you're setting up bidirectional LDR, both clusters will have a history retention job and an LDR job running.

DB Console

Access the DB Console to monitor the status and metrics of the LDR jobs you created. Depending on which cluster you would like to view, follow the instructions for either the source or the destination.

Tip:

You can use the DB Console, the SQL shell, Metrics Export with Prometheus and Datadog, and labels with some LDR metrics to monitor the job.

For a full reference on monitoring LDR, refer to Logical Data Replication Monitoring.
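
For example, if you started the LDR job with the label option, a sketch of enabling child metrics so that labeled LDR metrics are exported (refer to the monitoring page for the full configuration):

SET CLUSTER SETTING server.child_metrics.enabled = true;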

  1. Access the DB Console at http://{node IP or hostname}:8080 and enter your user's credentials.
  2. On the source cluster, navigate to the Jobs page to view a list of all jobs. Use the job Type dropdown and select Replication Producer. This displays the history retention job, which runs while the LDR job is active to protect changes to the table from garbage collection until they have been replicated to the destination cluster.
  3. On the destination cluster, use the job Type dropdown and select Logical Replication Ingestion. This page will display the logical replication stream job. There will be a progress bar in the Status column when LDR is replicating a table with existing data. This progress bar shows the status of the initial scan, which backfills the destination table with the existing data.
  4. On the destination cluster, click on Metrics in the left-hand navigation menu. Use the Dashboard dropdown to select Logical Data Replication. This page shows graphs for monitoring LDR.

What's next

