We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global customer base, organizations are replacing their legacy database management systems (DBMSs) with cloud-based DBMSs capable of scaling online transaction processing (OLTP) workloads to millions of users.
CockroachDB is a distributed SQL DBMS that was built from the ground up to support these global OLTP workloads while maintaining high availability and strong consistency. Just like its namesake, CockroachDB is resilient to disasters through replication and automatic recovery mechanisms.
This paper presents the design of CockroachDB and its novel transaction model that supports consistent geo-distributed transactions without the use of specialized hardware. We describe how CockroachDB replicates and distributes data to achieve fault tolerance and high performance, as well as how its distributed SQL layer automatically scales with the size of the database cluster while providing the standard SQL interface that users expect. Finally, we present a comprehensive performance evaluation and share a couple of case studies of CockroachDB users. We conclude by describing lessons learned while building CockroachDB over the last five years.