A quick look at what metadata is, why it’s important, and how you can architect your application to ensure highly available, consistent metadata at scale.
Choosing the right metadata store should be contingent on your system architecture. There is no definitive ‘right choice’. But if you are building your products or services on top of distributed systems, your choices are certainly narrowed.
Recently, the data security company Rubrik wrote a three-part blog series about how they chose a metadata store for Rubrik CDM. Their journey began with Cassandra and ended with CockroachDB. This post summarizes the challenges they ran into with Cassandra, the reasons they chose CockroachDB as their metadata store, and what their CockroachDB use case looks like.
Originally, Rubrik used Cassandra as the metadata store to get their product to MVP. They liked several qualities of Cassandra, including higher-order column types like maps and lists, high-performance point queries, easy setup, and simple deployment and maintenance. But they quickly ran into several issues:
Rubrik found workarounds for each of these issues, but the operational overhead of maintaining the associated best practices was too burdensome. So Rubrik looked for an alternative metadata store.
Rubrik had three clear evaluation criteria for their Cassandra replacement:
Check out the consistency and load testing framework that Rubrik built to stress test database options against their required criteria. After CockroachDB passed these stress tests, Rubrik’s team selected it as their metadata store and moved on to evaluating the migration process.
The most compelling piece of Rubrik’s migration from Cassandra to CockroachDB is their implementation of a stateless translator called CQLProxy. This tool translates CQL (Cassandra Query Language) into the PostgreSQL dialect of SQL, which is the dialect CockroachDB chose to support.
(Image credit: Vijay Karthik)
CQL schemas have features like static columns and higher-order column types (think map columns) which do not exist in SQL. Rubrik implemented these features in CQLProxy by using extra tables in CockroachDB, which made application development much easier.
Rubrik details more of the migration from Cassandra to CockroachDB in Part 2 of their blog series, but it essentially involved two steps:
And with CQLProxy, Rubrik didn’t have to make any changes to their application layers. Otherwise, the complexity of changing application code while also swapping out distributed databases would have been painful.
Generally speaking, the migration to CockroachDB from Cassandra was a win for Rubrik. They had less operational overhead, and the new support for cross-table transactions (through CQLProxy) simplified their application logic. But they did need to find workarounds for a couple of challenges in CockroachDB: clock skew and backpressure.
To address clock skew, Rubrik built a distributed time service that they plugged into CockroachDB. Some of Rubrik’s physical clusters were prone to clock skew because their NTP servers were misbehaving. Ordinarily, NTP helps correct clocks (it’s one of CockroachDB’s recommendations for avoiding clock skew). But when the NTP server hiccups, the clocks get too far out of sync.
Kronos, the custom time service built by Rubrik, has the following properties:
Kronos runs inside CockroachDB on each node. It elects an “Oracle” within the cluster to make the time selection. It’s really a fascinating tool that solves an important problem. Go here to read more about Kronos.
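The oracle idea can be illustrated with a toy model. The sketch below is not Kronos's actual code or API, and every name in it (`Node`, `elect_oracle`, `sync_cluster`) is hypothetical. It assumes the oracle is simply the node with the lowest ID, ignores network latency, and shows only the core idea: each node serves its local clock plus an offset learned from the oracle, and never lets the served time go backwards.

```python
import time


class Node:
    def __init__(self, node_id, clock=time.time):
        self.node_id = node_id
        self._clock = clock          # local wall clock, possibly skewed
        self._offset = 0.0           # correction learned from the oracle
        self._last = float("-inf")   # last time served, to stay monotonic

    def sync(self, oracle_time):
        # Record how far the local clock is from the oracle's time.
        self._offset = oracle_time - self._clock()

    def now(self):
        # Serve corrected time, but never a value lower than one
        # this node has already served.
        candidate = self._clock() + self._offset
        self._last = max(self._last, candidate)
        return self._last


def elect_oracle(nodes):
    # Placeholder election (assumption): pick the lowest node ID.
    return min(nodes, key=lambda n: n.node_id)


def sync_cluster(nodes):
    # The oracle selects the time; every other node adopts it.
    oracle = elect_oracle(nodes)
    reference = oracle.now()
    for node in nodes:
        if node is not oracle:
            node.sync(reference)
    return oracle


if __name__ == "__main__":
    a = Node(1, clock=lambda: 100.0)   # fixed fake clocks for the demo
    b = Node(2, clock=lambda: 250.0)   # 150 seconds ahead of node 1
    sync_cluster([a, b])
    print(a.now(), b.now())
```

With a scheme like this, a misbehaving NTP server on one machine affects only that node's local clock, and the offset learned from the oracle absorbs the difference at the next sync.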
Backpressure was the next challenge Rubrik faced in CockroachDB. This issue crept up when rows were being updated very frequently or when garbage collection was lagging behind the prescribed TTL. Rubrik implemented a few changes which helped reduce the number of backpressure errors they encountered:
At CockroachDB we’re grateful to Rubrik for the contributions they’ve made to our public repo. Collaborating with their team over the last few years to solve problems has been an important learning experience and has improved our product. To learn more about Rubrik’s CockroachDB use case you can watch this video:
If you’re interested in learning more about metadata management, you can check out our reference architecture and watch this high-level overview video: