Stardog Cluster Recovery

Paul Marshall

Feb 15, 2019, 3 minute read

Failed nodes can rejoin Stardog Cluster quickly, even when the cluster is under constant write pressure. In this post we show a brief screencast demonstrating this important feature of Stardog Cluster running in Kubernetes.

Failure: Not If, But When

Most users architect their cluster deployments to be as robust as possible. It’s important to use quality hardware and reliable network connections. Even in the best cases components in a distributed system will fail, regardless of how much time or money you spent trying to avoid it. When those components fail, one of the first questions is if the application can recover on its own. And the second: how quickly?

Turning to platforms that advertise self-healing capabilities, such as Kubernetes (k8s), may lure you into thinking that the platform will handle application recovery via its built-in mechanisms (e.g., restarting applications that fail health checks). However, the ability of an application to recover in k8s is only as good as the underlying recovery mechanisms in the application itself. If your HA cluster can’t recover seamlessly on its own, especially when it’s under load, environments like k8s will only make matters worse, not better.

Automatic Recovery of Stardog Cluster

Stardog Cluster maintains strong consistency, which means that the cluster guarantees that the data is the same on all of the nodes at all times. That is, all nodes maintain a full copy of the data and writes must be applied to all nodes before the write is acknowledged to the client. Reads can be served from any node in the cluster. If a node fails for any reason (e.g., the underlying host crashes or a transaction fails to commit on one node), it is dropped from the cluster and must rejoin, syncing any missed transactions or possibly syncing the entire database(s) if needed.

It’s easy for nodes to join clusters that do not experience frequent updates because there is ample idle time for a node to sync completely and join.

It is significantly more difficult for new nodes to join, or failed nodes to rejoin, a cluster that is continuously updated. Due to Stardog’s strong consistency model, the updates must be blocked, at least briefly, for the cluster to ensure that the joining node is fully synchronized. The default behavior in Stardog Cluster is for a node to try to join three times without blocking writes (anticipating that it may hit a quiet window and be able to join) before blocking writes to force its way back into the cluster.

In the screencast below I demonstrate the rejoin capabilities of Stardog Cluster by manually deleting a Stardog node in a 3 node cluster, running in Kubernetes (k8s), while the cluster is under constant read and write pressure.

The terminal session on the left reports the following every minute: the current number of nodes in the cluster, the total number of writes, the number of writes that fail and must be retried by client-side application logic (due to the rejoining node briefly blocking writes), and the number of reads that fail. The terminal session on the right shows the state of the Stardog and ZooKeeper nodes in k8s as well me deleting a Stardog pod once the read and write workloads have begun. The node is able to rejoin within minutes while only blocking a few write transactions.

The ability of Stardog Cluster to recover seamlessly under pressure makes it a great choice for any production deployment that requires high-availability. This capability is especially critical if you’re going to run your production deployment of Stardog in a cloud or k8s-based environment where you’ll not only encounter hardware failures, but the cloud provider or k8s admin may terminate your VMs or containers for any number of reasons, including host maintenance or security updates.