In this post we discuss best practices for running Stardog Cluster in the cloud.
Stardog Cluster provides a highly available (HA) deployment that keeps your production environment running even in the face of infrastructure outages, and cloud providers offer many choices when selecting infrastructure for it. With the recent release of Stardog 5.3, which includes up to 5x cluster performance improvements for some use cases, we understand if you’re itching to sink your teeth into Stardog Cluster.
While cloud providers, such as Amazon Web Services (AWS) and Microsoft Azure, make it incredibly easy to stand up infrastructure, it’s important to remember that an application is only as reliable as the hardware it runs on. Much of the “hardware” is virtual and shared by customers, so providers expose a considerable number of knobs you can use to tune your deployment. The default values may seem convenient at first, but determining whether they are ideal for your workload is complicated, and it can be difficult to know which knobs to turn to optimize the performance and reliability of Stardog Cluster. Different infrastructure choices and knobs can also hit your bank account in unexpected and unpleasant ways.
Before we go further, here’s a brief refresher on Stardog Cluster’s architecture.
Stardog Cluster contains a coordinator and one or more participants. All of the Stardog nodes connect to a ZooKeeper ensemble, which consists of an odd number of servers, typically 3 or more.
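To see why the ensemble uses an odd number of servers, here is a minimal sketch of the majority arithmetic. It assumes only that ZooKeeper needs a strict majority of its servers to be available; the function names are ours, not part of any ZooKeeper or Stardog API.

```python
def zk_quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still available."""
    return ensemble_size - zk_quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (3, 4, 5):
        print(size, zk_quorum_size(size), zk_tolerated_failures(size))
```

A fourth server tolerates no more failures than three do, which is why ensembles are normally sized at 3, 5, and so on.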
Stardog Cluster guarantees that all members of the cluster are consistent. Any node that fails an operation is expelled. An expelled node must sync with another node before it can rejoin.
The coordinator is responsible for orchestrating transactions to maintain consistency in the cluster; however, any member of the cluster can handle a client request. If the coordinator fails, the transaction is aborted and a new coordinator is elected.
A load balancer is used to route requests to participating Stardog nodes. The load balancer checks the health of each Stardog node and avoids sending requests to any node that’s unhealthy.
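The routing behavior can be pictured with a small sketch of health-aware round-robin selection. This is an illustration only, not how ELB or Stardog implements it; `is_healthy` is a hypothetical callback standing in for whatever health check your load balancer performs.

```python
from itertools import cycle
from typing import Callable, Iterable, Iterator


def route_requests(nodes: Iterable[str],
                   is_healthy: Callable[[str], bool]) -> Iterator[str]:
    """Yield the node chosen for each successive request, skipping unhealthy nodes."""
    ring = list(nodes)
    if not ring:
        raise ValueError("at least one node is required")
    rotation = cycle(ring)
    while True:
        # Try each node at most once per request before giving up.
        for _ in range(len(ring)):
            node = next(rotation)
            if is_healthy(node):
                yield node
                break
        else:
            raise RuntimeError("no healthy Stardog nodes available")
```

Because health is re-checked for every request, a node that recovers is picked up again the next time the rotation reaches it.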
Because of this strong consistency model there are a few items to be aware of when considering your use case:
Many small transactions are slower than fewer large transactions due to the overhead required to commit each transaction at every node.
If there are constant writes a joining node must either wait until the updates subside or obtain a lock that temporarily blocks the writes until the node is synchronized and can join. By default, joining nodes sync as much as possible before they obtain the lock; however, if writes occur too often the joining node may never catch up. Thus, if a node fails to join after three attempts it will forcibly obtain the lock and sync, blocking any writes until it joins.
ZooKeeper ensures the cluster nodes are synchronized. While ZooKeeper is thoroughly tested and impressively robust, as with any distributed system, the more components there are, the more places something can go wrong. And if something can go wrong, something will (eventually) go wrong.
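The join behavior described above can be modeled with a toy simulation. This is a deliberately simplified sketch, not Stardog's implementation: it assumes a lock-free attempt succeeds only if no writes arrive during it, and `free_attempts` is a hypothetical parameter for the number of lock-free attempts allowed before the lock is forced (3 mirrors the default described above, `None` means never force).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinResult:
    attempt: int          # attempt number on which the node joined
    blocked_writes: bool  # True if the join lock was forcibly obtained


def simulate_join(first_quiet_attempt: Optional[int],
                  free_attempts: Optional[int] = 3) -> Optional[JoinResult]:
    """Return how the node joins, or None if it never does.

    first_quiet_attempt: first attempt during which no writes arrive
                         (None if writes never pause).
    free_attempts: lock-free attempts allowed before forcing the lock
                   (None disables forcing).
    """
    forced_attempt = None if free_attempts is None else free_attempts + 1
    if first_quiet_attempt is not None and (
            forced_attempt is None or first_quiet_attempt < forced_attempt):
        # Writes paused early enough: the node catches up without blocking anyone.
        return JoinResult(first_quiet_attempt, blocked_writes=False)
    if forced_attempt is not None:
        # Lock-free budget spent: take the lock, block writes, sync, join.
        return JoinResult(forced_attempt, blocked_writes=True)
    return None  # Never forced and never quiet: the node keeps waiting.
```

For example, `simulate_join(None)` models constant writes with the default policy, while `simulate_join(None, free_attempts=None)` models a node that is never allowed to block writes and therefore never joins.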
For the remainder of this post we focus cloud-specific discussion on AWS, the largest and most popular infrastructure cloud provider; however, the majority of our advice translates to any cloud provider.
While you can manually set up Stardog Cluster on AWS by following our docs, we recommend Stardog Graviton, which makes it easy to configure everything you need on AWS with a few commands. This post covers what you need to know to get started. Graviton takes care of deploying all of the instances for Stardog and ZooKeeper and configuring them correctly. It also sets up an elastic load balancer (ELB) and configures the health check.
Once you’re able to spin up Stardog clusters on AWS it’s time for the real fun to begin. In an ideal world computers wouldn’t crash and no node would ever go down, but crashes happen so it’s best to plan ahead. And the best approach to handle failures depends on your use case.
Stardog Cluster is designed to work well for many different use cases. But given the constraints imposed by strong consistency, your configuration may deviate from the defaults. There is no one-size-fits-all solution. It’s important that you both tune the deployment for your use case and also run tests to verify the deployment meets your performance and reliability requirements.
Here are a few examples to highlight how you may need to adjust your deployment.
Load-once, read-many: If the bulk of your data is loaded into a database at startup (with perhaps small updates after startup), and the majority of requests are read queries, Stardog Cluster will largely scale horizontally. Each Stardog node in the cluster can mount a volume created from a snapshot and bulk load the data at startup. Since any node can independently respond to a read request, the load balancer can distribute requests round-robin. Joining nodes aren’t blocked by read requests, so nodes will generally be able to join on their first attempt.
Frequent writes, followed by periods of quiescence: On the other hand, if your data is written to Stardog throughout the day in frequent transactions but not at night, you should adjust accordingly. If it’s important to your use case that a joining node not block writes, you can configure Stardog to never forcibly obtain the join lock. A real consideration is weighing the risk of losing a node. For example, are you operating your cluster in an unreliable environment? How many nodes can you afford to lose, and for how long? If you deploy a three-node cluster but it’s too risky to operate your production cluster with only two nodes for HA, then it may make sense to deploy a larger cluster so you can afford to lose more nodes during write-heavy times and wait for nodes to rejoin once writes subside.
Continuous small writes: If your cluster rarely experiences quiet time with respect to writes and you want nodes to rejoin as quickly as possible, then you can configure a joining node to obtain the lock on the second attempt. In this case the joining node will block the writes; but, since the node will sync without the lock on the first attempt, it will be able to mostly catch up to the other nodes in the cluster. On the second attempt it will forcibly obtain the lock and sync any transactions it missed in that short window and join, only blocking writes for a short time.
Finally, if your workload consists of a lot of small transactions and higher throughput (in terms of writes per second) is more important than transaction latency, consider batching your smaller transactions into fewer larger transactions. Larger transactions will reduce the cluster overhead required to commit the data on all of the nodes.
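As a rough illustration of the batching advice, the sketch below groups items into fixed-size batches and estimates total load time under a simple assumed cost model: a fixed per-transaction commit overhead plus a per-item cost. Both helpers and the cost model are ours for illustration; the actual overhead in your cluster has to be measured.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of at most batch_size, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def estimated_load_seconds(n_items: int, batch_size: int,
                           commit_overhead_s: float, per_item_s: float) -> float:
    """Assumed model: every transaction pays a fixed commit overhead."""
    n_transactions = -(-n_items // batch_size)  # ceiling division
    return n_transactions * commit_overhead_s + n_items * per_item_s
```

Each batch would then be written in a single transaction with whichever client you use. Under this model the per-item cost is unchanged, and only the number of times the commit overhead is paid goes down.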
This isn’t an exhaustive list. Your use case may be different. But ideally these examples get you thinking about some different aspects you should consider. Ultimately these trade-offs are business decisions to weigh based on your workload, requirements, and budget.
Though Stardog doesn’t impose substantial load on ZooKeeper, it is an essential component for Stardog Cluster. ZooKeeper helps manage locks for transactions and joining nodes. It also stores the latest transaction ID for the cluster, ensuring that all the nodes are synchronized. Lastly, it determines which nodes are members of the cluster. If a Stardog node loses its connection to ZooKeeper, it is dropped from the cluster.
Therefore, ZooKeeper shouldn’t be neglected. You should follow the best practices for ZooKeeper outlined in its docs. In particular, each ZooKeeper process should run on its own server (and not share a server with Stardog processes) to minimize resource contention. The network between Stardog Cluster and the ZooKeeper ensemble should be low-latency and reliable (e.g., a LAN in the same data center, not a WAN). Finally, depending on your workload, ZooKeeper also suggests placing the data directory and log directory on separate disks.
AWS has too many options to cover in a single post. Here are a few basic options to consider for instance types and volumes.
Instance types: AWS broadly categorizes instance types into 5 groups: general purpose, compute optimized, memory optimized, accelerated computing, and storage optimized.
If you have a large data set or perform complex queries that depend on more RAM, then memory optimized instances (e.g., r4.*) may offer the best price-to-performance ratio. If your cluster needs to handle small, frequent transactions from concurrent clients, then the additional vCPUs provided by compute optimized instances (e.g., c4.*) may better suit your deployment. Each write request spawns a handful of threads to manage intra-cluster communication for each transaction, so too few cores can negatively impact performance.
Volumes: AWS has a number of different volume types, although you probably only need to consider gp2 and io1.
gp2 volumes are cheaper but they do not guarantee consistent performance. They typically offer reasonable baseline performance with the ability to burst to higher IOPS (input/output operations per second) for short periods of time. gp2 volumes maintain “IO credits” in what AWS refers to as a burst bucket. Essentially, the credits limit the number of IOPS your volume can sustain and for how long. This can lead to unpredictable performance and difficult-to-debug slowness or even crashes in Stardog Cluster, especially if the cluster is under load for a long time (e.g., hours or more). These volumes typically suffice for deployments that don’t experience sustained IO, such as development environments and short-running tests.
io1 volumes offer provisioned IOPS, providing consistent performance. A volume provisioned with 5K IOPS is able to maintain that level of performance. io1 volumes are more expensive than gp2 volumes (depending on their size and IOPS), so they are better suited for long-running load or benchmark tests as well as production environments where consistent performance is a strict requirement. Given that ZooKeeper is also sensitive to disk latency and can quickly destabilize if disk performance slows, io1 volumes with sufficient IOPS should also be used for ZooKeeper in deployments where reliability is important.
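To get a feel for how quickly a gp2 burst bucket can drain, here is a back-of-the-envelope sketch. The constants are assumptions based on the figures AWS has published for gp2 (3 IOPS per GiB baseline with a 100 IOPS floor, bursts up to 3,000 IOPS, and a bucket of 5.4 million credits); check the current EBS documentation before relying on them. The upper cap on baseline IOPS for very large volumes is omitted.

```python
import math

# Assumed gp2 figures; verify against the current AWS EBS documentation.
GP2_IOPS_PER_GIB = 3
GP2_MIN_BASELINE_IOPS = 100
GP2_BURST_IOPS = 3_000
GP2_BUCKET_CREDITS = 5_400_000


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a volume of the given size (upper cap not modeled)."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_IOPS_PER_GIB * size_gib)


def gp2_burst_seconds(size_gib: int, demand_iops: int = GP2_BURST_IOPS) -> float:
    """Seconds a full bucket lasts at demand_iops; inf if the baseline covers it."""
    demand = min(demand_iops, GP2_BURST_IOPS)
    drain_per_second = demand - gp2_baseline_iops(size_gib)
    if drain_per_second <= 0:
        return math.inf
    return GP2_BUCKET_CREDITS / drain_per_second


if __name__ == "__main__":
    for size in (50, 100, 500, 1000):
        print(size, gp2_baseline_iops(size), gp2_burst_seconds(size))
```

Under these assumptions a small volume at full burst exhausts its credits well within an hour, which is consistent with the slowdowns described above for clusters under load for hours.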
Other clouds offer similar options for instance types and volumes, although they will have different names. Make sure to investigate memory or compute optimized instance choices and use a volume that guarantees consistent performance with sufficient IOPS.