Stardog on Kubernetes
Deploying Stardog Cluster on Kubernetes is a walk in the park. In this post we show you how easy it is and how it works.
Now you can deploy Stardog Cluster on Kubernetes with a single command:
kubectl apply -f stardog-cluster.yaml
Once the pods are running, you’ll have a three node Stardog Cluster behind a load balancer ready to go.
We’ll go through the specifics of what this command does (and the contents of the stardog-cluster.yaml file) a bit later in the post, but here’s a brief overview. The YAML file first instructs Kubernetes to create a namespace so we don’t stomp on the default one:
apiVersion: v1
kind: Namespace
metadata:
  name: stardog-k8s
Namespaces allow you to scope your objects (pods, services, etc.) within a single physical cluster. It’s not required that we create a namespace, but it’s generally good practice to avoid using the default one unless it’s a small development cluster.
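Once the namespace is applied, you can confirm it exists:
kubectl get namespace stardog-k8s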
After the namespace is created, it adds the credentials for Artifactory (so it can download the Stardog Docker image) and a Stardog license:
apiVersion: v1
kind: Secret
metadata:
  name: stardog-artifactory-credentials
  namespace: stardog-k8s
data:
  .dockerconfigjson: <base64 encoded string of credentials>
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Secret
metadata:
  name: stardog-7-license
  namespace: stardog-k8s
type: Opaque
data:
  stardog-license-key.bin: <base64 encoded string of the license file>
Once the namespace is created and the secrets are in place, the cluster is deployed: ZooKeeper is deployed and configured first, followed by the Stardog Cluster. Finally, a service is created with an externally accessible load balancer that exposes the Stardog Cluster. The command works with the Kubernetes platforms we’ve tested, including Amazon EKS, Google Kubernetes Engine, and Microsoft Azure Kubernetes.
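If you want to follow along as the deployment progresses, you can watch the pods come up:
kubectl -n stardog-k8s get pods -w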
Now, let’s take a step back and talk about Kubernetes: what it is, why you may want to use it, and how we deploy Stardog on it.
Kubernetes is a container orchestration platform to help users deploy, manage, and scale containerized applications. Kubernetes refers to a set of containers as a pod. Pods can be a confusing concept at first, as they may consist of multiple containers. If you want to read more about pods you can do so in the Kubernetes docs. For our purposes, it’s easiest to think of each ZooKeeper container and each Stardog container as individual pods.
Kubernetes automatically deploys pods on hosts and monitors them to ensure that they remain running, restarting them if necessary (possibly on other hosts in the Kubernetes cluster). For example, if an underlying host crashes, Kubernetes automatically starts the pods from the bad host on a working host in the cluster. Or if a host needs to undergo maintenance, Kubernetes can drain the host and start the pods on another host. If a user decides they need an extra instance of their application, they can simply increment the count and Kubernetes takes care of the rest: it finds a host to deploy the pod, starts the pod, and performs any required configuration for it. If the application is behind a load balancer, Kubernetes automatically adds the pod to the load balancer and directs traffic to it once it’s online.
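For example, scaling is a one-liner; my-app here is a hypothetical Deployment name, not part of our Stardog deployment:
kubectl scale deployment my-app --replicas=4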
For simple stateless applications, such as a web application with state maintained in a database elsewhere, it’s easy to see how Kubernetes can effectively manage and scale the web server deployment. Kubernetes doesn’t have to track anything specific about the containers; it can simply launch replacement containers on any host in the cluster when needed.
What about more complex, stateful applications, such as a database? This is a little trickier, but Kubernetes has primitives that make it possible to run those applications as well. Typically, stateful applications require a few guarantees: a consistent name, ordered deployments (if multiple services), and persistent storage. If a host goes down, Kubernetes needs to make sure the stateful containers keep the same name and underlying storage, regardless of where in the cluster they’re restarted. StatefulSets do exactly that, which is what we use for both the ZooKeeper and Stardog containers in our deployment.
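As a minimal sketch (not the actual contents of stardog-cluster.yaml; all names and sizes here are illustrative), a StatefulSet pairs stable pod names with per-pod storage via volumeClaimTemplates:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example       # headless service that gives each pod a stable DNS name
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: example:latest
  volumeClaimTemplates:      # one persistent volume per pod, reattached wherever the pod restarts
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi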
There are a number of getting started guides for Kubernetes, depending on which platform you want to use: Google Kubernetes Engine, Microsoft Azure Kubernetes, or Amazon EKS, among others. Of course, you can also deploy your own Kubernetes cluster manually. However, choosing one of the managed platforms is typically a good starting point.
You’ll also need kubectl on your system and a kube config file for your particular Kubernetes cluster. The kube config file specifies the Kubernetes cluster to use and the credentials for it. We’ll leave the details of setting up a Kubernetes cluster and configuring kubectl to the getting started guide for the platform of your choice.
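A quick sanity check confirms kubectl is pointed at the cluster you expect:
kubectl config current-context
kubectl cluster-info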
Once you’re set up with a Kubernetes cluster, let’s dig into the details of the Stardog HA Cluster deployment.
The full stardog-cluster.yaml file is available in the stardog-examples GitHub repository. The file includes everything except the secrets: base64-encoded strings for Artifactory credentials and a Stardog license.
You can encode your Stardog license with:
base64 stardog-license-key.bin
Replace <base64 encoded string of the license file> in stardog-cluster.yaml with the base64 string.
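Alternatively (a sketch, assuming the stardog-k8s namespace already exists), you can have kubectl create the license secret directly from the file; it handles the base64 encoding for you:
kubectl -n stardog-k8s create secret generic stardog-7-license \
  --from-file=stardog-license-key.bin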
There are a number of ways to add Artifactory credentials to Kubernetes; however, if you have any special characters in your password, the easiest way is to create an artifactory-credentials.json file with your username and password in the following JSON:
{
  "auths": {
    "https://complexible-eps-docker.jfrog.io": {
      "username": "put-username-here",
      "password": "put-password-here"
    }
  }
}
And then base64 encode it as well:
base64 artifactory-credentials.json
Replace <base64 encoded string of credentials> in stardog-cluster.yaml with the base64 string.
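As with the license, kubectl can construct the Docker registry secret for you, sidestepping the JSON file entirely (a sketch; substitute your real credentials):
kubectl -n stardog-k8s create secret docker-registry stardog-artifactory-credentials \
  --docker-server=https://complexible-eps-docker.jfrog.io \
  --docker-username=put-username-here \
  --docker-password=put-password-here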
As we mentioned earlier, we first create a namespace for our deployment, called stardog-k8s. Everything else is configured and deployed into that namespace.
Namespaces provide a scope for objects in a deployment (services, pods, etc.) within a single physical Kubernetes cluster. There are no hard and fast rules you must follow for namespaces. Kubernetes provides a default namespace that may work for small development clusters with a few users. However, as your use of Kubernetes grows, you may want to consider creating separate namespaces for different teams or efforts within your organization. At Stardog we don’t use the default namespace; instead, developers create their own namespaces for testing, and larger efforts are organized into their own namespaces as well. By default, kubectl uses the default namespace, so if you want to see objects created in another namespace you must specify it, e.g.:
kubectl -n stardog-k8s get pods
This makes our lives a bit harder, but it also means that we, as a vendor, are less likely to clash with the names in customer deployments or environments. You can also set your preferred namespace in your kubeconfig file so you don’t have to specify it for each kubectl command.
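For example, with a reasonably recent kubectl:
kubectl config set-context --current --namespace=stardog-k8s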
The stardog-cluster.yaml file has a number of different sections worth discussing. It creates services for both Stardog and ZooKeeper, and it contains a ConfigMap object that specifies the stardog.properties necessary to enable the cluster. The two major sections are the StatefulSets for the Stardog and ZooKeeper nodes. The ZooKeeper section largely mirrors a Kubernetes template for ZooKeeper deployments from the Kubernetes GitHub project.
The Stardog section starts with the following:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stardog-cluster
  namespace: stardog-k8s
We use a StatefulSet for both the Stardog and ZooKeeper pods, allowing the pods to maintain consistent DNS names regardless of which host they’re on. With StatefulSets, each pod is assigned an ordinal index (e.g., zk-0, zk-1, and zk-2) that is consistent across restarts. The stardog.properties configuration file takes advantage of this by specifying the 3 ZooKeeper servers ahead of time using their names, e.g., zk-0.zk-service.stardog-k8s. The StatefulSet also ensures that the underlying data volumes for ZooKeeper and Stardog stay with their pods, wherever they run in the cluster.
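The cluster-related lines in stardog.properties look roughly like this (a sketch; 2181 is the standard ZooKeeper client port, and the exact service name comes from stardog-cluster.yaml):
pack.enabled=true
pack.zookeeper.address=zk-0.zk-service.stardog-k8s:2181,zk-1.zk-service.stardog-k8s:2181,zk-2.zk-service.stardog-k8s:2181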
The replicas: 3 setting instructs Kubernetes to create a 3-node Stardog Cluster and to keep those 3 nodes running. If you need a bigger cluster, just specify a larger value for replicas. The podAntiAffinity section tells Kubernetes to deploy the 3 Stardog nodes on different hosts in the Kubernetes cluster so that a single host failure will only result in a single Stardog container failing.
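A sketch of what such a podAntiAffinity section looks like (the label key and value are illustrative, not necessarily those used in stardog-cluster.yaml):
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: stardog-cluster
      topologyKey: kubernetes.io/hostname   # at most one matching pod per host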
Because the stardog-cluster.yaml file contains both the ZooKeeper and Stardog nodes, we use an initContainer for the Stardog nodes that forces them to wait until ZooKeeper is ready before they start.
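A sketch of what such an initContainer can look like, assuming a simple wait-for-ZooKeeper loop (the image and exact check in stardog-cluster.yaml may differ):
initContainers:
- name: wait-for-zookeeper
  image: busybox
  # Block until the ZooKeeper client port answers before Stardog starts
  command: ['sh', '-c',
    'until nc -z zk-service.stardog-k8s 2181; do echo waiting for zk; sleep 2; done']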
Finally, the deployment also creates a load balancer that exposes an external address for the Stardog Cluster. You can list the services in the stardog-k8s namespace:
kubectl -n stardog-k8s get svc
The external Stardog service lists an EXTERNAL-IP you can use to reach the cluster:
stardog-admin --server http://<external-ip>:5820 cluster info
The liveness probe for Stardog pods uses the Stardog health check to determine whether the Kubernetes load balancer should route requests to each Stardog node (and whether the pods are otherwise functioning):
livenessProbe:
  httpGet:
    path: /admin/healthcheck
    port: 5820
  initialDelaySeconds: 30
  periodSeconds: 5
The probe waits 30 seconds after the container starts, then queries the Stardog health check endpoint every 5 seconds to verify that the node is still a functioning cluster member.
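You can query the same endpoint yourself to see what the probe sees:
curl http://<external-ip>:5820/admin/healthcheck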
Here are a few additional commands to help you examine the Kubernetes objects created.
List the namespaces in your Kubernetes cluster:
kubectl get namespaces
List the pods in your namespace:
kubectl -n stardog-k8s get pods
View the logs for a specific pod:
kubectl -n stardog-k8s logs zk-0
Connect into a pod in the cluster:
kubectl -n stardog-k8s exec -it stardog-cluster-0 -- sh
List the services (and their IPs and DNS names):
kubectl -n stardog-k8s get svc
To wrap up, let’s be good citizens of our Kubernetes cluster and delete the objects we created. Luckily, deleting all of this is just as easy as creating it. Instead of running kubectl apply, we run kubectl delete:
kubectl delete -f stardog-cluster.yaml
This command will delete everything, including the persistent volumes and namespace.
However, if you ever delete just the Stardog and ZooKeeper StatefulSets (and not the credentials or namespace), Kubernetes will keep the persistent volumes, so the data won’t be lost if a pod dies or is otherwise removed. You can list the persistent volume claims with the command below. Remember to delete each of them when you’re done, since there is an underlying physical volume associated with each claim (e.g., an EBS volume on AWS that is costing you money):
kubectl -n stardog-k8s get pvc
kubectl -n stardog-k8s delete pvc zk-data-zk-0
...
<delete the remaining ZooKeeper and Stardog Cluster volumes>
...
kubectl -n stardog-k8s delete pvc stardog-cluster-data-stardog-cluster-2
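If you’d rather remove every claim in the namespace at once, kubectl supports that as well:
kubectl -n stardog-k8s delete pvc --all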
With Kubernetes, Stardog Cluster has never been easier to deploy and manage. Kubernetes provides powerful mechanisms to handle scheduled infrastructure maintenance and to recover automatically from infrastructure failures. In future Kubernetes posts, we’ll cover some different failure scenarios and show how Stardog Cluster on Kubernetes adapts in the face of failure. We’ll also look at how we can use all this elastic computing power to scale Stardog Knowledge Graph services. Stay tuned.