As discussed in a previous post, Stardog Cloud relies on VolumeSnapshots in Kubernetes (k8s) to back up user data. In this post we go into more technical detail on how to work with VolumeSnapshots in the Elastic Kubernetes Service (EKS).
Here we present the k8s components that are used when working with VolumeSnapshots. We do not go into exhaustive detail but rather give a brief overview to make the concepts in this post easier to follow. Detailed documentation is linked to each type.
PersistentVolume (PV): The abstraction that represents physical storage.
StorageClass: The type of storage from which a PV is created. For example, a StorageClass can be backed by AWS gp2 or io1 volumes. It can also be backed by a local disk with specific RAID options or by an NFS partition.
PersistentVolumeClaim (PVC): A claim to a specific resource on a PV. The PVC requests a specific size, access mode, and other characteristics of the PV. In order for a pod to mount PV storage it must first create a claim upon that PV. When storage is mounted, a pod treats the PVC as though it were the volume, but the PVC itself does not represent any physical storage, only a right to use physical storage.
VolumeSnapshotContent: This is akin to the PV in that it represents an actual snapshot of data on some physical storage. It is typically created from a PVC but can also be pre-provisioned.
VolumeSnapshot: The relationship of the VolumeSnapshot to the VolumeSnapshotContent is akin to the relationship of the PVC to the PV. The VolumeSnapshotContent is the physical storage while the VolumeSnapshot is the interface to it. The VolumeSnapshot represents the request for a snapshot of a PVC, and it maintains the status of creating the snapshot from the data held by the PVC onto the VolumeSnapshotContent. It can also be used as a source for creating a PVC from the contents of its bound VolumeSnapshotContent.
VolumeSnapshotClass: This is similar to the StorageClass in that it describes attributes of the VolumeSnapshotContent. The most important attribute in this case is the ebs.csi.aws.com driver.
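To make the claim-versus-volume distinction above concrete, here is a minimal sketch of a pod that mounts storage through a PVC. The pod name, container image, mount path, and claim name (example-pod, busybox, /data, example-claim) are all hypothetical and not taken from Stardog Cloud; the point is only that the pod refers to the claim and never to the PV directly.

```yaml
# Illustrative only: every name in this manifest is a placeholder.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data            # must match the volume name below
          mountPath: /data      # where the storage appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-claim   # the pod references the PVC, not the PV
```

Kubernetes resolves the claim to whichever PV is bound to it, which is why the pod behaves as though the PVC were the volume.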
When using EKS v1.17, snapshots to EBS must be made using the ebs.csi.aws.com provisioner. To do this, the PVCs from which the snapshots will be created must be associated with a StorageClass defined to use that provisioner. The reason is that this provisioner is the driver that can interact with the EC2 EBS services, so the volumes in question must use it.
Stardog Cloud manages this by creating a StorageClass called stardog-home. Any volumes that we wish to snapshot must be created from PVCs associated with this storage class. We define our class with the following:
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: stardog-home
    provisioner: ebs.csi.aws.com
    volumeBindingMode: WaitForFirstConsumer
    parameters:
      type: gp2
    allowVolumeExpansion: true
The important fields to note are the provisioner, the metadata name, and the parameters type. This sets our StorageClass up to be named stardog-home, to be managed by the ebs.csi.aws.com provisioner, and to use the volume type gp2. Descriptions of the other fields can be found elsewhere. The parameters section is used to pass in options specific to the provisioner. In our case we are asking it to use gp2; io1 would be another possible value here.
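As a sketch of how a volume ends up associated with this class, a PVC can name the StorageClass in its spec. The claim name pvc-1 matches the one used later in this post; the access mode and the 32Gi request are assumptions made for illustration, not Stardog Cloud's actual settings.

```yaml
# Illustrative PVC bound to the stardog-home StorageClass.
# accessModes and the storage size are assumed values.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1
spec:
  storageClassName: stardog-home   # ties the claim to the EBS CSI provisioner
  accessModes:
    - ReadWriteOnce                # an EBS volume attaches to a single node
  resources:
    requests:
      storage: 32Gi
```

Because the class uses volumeBindingMode: WaitForFirstConsumer, a claim like this stays Pending until a pod that uses it is scheduled; only then is the EBS volume provisioned.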
Similarly, any snapshot that we make must also use the EBS CSI provisioner. To achieve this we must define a VolumeSnapshotClass as well. We use the following description:
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshotClass
    metadata:
      name: csi-aws-vsc
    driver: ebs.csi.aws.com
    deletionPolicy: Delete
This basic description gives us a way to create VolumeSnapshotContent using the driver ebs.csi.aws.com. The name of this VolumeSnapshotClass will be csi-aws-vsc.
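The deletionPolicy field decides what happens to the VolumeSnapshotContent, and the snapshot behind it, when the VolumeSnapshot is deleted. With Delete, both are removed; the other accepted value is Retain, which keeps them. As a sketch of that alternative (not something this post claims Stardog Cloud uses), a second class might look like the following; the name csi-aws-vsc-retain is hypothetical.

```yaml
# Illustrative variant of the class above with a Retain policy.
# The class name is a placeholder chosen for this example.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc-retain
driver: ebs.csi.aws.com
deletionPolicy: Retain   # keep the snapshot when the VolumeSnapshot is deleted
```

The trade-off is that retained snapshots have to be cleaned up by some other process, since deleting the k8s object no longer removes them.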
Once we have the above classes defined and a PVC created from the stardog-home StorageClass, we can create snapshots of that PVC. The process of creating a snapshot, and the components involved, works as follows.
In the same way that pods interface with physical storage via a PVC, so do snapshots. A VolumeSnapshot of a specific class is created. This component is given a source PVC, and it creates a VolumeSnapshotContent where it stores data.
To create a snapshot of a PVC named pvc-1 we apply the following to the k8s cluster:
    apiVersion: snapshot.storage.k8s.io/v1beta1
    kind: VolumeSnapshot
    metadata:
      name: new-snapshot
    spec:
      volumeSnapshotClassName: csi-aws-vsc
      source:
        persistentVolumeClaimName: pvc-1
This creates a VolumeSnapshot object named new-snapshot and binds it to a dynamically provisioned VolumeSnapshotContent object. The VolumeSnapshot can be inspected to see its status:
    kubectl -n stardog describe volumesnapshot new-snapshot
    Name:         new-snapshot
    Namespace:    stardog
    Labels:       <none>
    Annotations:  <none>
    API Version:  snapshot.storage.k8s.io/v1beta1
    Kind:         VolumeSnapshot
    Metadata:
      Creation Timestamp:  2021-01-13T18:48:57Z
      Finalizers:
        snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
        snapshot.storage.kubernetes.io/volumesnapshot-bound-protection
      Generation:        1
      Resource Version:  100143385
      Self Link:         /apis/snapshot.storage.k8s.io/v1beta1/namespaces/stardog/volumesnapshots/new-snapshot
      UID:               6707c050-7b89-47c9-a795-152e3582ce86
    Spec:
      Source:
        Persistent Volume Claim Name:  pvc-1
      Volume Snapshot Class Name:      csi-aws-vsc
    Status:
      Bound Volume Snapshot Content Name:  snapcontent-6707c050-7b89-47c9-a795-152e3582ce86
      Creation Time:                       2021-01-13T18:48:57Z
      Ready To Use:                        false
      Restore Size:                        32Gi
    Events:                                <none>
The VolumeSnapshot is initially reported as not ready to use, which means that the VolumeSnapshotContent is still being written. Also note that the created VolumeSnapshotContent is called snapcontent-6707c050-7b89-47c9-a795-152e3582ce86. We can inspect that k8s resource as well:
    kubectl -n stardog describe volumesnapshotcontent snapcontent-6707c050-7b89-47c9-a795-152e3582ce86
    Name:         snapcontent-6707c050-7b89-47c9-a795-152e3582ce86
    Namespace:
    Labels:       <none>
    Annotations:  <none>
    API Version:  snapshot.storage.k8s.io/v1beta1
    Kind:         VolumeSnapshotContent
    Metadata:
      Creation Timestamp:  2021-01-13T18:48:57Z
      Finalizers:
        snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
      Generation:        1
      Resource Version:  100143641
      Self Link:         /apis/snapshot.storage.k8s.io/v1beta1/volumesnapshotcontents/snapcontent-6707c050-7b89-47c9-a795-152e3582ce86
      UID:               53c603ff-e235-46ef-b59f-14d431cf4724
    Spec:
      Deletion Policy:  Delete
      Driver:           ebs.csi.aws.com
      Source:
        Volume Handle:             vol-008db636e12862813
      Volume Snapshot Class Name:  csi-aws-vsc
      Volume Snapshot Ref:
        API Version:       snapshot.storage.k8s.io/v1beta1
        Kind:              VolumeSnapshot
        Name:              new-snapshot
        Namespace:         stardog
        Resource Version:  100143369
        UID:               6707c050-7b89-47c9-a795-152e3582ce86
    Status:
      Creation Time:    1610563737000000000
      Ready To Use:     true
      Restore Size:     34359738368
      Snapshot Handle:  snap-0f1f04f468660a027
    Events:             <none>
An interesting thing to note is the Snapshot Handle value snap-0f1f04f468660a027. That value is the reference to the EBS snapshot in the associated AWS account.
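As noted earlier, a VolumeSnapshot can also serve as the source for a new PVC. A minimal sketch of such a restore, assuming the snapshot new-snapshot has reached Ready To Use: true, could look like the following. The claim name pvc-1-restored and the access mode are hypothetical; the 32Gi request is chosen to match the Restore Size reported above.

```yaml
# Illustrative restore: a new PVC populated from the snapshot taken above.
# The claim name and access mode are assumptions for this example.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1-restored
spec:
  storageClassName: stardog-home        # same EBS CSI-backed class as the source
  dataSource:
    name: new-snapshot                  # the VolumeSnapshot to restore from
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 32Gi                     # at least the snapshot's restore size
```

The new claim must live in the same namespace as the VolumeSnapshot it references, and its StorageClass needs to use the same CSI driver that produced the snapshot.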
Stardog Cloud uses these k8s techniques to create snapshots. Backed by EBS performance enhancements, this gives us a robust and performant backup solution. Check out Stardog Cloud.