Announcing Stardog 5.3

By , · 4 minute read

Stardog 5.3 includes up to a 5x cluster performance improvement and S3 backups. Read on for the details.

We’re happy to announce the latest release of Stardog, which brings new capabilities: S3 backups, cluster performance improvements, and a new query evaluation extension point.

Up to 5x Cluster Perf Increase

Stardog’s cluster is a strongly consistent system based on synchronous replication. Essentially, we delegate commits to an elected coordinator, which waits until each transaction has been successfully replicated to the other members of the cluster. Although replication to independent nodes happens in parallel, it was still the biggest bottleneck in use cases with small transactions: the smaller the transaction, the larger the share of its commit time the old replication protocol spent on overhead.
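The commit path described above can be sketched as a coordinator that fans a transaction out to every follower in parallel and acknowledges the commit only after all of them succeed. This is a minimal illustration, not Stardog's implementation: the `commit` function, the `replicate(follower, txn)` callback, and the failure handling are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def commit(txn, followers, replicate):
    """Hypothetical coordinator-side commit with synchronous replication.

    `replicate(follower, txn)` is an assumed callback that sends the
    transaction to one follower and returns True once that follower acks.
    The commit is acknowledged only after every follower has succeeded.
    """
    # Fan out to all followers in parallel, so commit latency is roughly
    # the slowest follower's round trip rather than the sum of them.
    with ThreadPoolExecutor(max_workers=max(1, len(followers))) as pool:
        acks = list(pool.map(lambda f: replicate(f, txn), followers))
    if not all(acks):
        # Simplification: a real cluster must also decide what to do with
        # the failing member; that logic is out of scope for this sketch.
        raise RuntimeError("replication failed; transaction not committed")
    return True


# Usage with a fake transport that records what was sent.
sent = []
commit("txn-1", ["node-b", "node-c"], lambda f, t: sent.append((f, t)) or True)
```

Parallel fan-out hides the number of followers, but each message still pays its fixed costs on every commit, which is why small transactions felt the protocol overhead most.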

After further analysis, we identified two potential improvements: reducing the size of the messages and reducing the number of messages. This work spanned many components of the system, from the HTTP client to the authentication layer to the cluster commit protocol. We significantly reduced both the size and the number of messages, which speeds up the cluster for small transactions, i.e., those of a few thousand triples or fewer. The gain is up to 5x in some cases. Systems with larger transactions should also see a noticeable improvement, although it tails off as transaction size approaches 100k triples.
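To see why fewer and smaller messages help small transactions most, consider a toy cost model. Every constant in it (per-message latency, header size, bytes per triple, bandwidth, and the "before" and "after" message counts) is a hypothetical value chosen for illustration, not a measurement of Stardog's protocol.

```python
def commit_cost_ms(triples, messages, header_bytes,
                   latency_ms=0.5, bytes_per_triple=100, bytes_per_ms=100_000):
    """Toy cost of replicating one transaction to one follower.

    Each message pays a fixed latency and carries `header_bytes` of overhead;
    the triple payload itself is transferred once.
    All default values are hypothetical, chosen only for illustration.
    """
    payload_bytes = triples * bytes_per_triple
    transfer_ms = (messages * header_bytes + payload_bytes) / bytes_per_ms
    return messages * latency_ms + transfer_ms


def speedup(triples):
    # Hypothetical "before": 4 messages with 2 KB of overhead each.
    before = commit_cost_ms(triples, messages=4, header_bytes=2_000)
    # Hypothetical "after": 1 message with 0.5 KB of overhead.
    after = commit_cost_ms(triples, messages=1, header_bytes=500)
    return before / after


for size in (10, 1_000, 100_000):
    print(f"{size:>7} triples: {speedup(size):.2f}x")
```

In this model the fixed per-message cost dominates tiny transactions, so cutting it yields a large ratio. As the payload grows, transfer time dominates both protocols and the ratio shrinks toward 1, the same qualitative tail-off described above.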

No code or configuration changes are necessary to take advantage of this speed-up. Test your cluster with Stardog 5.3 to experience it for yourself.

S3 Backup & Restore

Backups are indispensable, even more so for critical data. We all know it. But putting a disaster recovery plan into place isn’t the most exciting way to spend time.

With Stardog 5.3, we are introducing native support for backing up directly to S3, and it works with any S3 implementation. Using S3 as a safe place to store backups reduces the maintenance burden of your disaster recovery process, and native support removes the need for shell scripts, other glue code, or manual steps, along with the errors they can introduce. To use it, specify an S3 URL instead of a filename:

stardog-admin db backup \
  --to "s3:///mybucket/backup/prefix?region=us-east-1&AWS_ACCESS_KEY_ID=accessKey&AWS_SECRET_ACCESS_KEY=secret" \
  myDb

The same S3 URL form works with the db restore command.
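If you generate these URLs from a script, it helps to see their structure: an `s3` scheme, an empty host (hence `s3:///`), the bucket and path prefix, and the `region`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` query parameters from the command shown above. The helper below is a hypothetical sketch. Percent-encoding the values (AWS secret keys can contain `/` and `+`) assumes Stardog decodes them as standard URL query parameters, and the optional `endpoint` for non-AWS S3 implementations is also an assumption; check the Stardog documentation before relying on either.

```python
from urllib.parse import quote, urlencode


def s3_backup_url(bucket, prefix, region, access_key, secret_key, endpoint=""):
    """Hypothetical helper that builds an s3:// URL like the one passed to
    `stardog-admin db backup --to`.

    An empty `endpoint` produces the s3:/// form (empty host) from the
    example. Passing "host:port" for other S3 implementations is an
    assumption. Query values are percent-encoded, assuming Stardog decodes
    them as standard URL query parameters.
    """
    path = quote(bucket, safe="") + "/" + quote(prefix.strip("/"), safe="/")
    query = urlencode({
        "region": region,
        "AWS_ACCESS_KEY_ID": access_key,
        "AWS_SECRET_ACCESS_KEY": secret_key,
    })
    return f"s3://{endpoint}/{path}?{query}"


print(s3_backup_url("mybucket", "backup/prefix", "us-east-1", "accessKey", "secret"))
```

With the placeholder values from the example, the helper reproduces the URL used in the `db backup` command above.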

S3 provides a reliable off-site storage service which is perfect for backups. Don’t shirk the responsibility of keeping your data safe.

Customizing SPARQL’s DESCRIBE

The SPARQL DESCRIBE query form is used to collect data about a graph node. For example, to see what we know about Tom Hanks:

DESCRIBE :Tom_Hanks

might return:

:Tom_Hanks a :Person ;
	:actedIn :Forest_Gump .

In general, SPARQL is a declarative query language with formally specified semantics. The DESCRIBE query form, however, is not fully specified; the details are left to each implementation. Stardog’s default implementation returns the set of statements in which the described resource is the subject, and before 5.3 it was the only option. With Stardog 5.3, you can create your own implementation of DESCRIBE and choose between implementations for each query.

In 5.3, implementations of DESCRIBE are called strategies. They are written in Java by implementing the com.complexible.stardog.plan.describe.DescribeStrategy interface and are registered through the Java service loader (java.util.ServiceLoader), i.e., by listing the implementing class in a provider-configuration file under META-INF/services named after that interface. Each strategy is identified by a name and is chosen at runtime with the new describe.strategy query hint, like so:

#pragma describe.strategy bidirectional

DESCRIBE <theResource>

The default describe strategy can also be set per database using the query.describe.strategy option.
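Name-based selection can be modeled in a few lines of Python. In this sketch a dictionary stands in for the Java service loader, and the query hint takes precedence over the per-database query.describe.strategy option, which in turn overrides a built-in fallback. That precedence order, the registry, and the name "default" for the built-in strategy are assumptions made for illustration.

```python
import re

# Hypothetical registry standing in for Java's ServiceLoader:
# strategy name -> function(graph, node) returning a set of (s, p, o) triples.
STRATEGIES = {}


def register(name):
    def decorator(fn):
        STRATEGIES[name] = fn
        return fn
    return decorator


@register("default")  # hypothetical name for the built-in strategy
def subject_only(graph, node):
    return {t for t in graph if t[0] == node}


@register("bidirectional")
def subject_or_object(graph, node):
    return {t for t in graph if node in (t[0], t[2])}


# Matches a line such as "#pragma describe.strategy bidirectional".
_HINT = re.compile(r"^\s*#pragma\s+describe\.strategy\s+(\S+)", re.MULTILINE)


def choose_strategy(query, db_options):
    """Pick a strategy: query hint first, then the database option,
    then the built-in default (this precedence is assumed)."""
    match = _HINT.search(query)
    if match:
        name = match.group(1)
    else:
        name = db_options.get("query.describe.strategy", "default")
    if name not in STRATEGIES:
        raise ValueError(f"unknown describe strategy: {name!r}")
    return STRATEGIES[name]


query = "#pragma describe.strategy bidirectional\n\nDESCRIBE <theResource>"
strategy = choose_strategy(query, {"query.describe.strategy": "default"})
print(strategy.__name__)  # subject_or_object
```

Keeping strategies behind names means a query or a database can switch behavior without any change to the calling code.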

We’ve included two new describe strategy implementations: cbd and bidirectional. The cbd strategy implements Concise Bounded Description (CBD), a well-known specification for describing a resource. The bidirectional strategy extends the default strategy by also returning triples in which the described resource is the object.
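To make the difference between the three behaviors concrete, here is a sketch over an in-memory set of triples. It is not Stardog's code. The sample data is hypothetical, blank nodes are written as strings starting with `_:`, and the CBD function implements only the core rule (subject statements, recursively expanded through blank-node objects). It omits the specification's rule for including reifications, and how Stardog's cbd strategy handles such details is not covered here.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def is_blank(term):
    # Blank nodes are written "_:label" in this sketch.
    return term.startswith("_:")


def describe_subject(graph, node):
    """Statements with `node` as subject (the default behavior)."""
    return {t for t in graph if t.s == node}


def describe_bidirectional(graph, node):
    """Subject statements plus statements with `node` as object."""
    return describe_subject(graph, node) | {t for t in graph if t.o == node}


def describe_cbd(graph, node):
    """Concise Bounded Description without the reification rule:
    subject statements, recursively expanded through blank-node objects."""
    result, seen, pending = set(), {node}, [node]
    while pending:
        current = pending.pop()
        for t in graph:
            if t.s == current:
                result.add(t)
                # `seen` guards against cycles between blank nodes.
                if is_blank(t.o) and t.o not in seen:
                    seen.add(t.o)
                    pending.append(t.o)
    return result


# Hypothetical sample data.
GRAPH = {
    Triple(":Tom_Hanks", "rdf:type", ":Person"),
    Triple(":Tom_Hanks", ":actedIn", ":Forest_Gump"),
    Triple(":Tom_Hanks", ":award", "_:a1"),
    Triple("_:a1", ":label", '"example award"'),
    Triple(":Forest_Gump", ":starring", ":Tom_Hanks"),
}

for describe in (describe_subject, describe_bidirectional, describe_cbd):
    print(describe.__name__, len(describe(GRAPH, ":Tom_Hanks")))
```

The bidirectional result adds the incoming `:starring` triple, while CBD instead follows the blank node `_:a1` so that its details are not lost.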

This extension point is narrow in scope, but it can remove the need to compute descriptions outside the database, which lets customers keep that business logic centralized and consistent.

Summary

We aim in all things to provide value to customers who are building knowledge graphs. Sometimes this requires low-level systems engineering to meet performance targets; other times it requires the devops tools needed to run a reliable deployment with a minimum of intervention. And sometimes it means extending graph query evaluation, as we did with GraphQL and PATH queries and now with DESCRIBE strategies.

Check out Stardog 5.3 today, including the full Release Notes, and let us know what you think.
