Announcing Stardog 7

Kendall Clark

Aug 4, 2019, 5 minute read

Today we’re happy to announce the GA release of Stardog 7, including a new low-level storage engine based on RocksDB. Read on for the glorious details.

Up to 20x Write Performance Improvements!

We first told you about Mastiff, our new low-level storage engine, two years ago when we launched development. Today we’re releasing it in Stardog 7 and we’re really excited. Stardog 7 is full of new features, but first let’s talk about the upsides of our new storage engine:

Stardog 7 is dramatically faster, typically a 10x to 20x improvement for writes.
For example, on a database with 1 billion nodes and edges and 10 concurrent transactions, Stardog 7 is 18 times faster than Stardog 6.
Not only is it faster, but even when a database is being updated with very large transactions that might run for hours, smaller transactions are not blocked and will complete in milliseconds.

Write throughput for single nodes is faster, but cluster writes are faster still. Stardog 7 was designed specifically for our horizontally scalable cluster, so it can handle many simultaneous clients all writing at once without any of them blocking one another. Stardog 7’s concurrent write performance improvements show up most in HA Cluster usage, which is the case that really counts for the enterprise.

In short, Stardog 7’s most obvious enterprise benefit is that you can use it directly in write-heavy use cases like IoT, streaming sensors, and other cases where write pressure is a fact of life.

We’re looking forward to feedback on Stardog 7. For more background, check out our pre-alpha post; for more details about performance, see the beta1 and beta2 announcements.

Virtual Transparency

But we didn’t stop there. We think Stardog’s unique combination of graph, storage, and virtualization is a data management game changer. Let me tell you a little story about the history of IT so that Virtual Transparency makes sense. I think you’ll be as excited about it as we are.

There are two big trends in IT: mobile and virtualization. We all know what mobile is. But what do I mean by virtualization? I mean what the cloud, storage-area networks, and Kubernetes all have in common. Each of them virtualizes some part of the IT landscape: the cloud virtualizes compute; SANs virtualize storage; Kubernetes virtualizes the data center. In each case we act as if there are infinite CPUs and disks and cluster nodes that we can use for our purposes, whereas in reality we have no idea what’s really there. And that not knowing, what we can’t see, is part of the trick and part of the magic.

But what technology pulls this magic trick for data? What virtualizes data? What makes data location, format, structure, version number, compatibility layer, and so on all go poof?! We think the answer is the Knowledge Graph; it’s the only technology that can unify data at scale and make it seem as if there’s one big “go to” database for all your queries.

Now that’s the world in which I want you to think about Stardog 7’s Virtual Transparency capability, since it’s the completion of Stardog’s most basic data management premise: data location is almost always irrelevant. What the Enterprise needs is answers to questions based on its data. Period, the end.

Virtual Transparency in Stardog 7 lifts a restriction on how Virtual Graphs have worked to date: until Stardog 7, when you wrote a query for Stardog that used VG capabilities, you had to encode some location information in the query. It’s a subtle point, but all subtleties matter at scale. So when data locations changed, and data locations always change, those queries had to be adjusted. That’s a restriction that we couldn’t live with.

In Stardog 7, queries use Virtual Graphs transparently and Stardog automatically figures out which virtual graph sources and mappings to use to answer queries. And if you have some requirements for encoding location, you’ll still be able to do that. But in Stardog 7 you won’t have to and that means the Enterprise can fully virtualize data. Combining this with the Kubernetes-scaled VG caching we introduced in Stardog 6.2, we think we’ve built the most powerful data virtualization capability in the business.
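To make the difference concrete, here is a minimal sketch of the two query styles, held as strings in Python. The virtual graph name `sensors_db`, the `ex:` vocabulary, and the helper `hardcoded_virtual_graphs` are all made up for illustration; the `GRAPH <virtual://...>` form is the location-encoding style described above, and the second query simply leaves the location out. Whether transparency needs to be switched on for a given database is not covered here, so treat this as an illustration rather than a recipe.

```python
import re
from typing import List

# Hypothetical virtual graph name and vocabulary, for illustration only.
# Pre-Stardog 7 style: the query names the virtual graph it reads from.
QUERY_WITH_LOCATION = """
PREFIX ex: <http://example.com/>
SELECT ?sensor ?reading WHERE {
  GRAPH <virtual://sensors_db> {
    ?sensor ex:reading ?reading .
  }
}
"""

# Transparent style: same pattern, no location information in the query.
QUERY_TRANSPARENT = """
PREFIX ex: <http://example.com/>
SELECT ?sensor ?reading WHERE {
  ?sensor ex:reading ?reading .
}
"""

VIRTUAL_GRAPH_IRI = re.compile(r"<virtual://([^>]+)>")


def hardcoded_virtual_graphs(query: str) -> List[str]:
    """Return the virtual graph names that a query spells out explicitly."""
    return VIRTUAL_GRAPH_IRI.findall(query)


if __name__ == "__main__":
    print(hardcoded_virtual_graphs(QUERY_WITH_LOCATION))
    print(hardcoded_virtual_graphs(QUERY_TRANSPARENT))
```

A helper like this could be used to audit an existing query library for hard-coded locations, which are exactly the queries that break when a source moves.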

Schema Multi-tenancy

The last big ticket item in Stardog 7 is schema multi-tenancy. Increasingly our customers are using Stardog to connect up the fragmented data archipelagos inside their orgs, and this means systematic data and schema extensibility and reuse. All the data really is connected, and with Stardog that means a global manufacturer, for example, can leverage its Stardog-based predictive maintenance solution to complement a divisional GDPR solution.

To support this kind of curve-bending ROI, we’re lifting another restriction and allowing multiple schemas per database in Stardog 7. This will mean that different use cases, orgs, lines of business, and apps can share and reuse connected data hosted in Stardog 7 without stepping on each other’s toes or, just as crucially, without requiring a single schema to rule all the others. A single schema for everyone isn’t realistic, and now it isn’t required.
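As a rough mental model only, and not Stardog’s actual API or storage layout, the idea can be sketched as a toy in-memory structure: one shared set of data, several named schemas, and answers that depend on which schema a given app selects. Every name below (`DATA`, `SCHEMAS`, `inferred_types`, the class names) is hypothetical.

```python
from typing import Dict, Set

# Shared instance data: entity -> asserted type. All names are hypothetical.
DATA: Dict[str, str] = {
    "pump-17": "CentrifugalPump",
    "alice": "Employee",
}

# Each named schema is its own subclass map (child -> parent) over the same data.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "maintenance": {"CentrifugalPump": "Pump", "Pump": "MaintainedAsset"},
    "gdpr": {"Employee": "DataSubject"},
}


def inferred_types(entity: str, schema: str) -> Set[str]:
    """Asserted type plus every superclass reachable under the chosen schema."""
    hierarchy = SCHEMAS[schema]
    current = DATA[entity]
    types = {current}
    # Walk up the hierarchy; stop at the root or if a cycle is detected.
    while current in hierarchy and hierarchy[current] not in types:
        current = hierarchy[current]
        types.add(current)
    return types


if __name__ == "__main__":
    print(sorted(inferred_types("pump-17", "maintenance")))
    print(sorted(inferred_types("pump-17", "gdpr")))
```

The point of the sketch is that the maintenance app sees `pump-17` as a maintained asset while the GDPR app sees only its asserted type, and neither schema has to know about the other.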