Reactive, Streaming Stardog Kernel
Get the latest in your inbox
Get the latest in your inbox
The Stardog kernel API is morphing into a system based asynchronous, reactive streams; in this post we discuss motivations and design goals.
If you’ve programmed against Stardog, you’ve had to use at least a bit of the
Stardog Native API for the RDF Language (SNARL API). Probably the most oft-used
bit is the Connection
class. Patterned after JDBC connections, Connection
is
the means of interacting with a Stardog database within an application. It’s
very stable: only 28 non-javadoc changes to the interface since it was
introduced before Stardog 1.0; none in the last 18 months.
But SNARL is showing its age. The pleasant world of lambdas, try-with-resources,
or just using String
in a switch statement is now the actual world. We’ve been
thinking about where Stardog is going in 2017; you might
have
read about it recently as
we recapped an exciting 2016. We unify all the
data:
Structured,
unstructured, and
everything in between. As we reconsider our design, and plan for future growth,
I thought it was time to take a new look at SNARL.
Let’s set aside the issues of changing a public API first. It’s not something we decided easily; we take compatibility seriously. But Stardog is undergoing fundamental changes, and some of those changes will be directly reflected in API like SNARL.
At the very least, there’s a lot to be gained from leveraging Stream
and
lambda. We weakly embraced this the last time we changed the API. But a lot has
changed in the Stardog world too. When the Connection
class and the rest of
the SNARL API was originally created, we were months from our first public
release.
SNARL was created very early on, before stuff like disk-based databases, the reasoner, CLI, security, even transactions, existed. So there it was, the public face of Stardog for years to come, built when the system was not much more than a query engine and memory-based indexes that were simple, sorted arrays. And we’ve hardly changed it since.
I first heard about reactive programming not from the manifesto but from some binge watching/listening of Erik Meijer videos. If you haven’t watched any of Erik Meijer’s talks, you should. You’ll learn something, and you’ll definitely laugh. This talk about reactive a few years ago was my introduction.
I’ve seen it referred to as the “Observer pattern done right”, which is fair. It
is basically Observer Pattern. The primary difference is that it adds
error and completion semantics. You can also just think about it as
Iterable
. The only difference is it’s push vs pull, and instead of the
barebones Iterable
API, you get Stream
plus a whole lot more.
But for me, it was easier to just think about it as a way to work with streams
of data. Consider a click stream from a mouse. These streams are a simple sequence
of events: Value
, Error
, and Complete
. Error
and Complete
are actually
mutually exclusive. Things worked, or they didn’t, but not both. It’s not rocket
science. You can get zero or more Value
s before the termination event (i.e.
Error
or
Complete
). This is a
great explanation of reactive principles in terms of a click stream.
Basic usage of a reactive API would look very similar to the Java Stream API:
getResults()
.skip(10)
.take(5)
.map(Object::toString)
.subscribe(System.err::println)
Apply an offset and limit, turn the Value
s into String
s and print them out;
subscribe
in this case is akin to forEach
. Add
scheduling,
and
backpressure,
and it’s easy to see how this approach can provide the building blocks for a
powerful and flexible API.
Here are a few other links that I found useful when trying to grok this stuff:
This is really two questions:
Future
?,Future
?One downside to going with the reactive pattern is that it’s not as widely known
as, say, Future
. That won’t help the learning curve. And we have always tried
to maximize developer happiness and productivity because after all, we’re
developers too, and we look after our peeps!
Java 8 improves Future
. Before we switched from Java 6 to Java 8, I would have
told you that Future
is not easily composable and that doing error handling is
cumbersome. The changes introduced in Java 8 go a long way to improving this.
Future
also gives us some of the lazy/asynchronous joy we can get out of
reactive, especially if we’re just waiting on a single result. For example, did
dropping the database work or not? Future<Void>
is perfect.
However, it’s easy to kill some of that joy by calling Future#get
too early.
If you do that, you’re stuck waiting. You can use callbacks to help;
Guava has
some nice support for this, but callbacks can get out of control
quickly.
There’s also the problem that Future
is single-valued. So do you use
Future<Collection<T>>
or Collection<Future<V>>
? Probably the latter, but if
the first one in the collection we call Future#get
on isn’t done, we’re
blocked from getting results from anything else in the collection.
Scheduling is pretty straight-forward, but there’s no notion of backpressure:
you’ve got to roll your own. And what about enhancing the Future
, like
attaching a progress listener to the asynchronous task? You’ll need to bake that
into anything you want to keep tabs on.
My first attempts to sketch out some ideas based on Future
just felt
cumbersome. For what I had in mind, a reactive design just felt more intuitive
and natural to use.
Distributed systems are hard. One of the hardest things to internalize, and then design for is failure. Instead of running one node, you run three so there’s no single point of failure, but now there are a million more things that can go wrong. Fault tolerance, including finding ways to be partially available, must be a fundamental part of the system. And you need to keep a close eye on things, so when failures do occur, and they will, you can manually intervene if need be; replace failing nodes, add more capacity, etc.
This is a big change from a single node system. We’ve been building in, around, and on top of our existing API as the cluster has matured and as we’ve begun the shift from a vertically scaling system to a horizontally scalable one. One thing I felt was clear was that a lot of these concerns needed to be built into the system, rather than on top of it.
We’ve been using Apache Curator for the cluster, which is something that grew out of the Netflix Open Source treasure trove. So I knew about Hystrix, and more generally the Circuit Breaker pattern.
According to the Hystrix wiki, Hystrix is designed to do the following:
Do we want any of that in Stardog? Check, check, and check. That’s a good place to start. It’s even got steps toward monitoring. This can be part of the glue between the different parts of Stardog, but it’s almost an implementation detail rather than something that’s part of the API. And it plays very nicely with RxJava. So now a couple different ideas all seem to be coming together to provide the building blocks for the next generation of the SNARL API.
The switch in design isn’t without drawbacks. I’ve already touched on the
familiarity of Future
versus the relative unfamiliarity of reactive APIs.
Other areas of concern are that extending Flowable
is cumbersome, but doable,
while Observable
is not. The distinction between hot and cold observables is
rarely obvious, you really have to pick one style and stick with it, because
let’s face it developers aren’t going to read the @implNote
in the docs to
find out which is which until after there’s a bug. And debugging can be an
exercise in interpreting more convoluted stack traces. Finally, understanding
what errors an exception “throws” is less clear. There’s no throws
declaration
to rely on as exceptions are reported to subscribers directly. You have to read
the javadocs to know what kinds of exceptions can be thrown.
But overall, it’s a very positive step forward for Stardog and something I’m personally very excited about.
This is the first in a two part series discussing the upcoming changes to the SNARL API. In the final installment we’ll be taking a closer look at the API itself.
Download Stardog today to start your free 30-day evaluation.
How to Overcome a Major Enterprise Liability and Unleash Massive Potential
Download for free