In the Network Programming chapter, we looked at how to interact with Stardog over a network via the HTTP and SNARL protocols.
In this chapter we describe how to program Stardog from Java using SNARL ("Stardog Native API for the RDF Language"), Sesame, and Jena. We prefer SNARL to Sesame and Sesame to Jena and, all other things being equal, recommend them in that order.
If you're a Spring developer, see the Programming with Spring chapter.
The best way to learn to program Stardog with Java is to study the examples:
- Sesame bindings
- Jena bindings
- SNARL and OWL 2 reasoning
- SNARL and Connection Pooling
- SNARL and Searching
We offer some commentary on the interesting parts of these examples below.
StardogDBMS provides simple programmatic access to all administrative functions available in Stardog.
Creating a Database
You can create a basic temporary memory database in Stardog with one line of code:
You can also use the disk functions to configure and create a database in any way you prefer. These methods return objects which you can use to configure the options of the database you'd like to create. Finally, the creation call takes the list of files to bulk load into the database when you create it. It returns a valid object that can be used to create new Connections to your database.
It is important to note that, as shown in the example, you must take care to always log out of the server when you are done working with StardogDBMS.
This illustrates how to create a temporary memory database named 'test' which supports full text search via Waldo.
This illustrates how to create a persistent disk database with ICV guard mode enabled at the QL reasoning type. For more information on what the available options are and what they mean, refer to the admin docs, specifically the chapter on administering a database.
Also note, Stardog database administration can be performed from the command line.
The ConnectionConfiguration class and its constructor methods are used for all of Stardog's Java APIs: SNARL (native Stardog API), Sesame, and Jena, as well as the HTTP and SNARL protocols. In the latter cases, you must also call url() and pass it a valid URL to the Stardog server using the HTTP or SNARL protocols.
Without the call to url(), your ConnectionConfiguration will attempt to connect to a local, embedded version of the Stardog server. The Connection still operates in the standard client-server mode; the only difference is that the server is running in the same JVM as your application. You can use the convenience methods on StardogDBMS to start and stop the embedded server.
Note: Whether using SNARL, Sesame, or Jena, most (perhaps all) Stardog Java code will use ConnectionConfiguration to get a handle on a Stardog database, whether embedded or remote, and, after getting that handle, can use the API that makes the most sense for the use cases and requirements at hand.
We extensively discuss the security system in Stardog in the security section.
Shiro is used internally as the core of the security framework, but unlike previous versions, you do not need to configure Shiro directly. All management can be done via the command line or via the security API provided by StardogDBMS.
In the examples (1) and (4) above, you can see how to use SNARL in Java to interact with Stardog. The SNARL API will give the best performance overall and is the native Stardog API. It uses some Sesame domain classes but is otherwise a clean-sheet API and set of implementations.
The SNARL API is fluent with the aim of making code written for Stardog easier to write and easier to maintain. Most objects are easily re-used to make basic tasks with SNARL as simple as possible. We are always interested in feedback on the API, so if you have suggestions or comments, please send them to the mailing list.
Let's take a closer look at some of the interesting parts of SNARL.
You must always surround changes to a database with a transaction begin and commit. Changes are kept locally until the transaction is committed, or until you perform a query operation to inspect the state of the database within the transaction.
By default, added RDF goes into the default context unless specified otherwise. As shown, you can use Adder directly to add statements and graphs to the database, and if you want to add data from a file or input stream, you use a stream() chain of method invocations.
See the SNARL API Javadocs for all the gory details.
SNARL also supports some sugar for the classic statement-level (getSPO() scars, anyone?) interactions. We ask in the first line of the snippet above for an iterator over the Stardog connection, with aURI in the subject position. Then a while-loop, as one might expect...
You can also parameterize Getters by binding different positions of the Getter (which acts like a kind of RDF statement filter), and then iterating as usual.
Note the call to aIter.close(), which is important for Stardog databases to avoid memory leaks.
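The close-your-iterator advice above can be sketched without a Stardog server. The class below is a hypothetical stand-in for a statement iterator (it is not a Stardog or Sesame class, and the openCount field exists only for illustration). It shows the try/finally shape that guarantees close() runs even if iteration throws.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical closeable iterator; a stand-in for a statement iterator, not a Stardog class. */
final class CloseableIterSketch implements Iterator<String>, AutoCloseable {
    /** Counts iterators that were opened but not yet closed (illustration only). */
    static int openCount = 0;

    private final Iterator<String> delegate;
    private boolean closed = false;

    CloseableIterSketch(Iterator<String> delegate) {
        this.delegate = delegate;
        openCount++;
    }

    @Override public boolean hasNext() { return !closed && delegate.hasNext(); }

    @Override public String next() { return delegate.next(); }

    /** Releases the (pretend) resources held by this iterator; safe to call twice. */
    @Override public void close() {
        if (!closed) {
            closed = true;
            openCount--;
        }
    }

    /** Iterates in a while-loop and always closes the iterator afterwards. */
    static int countStatements(List<String> statements) {
        CloseableIterSketch aIter = new CloseableIterSketch(statements.iterator());
        int count = 0;
        try {
            while (aIter.hasNext()) {
                aIter.next();
                count++;
            }
        } finally {
            aIter.close(); // runs even if the loop body throws
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countStatements(Arrays.asList("s1 p1 o1", "s2 p2 o2")));
        System.out.println("still open: " + openCount);
    }
}
```

Because this sketch implements AutoCloseable, a try-with-resources statement would work equally well; whether that applies to the real iterator type depends on the API version you are using.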
If you need to materialize the iterator as a graph, you can do that by
The snippet doesn't show parameters on a Getter, but those work, too, in the obvious way.
Parameterized SPARQL Queries
SNARL also lets us parameterize SPARQL queries.
We can make a Query object by passing a SPARQL query in the constructor. Simple. Obvious.
Next, let's set a limit for the results by calling limit on the query; or, if we want no limit, aQuery.limit(Query.NO_LIMIT). By default, there is no limit imposed on the query object; we'll use whatever is specified in the query. You can use limit to override any limit specified in the query. However, specifying NO_LIMIT will not remove a limit specified in the query; it only removes any limit override you've specified, restoring the default of using whatever is in the query.
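The limit rules above are easy to misread, so here is a minimal sketch of them as a toy model. The class and its NO_LIMIT sentinel are hypothetical and only mimic the behavior described in this section; this is not the SNARL Query class.

```java
import java.util.OptionalLong;

/** Toy model of the limit rules described above; not Stardog's Query class. */
final class LimitOverrideSketch {
    /** Hypothetical sentinel meaning "no override set". */
    static final long NO_LIMIT = -1;

    /** LIMIT parsed from the SPARQL text itself, if the query has one. */
    private final OptionalLong limitInQueryText;
    private long override = NO_LIMIT;

    LimitOverrideSketch(OptionalLong limitInQueryText) {
        this.limitInQueryText = limitInQueryText;
    }

    /** Sets (or, with NO_LIMIT, clears) the override; fluent like the API it imitates. */
    LimitOverrideSketch limit(long value) {
        this.override = value;
        return this;
    }

    /** The limit that applies at execution time; empty means unlimited. */
    OptionalLong effectiveLimit() {
        return override == NO_LIMIT ? limitInQueryText : OptionalLong.of(override);
    }

    public static void main(String[] args) {
        LimitOverrideSketch aQuery = new LimitOverrideSketch(OptionalLong.of(100));
        System.out.println(aQuery.effectiveLimit());                 // limit from the query text
        System.out.println(aQuery.limit(10).effectiveLimit());       // the override wins
        System.out.println(aQuery.limit(NO_LIMIT).effectiveLimit()); // back to the query's own limit
    }
}
```

The point of the last line is that clearing the override does not make the query unlimited when the query text itself contains a LIMIT clause.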
We can execute that query and iterate over the results. We can also rebind the "?s" variable easily: aQuery.parameter("s", aURI), which will work for all instances of "?s" in any BGP in the query, and you can specify null to remove the binding.
Query objects are reusable, so you can create one from your original query string, alter bindings, limit, and offset in any way you see fit, and re-execute the query to get the updated results.
We strongly recommend the use of SNARL's parameterized queries over concatenating strings together in order to build your SPARQL query. This latter approach opens up the possibility for SPARQL injection attacks unless you are very careful in scrubbing your input.
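To make the injection risk concrete, here is a small self-contained sketch. It is not how SNARL implements parameter(); the bound method is a hypothetical helper that merely treats the input as a single IRI value and rejects anything the SPARQL grammar does not allow inside an IRI reference. The example IRI and the malicious input are made up for illustration.

```java
/** Illustration only: contrasts string concatenation with treating input as one value. */
final class SparqlInjectionSketch {
    /** Naive approach: splice user input straight into the query text. */
    static String concatenated(String userInput) {
        return "SELECT ?p ?o WHERE { <" + userInput + "> ?p ?o }";
    }

    /** Hypothetical binder: accepts the input only if it is a syntactically valid IRI. */
    static String bound(String userInput) {
        for (char c : userInput.toCharArray()) {
            // Characters the SPARQL grammar forbids inside <...> (IRIREF).
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("not a valid IRI: " + userInput);
            }
        }
        return "SELECT ?p ?o WHERE { <" + userInput + "> ?p ?o }";
    }

    public static void main(String[] args) {
        // Closes the IRI early, adds a pattern matching every triple, comments out the rest.
        String malicious = "http://example.org/alice> ?p ?o . ?x ?y ?z } #";

        System.out.println(concatenated(malicious)); // query now also matches ?x ?y ?z
        try {
            bound(malicious);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(bound("http://example.org/alice"));
    }
}
```

With concatenation, the attacker's text becomes part of the query structure. With a bound parameter, the input can only ever be a value.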
Let's look at removing data via SNARL. In the example above, you can see that file or stream-based removal is symmetric to file or stream-based addition, i.e., calling remove() in a chain with a file or stream call. See the SNARL API docs for more details about finer-grained deletes, etc.
Stardog supports query-time OWL 2 QL, EL, and RL reasoning by using a query rewriting technique. Reasoning is enabled at the Connection layer, and then any queries executed over that connection are executed with reasoning enabled; you don't need to do anything up front when you create your database if you want to use reasoning.
In this code example, you can see that it's trivial to enable reasoning for a Connection: simply pass in the appropriate constant (such as ReasoningType.QL). In addition to OWL 2 QL, EL, and RL, Stardog supports OWL 2 DL schema queries. Stardog also supports SWRL.
For more information on how reasoning is supported in Stardog, check out the reasoning section.
We introduced a search system into Stardog 0.6.5; it can be used from the command line or remotely over the network interface. It can also be used from Java in the following way.
The fluent Java API for searching in SNARL looks a lot like the other search interfaces. We create a Searcher instance with a fluent chain of calls: limit() sets a limit on the results, query() contains the search query, and a further call sets a minimum threshold for the results. Then we call the search() method of our Searcher instance and iterate over the results (SearchResults). Last, we can use offset() on an existing Searcher to grab another page of results.
Stardog also supports performing searches over the full-text index within a SPARQL query via the LARQ SPARQL syntax.
This provides a powerful mechanism for querying both your RDF index and full-text index at the same time while also giving you a more performant option to the SPARQL
SNARL Connection Views
As of 0.7, SNARL supports obtaining a specified type of Connection. This provides the ability to extend and enhance the features available to a Connection while maintaining the standard, simple Connection API. The Connection's as() method takes as a parameter the interface, which must be a sub-type of Connection, that you would like to use. as() will either return the Connection as the view you've specified, or it will throw an exception if the view could not be obtained for some reason.
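The view mechanism described above follows a common Java pattern: a generic method that takes a Class token and either returns the object as that type or fails. The sketch below shows the pattern with entirely hypothetical interfaces (SketchConnection, SearchView, AuditView); the real SNARL types, and the exception SNARL throws, may differ.

```java
/** Hypothetical base interface standing in for a connection. */
interface SketchConnection {
    String database();
}

/** Hypothetical view that adds search-related behavior. */
interface SearchView extends SketchConnection {
    boolean searchEnabled();
}

/** Hypothetical view that this sketch's connection does not support. */
interface AuditView extends SketchConnection {
}

/** A connection that can be viewed as a SearchView but not as an AuditView. */
final class ViewSketch implements SearchView {
    @Override public String database() { return "test"; }

    @Override public boolean searchEnabled() { return true; }

    /** Returns this connection as the requested view, or throws if it is unavailable. */
    <T extends SketchConnection> T as(Class<T> view) {
        if (view.isInstance(this)) {
            return view.cast(this);
        }
        throw new IllegalArgumentException("view not available: " + view.getSimpleName());
    }

    public static void main(String[] args) {
        ViewSketch aConn = new ViewSketch();
        SearchView aSearchConn = aConn.as(SearchView.class);
        System.out.println(aSearchConn.searchEnabled());
        try {
            aConn.as(AuditView.class);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The bound on the type parameter is what enforces, at compile time, that only sub-types of the connection interface can be requested.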
An example of this is obtaining a view of the Connection in order to use Stardog's full-text search support.
SNARL API Docs
Please see SNARL API docs for more information.
Stardog supports the Sesame API; thus, for the most part, using Stardog and Sesame is not much different from using Sesame with other RDF databases. There are, however, at least two differences worth pointing out.
Wrap the connection with
As you can see from the code snippet, once you've created a ConnectionConfiguration with all the details for connecting to a Stardog database, you can wrap that in a Stardog-specific implementation of the Sesame Repository interface. At this point, you can use the resulting Repository like any other Sesame Repository implementation. Each time you call Repository.getConnection, your original ConnectionConfiguration will be used to spawn a new connection to the database.
Stardog's RepositoryConnection implementation will, by default, disable auto-commit. When enabled, every single statement added or deleted via the Connection will incur the cost of a transaction, which is too heavyweight for nearly every use case. You can enable it and it will work as expected, but for the aforementioned reason, and the fact that Sesame has deprecated the method, we recommend leaving it disabled.
Stardog supports Jena via a Sesame-Jena bridge, so it's got more overhead than Sesame or SNARL. YMMV. There are two points in the Jena example to reiterate.
Init in Jena
The initialization in Jena is a bit different from either SNARL or Sesame; you can get a Jena Model by passing the Connection instance (returned by ConnectionConfiguration) to the Stardog factory.
Add in Jena
Jena also wants to add data to a Model one statement at a time, which can be less than ideal. To work around this restriction, we recommend adding data to a Model in a single Stardog transaction, which you begin explicitly. Then to read data into the model, we recommend using RDF/XML, since that triggers the BulkUpdateHandler in Jena; alternatively, grab a BulkUpdateHandler directly from the underlying Jena graph.
The other options include using the Stardog command-line client to bulk load a Stardog database, or using SNARL for loading and then switching to Jena for other operations, processing, query, etc.
As you can see, using Stardog from Java in either embedded or client-server mode is very similar; the only really visible difference is the use of url() in a ConnectionConfiguration: when it's present, we're in client-server mode; otherwise, we're in embedded mode.
That's a good and a bad thing: it's good because the code is symmetric and uniform. It's bad because it can make reasoning about performance difficult, i.e., it's not entirely clear in client-server mode which operations trigger (or don't trigger) a round trip with the server and, thus, which may be more expensive than in embedded mode.
In client-server mode, everything triggers a round trip with these exceptions:
- closing a connection outside a transaction
- any parameterization (or other modification) of a query or getter instance
- any database state mutations in a transaction that don't need to be immediately visible to the transaction; that is, changes are sent to the server only when they are required, on commit, or on any query or read operation that needs to have the accurate up-to-date state of the data within the transaction.
Stardog generally tries to be as lazy as possible; but in client-server mode, since state is maintained on the client, there are fewer chances to be lazy and more interactions with the server.
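The lazy behavior in the list above can be modeled in a few lines. The following is a toy client, not Stardog code: the class name, the in-memory "server" list, and the way round trips are counted (one for begin, one for a flush of pending changes, one per read, one for commit) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy client that buffers mutations and only contacts the "server" when it must. */
final class LazyTxSketch {
    private final List<String> pending = new ArrayList<>();     // changes held on the client
    private final List<String> serverState = new ArrayList<>(); // stand-in for the remote database
    private int roundTrips = 0;

    /** Starting a transaction is assumed to contact the server. */
    void begin() { roundTrips++; }

    /** Mutations are buffered locally; no round trip yet. */
    void add(String statement) { pending.add(statement); }

    /** A read needs up-to-date state, so pending changes are flushed first. */
    int size() {
        flush();
        roundTrips++; // the read itself
        return serverState.size();
    }

    /** Commit sends whatever is still pending in a single round trip. */
    void commit() {
        serverState.addAll(pending);
        pending.clear();
        roundTrips++;
    }

    private void flush() {
        if (!pending.isEmpty()) {
            serverState.addAll(pending);
            pending.clear();
            roundTrips++;
        }
    }

    int roundTrips() { return roundTrips; }

    public static void main(String[] args) {
        LazyTxSketch tx = new LazyTxSketch();
        tx.begin();
        tx.add("s1 p o");
        tx.add("s2 p o");
        tx.add("s3 p o");
        System.out.println("round trips after three adds: " + tx.roundTrips());
        System.out.println("size seen inside the transaction: " + tx.size());
        tx.add("s4 p o");
        tx.commit();
        System.out.println("total round trips: " + tx.roundTrips());
    }
}
```

In this model, three adds cost nothing until the read forces a flush, which is why batching writes and deferring reads within a transaction tends to be cheaper in client-server mode.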
In addition to the url() issue, the other key difference between client-server and embedded Stardog is, of course, Java classpath woes. As of 1.2.2, there is one classpath issue to watch out for:
- if you're using Jena in embedded mode, then Jena's libraries should be on the classpath after Stardog's, because of conflicting Lucene JARs
Please let us know if you find any other conflicts among JARs or other classpath issues.
Stardog supports connection pools for SNARL Connection objects for efficiency and programmer sanity. Here's how they work:
Per standard practice, we first initialize security and grab a connection, in this case to the testConnectionPool database.
Then we set up a ConnectionPoolConfig, using its fluent API, which establishes the parameters of the pool:
- Sets which ConnectionConfiguration we want to pool; this is what is used to actually create the connections.
- Establishes min and max pooled objects; max pooled objects includes both leased and idled objects.
- Sets the idle life of objects; in this case, the pool reclaims objects idled for 1 hour.
- Sets the max time (here: in minutes) that we'll block waiting for an object when there aren't any idle ones in the pool.
Whew! Next we can create the pool. Finally, we call obtain() on the ConnectionPool when we need a new connection. When we're done with it, we return it to the pool by calling release() so it can be reused. When we're done with the pool itself, we shut it down.
Since reasoning in Stardog is enabled per Connection, you can create two pools, one with reasoning connections and one with non-reasoning connections, and then pick the pool you need on a per-query basis; never pay for more than you need.
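For readers who have not used an object pool before, here is a minimal, generic sketch of the obtain/release cycle with min and max sizes and a bounded wait. It is a simplified stand-in, not Stardog's ConnectionPool: PoolSketch and its constructor are hypothetical, idle expiration is omitted, and the exception thrown on exhaustion is an arbitrary choice.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Minimal generic pool: pre-creates min objects, grows up to max, then blocks. */
final class PoolSketch<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int max;
    private final AtomicInteger created = new AtomicInteger();

    PoolSketch(Supplier<T> factory, int min, int max) {
        if (min < 0 || max < 1 || min > max) {
            throw new IllegalArgumentException("need 0 <= min <= max and max >= 1");
        }
        this.factory = factory;
        this.max = max;
        this.idle = new ArrayBlockingQueue<>(max);
        for (int i = 0; i < min; i++) {
            idle.add(factory.get());
            created.incrementAndGet();
        }
    }

    /** Leases an object, waiting up to the timeout when the pool is at capacity. */
    T obtain(long timeout, TimeUnit unit) {
        T pooled = idle.poll();
        if (pooled != null) {
            return pooled;
        }
        // Nothing idle: create a new object while we are still under max.
        while (true) {
            int n = created.get();
            if (n >= max) {
                break;
            }
            if (created.compareAndSet(n, n + 1)) {
                return factory.get();
            }
        }
        try {
            pooled = idle.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
        if (pooled == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return pooled;
    }

    /** Returns a leased object so it can be reused. */
    void release(T object) { idle.offer(object); }

    int created() { return created.get(); }

    public static void main(String[] args) {
        PoolSketch<StringBuilder> pool = new PoolSketch<>(StringBuilder::new, 1, 2);
        StringBuilder first = pool.obtain(10, TimeUnit.MILLISECONDS);
        pool.release(first);
        System.out.println("reused: " + (pool.obtain(10, TimeUnit.MILLISECONDS) == first));
    }
}
```

Two such pools, one built from a reasoning configuration and one from a non-reasoning configuration, would give the per-query choice described above.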
Methods and classes in the SNARL API that are marked with the com.google.common.annotations.Beta annotation are subject to change or removal in any release. We are using this annotation to denote new or experimental features, the behavior or signature of which may change significantly before they are out of "beta".
We will otherwise attempt to keep the public APIs as stable as possible, and methods will be marked with the standard deprecation annotation for at least one full revision cycle before their removal from the API.
@VisibleForTesting is just that, visible as a consequence of test case requirements; don't write any important code that depends on functions with this annotation.