Introduction
In the Network Programming chapter, we looked at how to interact with Stardog over a network via the HTTP and SNARL protocols.
In this chapter we describe how to program Stardog from Java using SNARL ("Stardog Native API for the RDF Language"), Sesame, and Jena. We prefer SNARL to Sesame and Sesame to Jena and, all other things being equal, recommend them in that order.
If you're a Spring developer, see the Programming with Spring chapter.
Examples
The best way to learn to program Stardog with Java is to study the examples:
- SNARL
- Sesame bindings
- Jena bindings
- SNARL and OWL 2 reasoning
- SNARL and Connection Pooling
- SNARL and Searching
We offer some commentary on the interesting parts of these examples below.
Creating & Administering Databases
StardogDBMS provides simple programmatic access to all administrative functions available in Stardog.
Creating a Database
You can create a basic temporary memory database in Stardog with one line of code.
You can also use the memory and disk functions to configure and create a database in any way you prefer. These methods return DatabaseBuilder objects which you can use to configure the options of the database you'd like to create. Finally, the create method takes the list of files to bulk load into the database when you create it. This returns a valid ConnectionConfiguration which can be used to create new Connections to your database.
It is important to note that, as shown in the example, you must take care to always log out of the server when you are done working with StardogDBMS.
This illustrates how to create a temporary memory database named 'test' which supports full text search via Waldo.
This illustrates how to create a persistent disk database with ICV guard mode enabled for the QL reasoning type. For more information on the available options for set and what they mean, refer to the admin docs, specifically the chapter on administering a database.
Also note, Stardog database administration can be performed from the command line.
Creating a Connection String
As you can see from all of the examples, the ConnectionConfiguration class (in the com.clarkparsia.stardog.api package) is where the initial action takes place. The to() method takes a database name (as a string), and then connect() actually connects to the database using all specified properties on the configuration.
This class and its constructor methods are used for all of Stardog's Java APIs: SNARL (native Stardog API), Sesame, Jena, as well as HTTP and SNARL protocol. In the latter cases, you must also call url() and pass it a valid URL to the Stardog server using the HTTP or SNARL protocols.
Without the call to url(), your ConnectionConfiguration will attempt to connect to a local, embedded version of the Stardog server. The Connection still operates in the standard client-server mode; the only difference is that the server is running in the same JVM as your application. You can use the convenience methods on StardogDBMS to start and stop the embedded server.
Note: Whether using SNARL, Sesame, or Jena, most (perhaps all)
Stardog Java code will use ConnectionConfiguration to get a
handle on a Stardog database—whether embedded or remote—and,
after getting that handle, can use the API that makes the most sense for the
use cases and requirements at hand.
See the ConnectionConfiguration API docs or the administration section for more information on connection strings.
Security in Stardog
We discuss the security system in Stardog extensively in the security section.
When logged into a StardogDBMS, you can access all of the security-related features detailed in the security section using any of the core security interfaces for managing users, roles, and permissions.
Shiro is used internally as the core of the security framework, but unlike previous versions, you do not need to configure Shiro directly. All management can be done via the command line or via the security API provided by StardogDBMS.
Using SNARL
In the examples (1) and (4) above, you can see how to use SNARL in Java to interact with Stardog. The SNARL API will give the best performance overall and is the native Stardog API. It uses some Sesame domain classes but is otherwise a clean-sheet API and set of implementations.
The SNARL API is fluent with the aim of making code written for Stardog easier to write and easier to maintain. Most objects are easily re-used to make basic tasks with SNARL as simple as possible. We are always interested in feedback on the API, so if you have suggestions or comments, please send them to the mailing list.
Let's take a closer look at some of the interesting parts of SNARL.
Adding Data
You must always surround changes to a database with a transaction begin and commit. Changes are kept locally until the transaction is committed or until you perform a query operation to inspect the state of the database within the transaction.
RDF that you add goes into the default context unless you specify otherwise. As shown, you can use Adder directly to add statements and graphs to the database, and if you want to add data from a file or input stream, you use an io(), format(), and stream() chain of method invocations.
See the SNARL API Javadocs for all the gory details.
Getter Interface
SNARL also supports some sugar for the classic statement-level
(getSPO() scars, anyone?) interactions. We ask in the first
line of the snippet above for an iterator over the Stardog connection,
based on aURI in the subject position. Then a while-loop, as
one might expect...
You can also parameterize Getters by binding different positions of the Getter (which acts like a kind of RDF statement filter) and then iterating as usual.
Note the aIter.close() call, which is important for Stardog databases to avoid memory leaks.
If you need to materialize the iterator as a graph, you can do that by
calling graph().
The snippet doesn't show object() or context()
parameters on a Getter, but those work, too, in the obvious
way.
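To make the "statement filter" idea concrete, here is a minimal sketch in plain Java. It is not Stardog's Getter class: GetterSketch, its Statement type, and the example data are hypothetical stand-ins, and the real Getter iterates over a database rather than an in-memory list. An unbound position (null) matches anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a statement filter; not Stardog's Getter class. */
class GetterSketch {
    /** A quad; context is null for the default context. */
    static final class Statement {
        final String subject, predicate, object, context;
        Statement(String s, String p, String o, String c) {
            subject = s; predicate = p; object = o; context = c;
        }
    }

    private final List<Statement> data;
    // A null position is unbound and matches any value.
    private String subject, predicate, object, context;

    GetterSketch(List<Statement> data) { this.data = data; }

    GetterSketch subject(String s) { subject = s; return this; }
    GetterSketch predicate(String p) { predicate = p; return this; }
    GetterSketch object(String o) { object = o; return this; }
    GetterSketch context(String c) { context = c; return this; }

    /** Returns every statement whose bound positions all match. */
    List<Statement> statements() {
        List<Statement> out = new ArrayList<>();
        for (Statement st : data) {
            if (matches(subject, st.subject) && matches(predicate, st.predicate)
                    && matches(object, st.object) && matches(context, st.context)) {
                out.add(st);
            }
        }
        return out;
    }

    private static boolean matches(String bound, String actual) {
        return bound == null || bound.equals(actual);
    }

    public static void main(String[] args) {
        List<Statement> data = Arrays.asList(
                new Statement("ex:a", "ex:knows", "ex:b", null),
                new Statement("ex:a", "ex:name", "Alice", null));
        System.out.println(new GetterSketch(data).subject("ex:a").statements().size());
    }
}
```

Binding more positions narrows the result, which is all that parameterizing a Getter amounts to in this model.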
Parameterized SPARQL Queries
SNARL also lets us parameterize SPARQL queries.
We can make a Query object by passing a SPARQL query in the constructor. Simple. Obvious.
Next, let's set a limit for the results: aQuery.limit(10);
or, if we want no limit, aQuery.limit(Query.NO_LIMIT). By default, there is no limit imposed on the query object; we'll use whatever is specified in the query.
You can use limit to override any limit specified in the query. However, specifying NO_LIMIT will not remove a limit specified in the query; it only removes any limit override you've set, restoring the default of using whatever is in the query.
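The limit rules are easy to misread, so here is a small model of them in plain Java. LimitSketch is a hypothetical class, not Stardog's Query, and the -1 sentinel used for NO_LIMIT is an assumption made only for this sketch.

```java
/** Hypothetical model of the limit rules described above; not Stardog's Query class. */
class LimitSketch {
    static final long NO_LIMIT = -1;      // assumed sentinel, for illustration only

    private final long limitInQuery;      // LIMIT written in the SPARQL text, or NO_LIMIT
    private long override = NO_LIMIT;     // value set through limit(...)

    LimitSketch(long limitInQuery) { this.limitInQuery = limitInQuery; }

    /** Sets an override; passing NO_LIMIT clears the override only. */
    LimitSketch limit(long value) {
        this.override = value;
        return this;
    }

    /** The override wins when present; otherwise the query's own limit applies. */
    long effectiveLimit() {
        return override != NO_LIMIT ? override : limitInQuery;
    }

    public static void main(String[] args) {
        LimitSketch q = new LimitSketch(100);          // query text says LIMIT 100
        System.out.println(q.effectiveLimit());
        System.out.println(q.limit(10).effectiveLimit());
        System.out.println(q.limit(NO_LIMIT).effectiveLimit());
    }
}
```

In this model, clearing the override falls back to the limit in the query text rather than removing all limits, which matches the behavior described above.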
We can execute that query with executeSelect() and
iterate over the results. We can also rebind the "?s" variable easily:
aQuery.parameter("s", aURI), which will work for all instances
of "?s" in any BGP in the query, and you can specify null to remove the binding.
Query objects are reusable, so you can create one from your original query string, alter bindings, limit, and offset in any way you see fit, and re-execute the query to get the updated results.
We strongly recommend the use of SNARL's parameterized queries over concatenating strings together in order to build your SPARQL query. This latter approach opens up the possibility for SPARQL injection attacks unless you are very careful in scrubbing your input.
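The difference between the two approaches can be seen without a database. The sketch below is plain Java and does not use Stardog's Query class: BoundQuerySketch, the example.org predicate, and the hostile input are all hypothetical. It shows that concatenation lets input rewrite the query text, while binding keeps the value apart from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a parameterized query; not Stardog's Query class. */
class BoundQuerySketch {
    private final String text;                        // query text is never modified
    private final Map<String, String> bindings = new LinkedHashMap<>();

    BoundQuerySketch(String text) { this.text = text; }

    /** Binds a variable to a value; null removes the binding. */
    BoundQuerySketch parameter(String variable, String value) {
        if (value == null) {
            bindings.remove(variable);
        } else {
            bindings.put(variable, value);
        }
        return this;
    }

    String text() { return text; }
    Map<String, String> bindings() { return bindings; }

    /** The risky alternative: user input becomes part of the query text. */
    static String concatenate(String userInput) {
        return "SELECT ?s WHERE { ?s <http://example.org/name> \"" + userInput + "\" }";
    }

    public static void main(String[] args) {
        // Input that closes the literal, adds a pattern, and comments out the rest.
        String hostile = "x\" . ?s ?p ?secret } #";
        System.out.println(concatenate(hostile));

        BoundQuerySketch q = new BoundQuerySketch(
                "SELECT ?s WHERE { ?s <http://example.org/name> ?name }")
                .parameter("name", hostile);
        System.out.println(q.text());
    }
}
```

With concatenation, the hostile input adds a triple pattern the author never wrote; with binding, the same input is only ever a value attached to ?name.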
Removing Data
Let's look at removing data via SNARL;
in the example above, you can
see that file or stream-based removal is symmetric to file or stream-based
addition, i.e., calling remove() in an io() chain
with a file or stream call. See the SNARL API docs for more details about
finer-grained deletes, etc.
Reasoning
Stardog supports query-time OWL 2 QL, EL, and RL reasoning by using a query rewriting technique. Reasoning is enabled at the Connection layer, and then any queries executed over that connection are executed with reasoning enabled; you don't need to do anything up front when you create your database if you want to use reasoning.
In this code example, you can see that it's trivial to enable reasoning for a Connection: simply call reasoning() with the appropriate constant (such as ReasoningType.QL) passed in. In addition to OWL 2 QL, EL, and RL, Stardog supports OWL 2 DL schema queries. Stardog also supports SWRL.
For more information on how reasoning is supported in Stardog, check out the reasoning section.
Search
We introduced a search system into Stardog 0.6.5; it can be used from the command line or remotely over the network interface. It can also be used from Java in the following way.
The fluent Java API for searching in SNARL looks a lot like the other search interfaces. We create a Searcher instance with a fluent constructor: limit() sets a limit on the results, query() contains the search query, and threshold() sets a minimum threshold for the results.
Then we call the search() method of our
Searcher instance and iterate over the results
(i.e., SearchResults). Last, we can use
offset() on an existing Searcher to grab another
page of results.
Stardog also supports performing searches over the full-text index within a SPARQL query via the LARQ SPARQL syntax.
This provides a powerful mechanism for querying both your RDF index and full-text index at the same time, while also giving you a more performant alternative to the SPARQL regex filter.
SNARL Connection Views
As of 0.7, SNARL Connections
support obtaining a specified type of Connection. This provides the ability to extend
and enhance the features available to a Connection while maintaining the standard,
simple Connection API. The Connection as method takes
as a parameter the interface, which must be a sub-type of a Connection, that you
would like to use. as will either return the Connection as the view
you've specified, or it will throw an exception if the view could not be obtained
for some reason.
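The view mechanism follows a common Java pattern: pass the interface you want and get the same object back typed as that interface. The sketch below is plain Java with hypothetical types (ViewSketch, SearchView, AuditView, BasicConnection); it is not Stardog's Connection hierarchy, and the choice of UnsupportedOperationException is an assumption, since the text above does not say which exception is thrown.

```java
/** Hypothetical types illustrating the view pattern; not Stardog's own interfaces. */
class ViewSketch {
    interface Connection {
        /** Returns this connection as the requested view, or throws if unavailable. */
        <T extends Connection> T as(Class<T> view);
    }

    /** A view that adds search on top of the basic connection. */
    interface SearchView extends Connection {
        String search(String text);
    }

    /** A view that this sketch's connection does not support. */
    interface AuditView extends Connection {
        String lastChange();
    }

    static class BasicConnection implements SearchView {
        @Override
        public <T extends Connection> T as(Class<T> view) {
            if (view.isInstance(this)) {
                return view.cast(this);
            }
            // Exception type is an assumption made for this sketch.
            throw new UnsupportedOperationException("View not available: " + view.getSimpleName());
        }

        @Override
        public String search(String text) {
            return "results for " + text;
        }
    }

    public static void main(String[] args) {
        Connection conn = new BasicConnection();
        SearchView search = conn.as(SearchView.class);
        System.out.println(search.search("dog"));
    }
}
```

The caller keeps working with the simple Connection type and only asks for a richer view at the point where the extra methods are needed.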
For example, you can obtain an instance of SearchConnection to use Stardog's full-text search support.
SNARL API Docs
Please see SNARL API docs for more information.
Using Sesame
Stardog supports the Sesame API; thus, for the most part, using Stardog and Sesame is not much different from using Sesame with other RDF databases. There are, however, at least two differences worth pointing out.
Wrap the connection with StardogRepository
As you can see from the code snippet, once you've created a ConnectionConfiguration
with all the details for connecting to a Stardog database, you can wrap that in a StardogRepository
which is a Stardog specific implementation of the Sesame Repository interface. At this
point, you can use the resulting Repository like any other Sesame Repository implementation. Each time
you call Repository.getConnection, your original ConnectionConfiguration will be used
to spawn a new connection to the database.
Autocommit
Stardog's RepositoryConnection implementation will, by default, disable autoCommit status.
When enabled, every single statement added or deleted via the Connection will incur the cost
of a transaction, which is too heavyweight for nearly every use case. You can enable autoCommit
and it will work as expected, but for the aforementioned reason, and the fact that Sesame has deprecated
the method, we recommend leaving it disabled.
Using Jena
Stardog supports Jena via a Sesame-Jena bridge, so it's got more overhead than Sesame or SNARL. YMMV. There are two points in the Jena example to reiterate.
Init in Jena
The initialization in Jena is a bit different from either
SNARL or Sesame; you can get a Jena Model instance
by passing the Connection instance (returned by
ConnectionConfiguration) to the Stardog factory,
SDJenaFactory.
Add in Jena
Jena also wants to add data to a Model one statement
at a time, which can be less than ideal. To work around this restriction,
we recommend adding data to a Model in a single Stardog
transaction, which is initiated with aModel.begin().
Then, to read data into the model, we recommend either using RDF/XML, since that triggers the BulkUpdateHandler in Jena, or grabbing a BulkUpdateHandler directly from the underlying Jena graph.
The other options include using the Stardog command-line client to bulk load a Stardog database or to use SNARL for loading and then switch to Jena for other operations, processing, query, etc.
Client-Server Stardog
As you can see, using Stardog from Java in either embedded or client-server mode is very similar; the only really visible difference is the use of url() in a ConnectionConfiguration: when it's present, we're in client-server mode; else, we're in embedded mode.
That's a good and a bad thing: it's good because the code is symmetric and uniform. It's bad because it can make reasoning about performance difficult, i.e., it's not entirely clear in client-server mode which operations trigger (or don't trigger) a round trip with the server and, thus, which may be more expensive than in embedded mode.
In client-server mode, everything triggers a round trip with these exceptions:
- closing a connection outside a transaction
- any parameterization (or other modification) of a query or getter instance
- any database state mutations in a transaction that don't need to be immediately visible to the transaction; that is, changes are sent to the server only when they are required, on commit, or on any query or read operation that needs to have the accurate up-to-date state of the data within the transaction.
Stardog generally tries to be as lazy as possible; but in client-server mode, since state is maintained on the client, there are fewer chances to be lazy and more interactions with the server.
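The third exception in the list above, lazy sending of changes, can be modeled in a few lines of plain Java. LazyClientSketch is hypothetical and is not Stardog's client code; in particular, how many round trips a flush or a commit costs is an assumption of this model (one each). The point is only that adds cost nothing until a read or a commit forces them out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client that counts round trips; not Stardog's client code. */
class LazyClientSketch {
    private final List<String> pending = new ArrayList<>();  // changes not yet sent
    private final List<String> server = new ArrayList<>();   // stand-in for remote state
    private int roundTrips = 0;

    /** Buffers a change locally; no round trip. */
    void add(String statement) {
        pending.add(statement);
    }

    /** A read inside the transaction must see pending changes, so flush first. */
    int size() {
        flush();
        roundTrips++;            // the read itself
        return server.size();
    }

    /** Commit sends whatever is still pending, then commits. */
    void commit() {
        flush();
        roundTrips++;            // the commit itself
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;              // nothing to send, no round trip
        }
        server.addAll(pending);
        pending.clear();
        roundTrips++;
    }

    int roundTrips() { return roundTrips; }

    public static void main(String[] args) {
        LazyClientSketch client = new LazyClientSketch();
        client.add("s1");
        client.add("s2");
        System.out.println("after adds: " + client.roundTrips());
        client.commit();
        System.out.println("after commit: " + client.roundTrips());
    }
}
```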
Embedded Stardog
In addition to the url() issue, the other key difference between client-server and embedded Stardog is, of course, Java classpath woes. As of 1.2.2, there is one classpath issue to watch out for:
- if you're using Jena in embedded mode, then Jena's libraries should be on the classpath after Stardog's, because of conflicting Lucene JARs
Please let us know if you find any other conflicts among JARs or other classpath issues.
Connection Pooling
Stardog supports connection pools for SNARL Connection
objects for efficiency and programmer sanity. Here's how they work:
Per standard practice, we first initialize security and grab a connection, in this case to the testConnectionPool database.
Then we set up a ConnectionPoolConfig, using its fluent API, which establishes the parameters of the pool:
- using(): Sets which ConnectionConfiguration we want to pool; this is what is used to actually create the connections.
- minPool(), maxPool(): Establish min and max pooled objects; max pooled objects includes both leased and idled objects.
- expiration(): Sets the idle life of objects; in this case, the pool reclaims objects idled for 1 hour.
- blockAtCapacity(): Sets the max time (here: in minutes) that we'll block waiting for an object when there aren't any idle ones in the pool.
Whew! Next we can create the pool from this ConnectionPoolConfig.
Finally, we call obtain() on the ConnectionPool when we need a new connection. And when we're done with it, we return it to the pool so it can be reused, by calling release(). When we're done, we shutdown() the pool.
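For readers who have not used an object pool before, the obtain/release cycle and the min, max, and block-at-capacity parameters can be sketched in plain Java. PoolSketch is a hypothetical, simplified pool and not Stardog's ConnectionPool: it omits idle expiration, assumes minPool is no larger than maxPool, and the exception thrown on timeout is an assumption of the sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical bounded pool; not Stardog's ConnectionPool implementation. */
class PoolSketch<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int maxPool;
    private final long blockMillis;
    private final AtomicInteger created = new AtomicInteger();   // leased + idle

    PoolSketch(Supplier<T> factory, int minPool, int maxPool, long blockMillis) {
        this.factory = factory;
        this.maxPool = maxPool;
        this.blockMillis = blockMillis;
        this.idle = new ArrayBlockingQueue<>(maxPool);
        for (int i = 0; i < minPool; i++) {       // pre-create the minimum
            idle.add(factory.get());
            created.incrementAndGet();
        }
    }

    /** Reuses an idle object, creates one if under max, otherwise blocks briefly. */
    T obtain() {
        T item = idle.poll();
        if (item != null) {
            return item;
        }
        while (true) {                            // reserve a creation slot atomically
            int n = created.get();
            if (n >= maxPool) {
                break;
            }
            if (created.compareAndSet(n, n + 1)) {
                return factory.get();
            }
        }
        try {
            item = idle.poll(blockMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (item == null) {
            throw new IllegalStateException("No pooled object became available in time");
        }
        return item;
    }

    /** Returns a leased object to the pool so it can be reused. */
    void release(T item) { idle.offer(item); }

    /** Drops all idle objects. */
    void shutdown() { idle.clear(); }

    int created() { return created.get(); }

    public static void main(String[] args) {
        PoolSketch<StringBuilder> pool = new PoolSketch<>(StringBuilder::new, 1, 2, 100);
        StringBuilder conn = pool.obtain();
        pool.release(conn);
        pool.shutdown();
        System.out.println("objects created: " + pool.created());
    }
}
```

Forgetting to release a leased object is the usual failure mode: once max objects are leased, every later obtain blocks until the timeout.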
Since reasoning in Stardog is enabled per Connection, you can create two pools: one with reasoning connections and one with non-reasoning connections. You can then use whichever pool a given query needs, which effectively gives you reasoning per query; never pay for more than you need.
Deprecation
Methods and classes in the SNARL API that are marked with the com.google.common.annotations.Beta annotation are subject to change or removal in any release. We use this annotation to denote new or experimental features, the behavior or signature of which may change significantly before they are out of "beta".
We will otherwise attempt to keep the public APIs as stable as possible, and methods will be marked with the standard @Deprecated annotation for at least one full revision cycle before their removal from the SNARL API.
Anything marked @VisibleForTesting is just that, visible as a consequence of test case requirements; don't write any important code that depends on functions with this annotation.
