Functions delay binding; data structures induce binding. Moral: Structure data late in the programming process.
Alan Perlis, Epigrams in Programming
Stardog Docs
Stardog is an RDF database. For more information, see the slides
of a recent talk (HTML5) about Stardog. This is
documentation for Stardog 0.7.3 (05 December 2011). Check out the
release notes.
Acquiring Stardog
During the run-up to the 1.0 release, Stardog
is supported via the Stardog support forum, stardog@clarkparsia.com. To
become a Stardog tester, fill out this
form.
Acknowledgments
We thank all the Stardog testers, especially Robert Butler, Al
Baker, Marko A. Rodriguez, Brian Sletten, Alin Dreghiciu, Rob Vesse,
Stephane Fallah, John "New Model Army" Goodwin, José Devezas, Chris
Halaschek-Wiener, Gavin Carothers, Brian Panulla, Ryan Kohl, Morton
Swimmer, Quentin Reul, Paul Dlug, James Leigh, Alex Tucker, Ron Zettlemoyer, Jim Rhyne.
The Essential Documentation
Everything you need to administer, use, and program Stardog.
A glossary of technical terms used throughout the Stardog docs.
- Stardog Editions: Community, Developer, Enterprise
- Installation
- Configuration
- Support and Maintenance
- Reporting a Bug
- Administering the Server
- Using the Command-line Client
- Administering a Database
- Configuring a Security Realm
- Optimizing Bulk Data Loading
- Using the Web Console
- Querying RDF with Stardog
- Using SPARQL
- Using Gremlin
- Debugging Slow Queries
- Background and Introduction
- Creating a Connection String
- Using SNARL
- Using Sesame
- Using Jena
- Embedding Stardog
- Connection Pooling
- Deprecation, Backward Compatibility, etc
- SPARQL Protocol
- Extended HTTP Protocol
- Avro RPC
- Introduction
- Building Spring for Stardog
- Overview
- Use
- Examples
- Introduction
- Searching with the Command Line
- Searching over the Network
- Searching Programmatically
- Administering Search Indexes
- Background and Terminology
- Validating RDF with Integrity Constraints
- Query Answering and Reasoning
- Usage and Guidelines
- Background and Terminology
- Known Issues
Programming with JVM-based Languages
- Background
- Python, Ruby, Scala, Clojure, etc.
As of 0.7.3, the known
issues include:
- Multi-reader, single-writer concurrency. Will fix.
- Transactions leave dirty pages on disk that the database no longer uses
and that are never deleted. After significant transaction volume, database
indexes can therefore occupy far more disk space than they actually use.
Will fix.
- On database server shutdown, Stardog may corrupt the database if there
is a commit in-progress. Will fix.
- There's no way to set a database's status to "offline" except by
actually deleting it. Will fix.
- When creating a database through Stardog CLI with one or more input
files, the operation will succeed even if some of those files fail to be
loaded. Stardog will load all the files it can and continue with creating
the index, even if there are no triples loaded. An error message will be
printed on the console for each file that failed to load.
- If relative URIs exist in the data files passed to the create, add, or
remove commands, they are resolved against the constant base URI
http://stardog.clarkparsia.com/ iff the format of the file allows
base URIs. Turtle and RDF/XML allow base URIs; N-Triples does not, so
relative URIs in N-Triples data will cause errors.
- Gremlin includes gossip-1.2.jar, which bundles a different copy of slf4j
than the one Stardog uses. Depending on how your classpath is set up, the
discrepancy between these versions of slf4j can cause odd runtime errors.
If you are using the Gremlin jars, create your classpath accordingly.
- When reasoning is enabled, Stardog throws an NPE when querying for
the range or domain of a property that does not exist in the database.
- Using HTTP URLs in connection strings against an Avro SASL server results
in NPEs being thrown by Avro. The server continues to operate, but the client
hangs until it times out.
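The base URI behavior described in the list above can be illustrated with a short sketch. It uses Python's standard `urllib.parse.urljoin` to mimic how a relative URI in a Turtle or RDF/XML file would resolve against the constant base. The `resolve` helper and the format labels are hypothetical names chosen for illustration; this is not Stardog's loader.

```python
from urllib.parse import urljoin, urlparse

# Constant base URI named in the known issue above.
STARDOG_BASE = "http://stardog.clarkparsia.com/"

# Formats that permit a base URI, per the known issue (labels are illustrative).
FORMATS_WITH_BASE = {"turtle", "rdfxml"}


def resolve(uri: str, rdf_format: str) -> str:
    """Resolve `uri` as the known issue describes (hypothetical helper)."""
    if urlparse(uri).scheme:
        # Already absolute: nothing to resolve.
        return uri
    if rdf_format not in FORMATS_WITH_BASE:
        # N-Triples has no base URI, so a relative URI is an error.
        raise ValueError(f"relative URI {uri!r} is not allowed in {rdf_format}")
    return urljoin(STARDOG_BASE, uri)


print(resolve("people/alice", "turtle"))
# prints "http://stardog.clarkparsia.com/people/alice"
```

To avoid surprises, data files can use absolute URIs throughout, which sidesteps the constant base entirely.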
For a complete change log, see Stardog
Change Log.
The release notes for 0.7.3:
- ADDED: New SNARL Connection view, ReasoningConnection, which exposes the functions provided by the Stardog reasoner.
- ADDED: Support in ReasoningConnection for obtaining the explanation for an inference.
- ADDED: Explanations of IC violations as the expected and/or missing statements that make up the violation.
- ADDED: New example for using explanation facilities, updating ICV example.
- MODIFIED: Improved error messages when CLI fails to connect to a database.
- FIXED: Bug in the handling of reflexive properties when they involve TOP.
- FIXED: Bug in initialization of TOP operator with bound objects, resolves NPE in execution.
- FIXED: Bug in bnode handling in TBox extraction.
The release notes for 0.7.2:
- ADDED: RDFS & DL reasoning types. DL reasoning type supports schema-only queries.
- ADDED: Execution optimizations for when TOP appears in a query due to reasoning.
- MODIFIED: Sesame Repository/Sail implementation to use a SNARL Connection per Sail/RepositoryConnection rather than per Repository.
- MODIFIED: Better formatting of query results in CLI query command including new formatting options.
- FIXED: Retrieving ranges of data properties in reasoner.
- FIXED: Read transaction isolation bug for multiple connections against the same database.
- FIXED: Bug related to incorrect re-association of Shiro auth credentials with pooled connections.
The release notes for 0.7.1:
- ADDED: Canonicalization of literal values on input, either bulk loading or normal add operations. Canonicalization is on by default, but it can be disabled when creating the database if it is not desired.
- MODIFIED: CLI connection strings no longer assume you are connecting to an embedded store when no protocol is supplied. You must now specify a protocol.
- FIXED: ExpressionFactory is no longer stripped from the core jar file
- FIXED: Multi-line queries work with CLI now
- FIXED: Bug in loading ICs from file; occasionally constraints could be malformed
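To show what canonicalizing a literal means, here is a minimal sketch for two XSD datatypes. It follows the canonical lexical forms defined by XML Schema (no plus sign or leading zeros for `xsd:integer`, `true`/`false` for `xsd:boolean`). The `canonicalize` function is hypothetical; the release note does not say which datatypes Stardog covers or how it implements the step.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical space of xsd:integer: optional sign followed by ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def canonicalize(lexical: str, datatype: str) -> str:
    """Return the XSD canonical lexical form (illustrative subset only)."""
    value = lexical.strip()
    if datatype == XSD + "integer":
        if not _INTEGER.fullmatch(value):
            raise ValueError(f"not an xsd:integer: {lexical!r}")
        # int() drops the '+' sign and leading zeros; str() re-emits the rest.
        return str(int(value))
    if datatype == XSD + "boolean":
        if value in ("true", "1"):
            return "true"
        if value in ("false", "0"):
            return "false"
        raise ValueError(f"not an xsd:boolean: {lexical!r}")
    # Other datatypes are left untouched in this sketch.
    return lexical


print(canonicalize("+007", XSD + "integer"))  # prints "7"
print(canonicalize("1", XSD + "boolean"))     # prints "true"
```

With canonicalization on, `"+007"^^xsd:integer` and `"7"^^xsd:integer` would be stored as the same literal; with it off, they would remain distinct terms.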
The release notes for 0.7:
- ADDED: Performance of adds and deletes to and from existing databases greatly improved via concurrent application of changes to the index and the introduction of (optional, disabled by default) differential indexes
- ADDED: Stardog reasoner now supports OWL2 EL and RL reasoning types. These can be specified at connection time; EL, RL, and QL reasoning are performed at query time, i.e., no reasoning is computed at data load time.
- ADDED: Support for ABox, TBox, 'hybrid' (i.e., mixed ABox and TBox) queries for all reasoning types.
- ADDED: Support for Integrity Constraint Validation.
- MODIFIED: Propagation of reasoner state among open connections to the reasoner; state is better shared across connections, thus requiring less overhead to maintain current state.
- MODIFIED: 'delete' was renamed to 'drop' in the command line.
- FIXED: Bug in two-phase commits for changes to a reasoner
NOTE: The version of Sesame changes with this release from 2.3.3 to 2.3.4, which is not
an official release. There was a bug in the serialization of Sesame result sets for select
queries using their binary format which we discovered while working on Stardog 0.7. The
fix for SES-852 was included in their 2.6 branch, so we had to backport the change to
Sesame 2.3.3; we called that version Sesame 2.3.4.
The release notes for 0.6.10:
- FIXED: Fixed an issue where the HTTP response was not always consumed, which could prevent HTTP connections from being released to the pool and lead to its eventual exhaustion.
- MODIFIED: Improved algebraic rewriting of query plans, which avoids evaluating the same part of the query multiple times, e.g., the operator in both parts of a UNION.
- MODIFIED: A better handling of the UNION operator by sorting its output which facilitates joins with other operators in the query plan.
- MODIFIED: A better handling of deep chains of nested joins by maintaining a certain order of intermediate results produced by hash joins and avoiding page loads.
- MODIFIED: An improved, thread-safe implementation of the query plan cache, which is now concurrently accessible from all connections. This allows the server to re-use a previously optimized query plan for a structurally equivalent query executed by another client.
- ADDED: An init startup script for a Stardog server so that it can automatically start with other system services at boot time.
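The shared query plan cache mentioned above can be pictured with a small sketch: queries that differ only in their constants map to the same cache key, and a lock makes the cache safe to share across connections. The normalization rule (replacing quoted string literals with a placeholder), the `PlanCache` class, and the `optimize` callback are all assumptions made for illustration; the release notes do not say how Stardog defines structural equivalence.

```python
import re
import threading
from typing import Callable, Dict

# Quoted string literals are treated as the only constants (an assumption).
_STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')


def structural_key(query: str) -> str:
    """Collapse whitespace and replace string literals with a placeholder."""
    return _STRING_LITERAL.sub("$lit", " ".join(query.split()))


class PlanCache:
    """Thread-safe cache of optimized plans, keyed by query structure."""

    def __init__(self, optimize: Callable[[str], object]) -> None:
        self._optimize = optimize
        self._plans: Dict[str, object] = {}
        self._lock = threading.Lock()

    def plan_for(self, query: str) -> object:
        key = structural_key(query)
        # Holding the lock while optimizing keeps the sketch simple; it also
        # guarantees each structure is optimized at most once.
        with self._lock:
            if key not in self._plans:
                self._plans[key] = self._optimize(key)
            return self._plans[key]
```

A second client that runs the same query with a different literal gets the plan the first client already paid for, which is the re-use the release note describes.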
The release notes for 0.6.9:
- FIXED: Fixed an issue pertaining to QL re-writes with optionals that use a filter which was causing incorrect results for some QL queries (ticket #44).
- FIXED: Fixed query evaluation where the presence of a particular combination of union, optional, and distinct could lead to non-distinct result sets (ticket #48).
- FIXED: Fixed the passing of the disableSecurity flag into the HTTP server
- FIXED: Fixed a bug in the rendering of SPARQL queries using isBlank with remote endpoints
- FIXED: Fixed a bug in query evaluation arising from a certain combination of order by, limit, and optional where the sort key is on a variable bound by an optional pattern.
- FIXED: User-specified query bindings set via Query.parameter are now correctly serialized and sent to the remote server.
- MODIFIED: Query rewriting now fails when a query has BGPs that are not supported by the reasoner; this is strict mode. Strict mode can be disabled, in which case the reasoner rewrites only the BGPs it supports. This introduces a JVM flag 'strictReasoning' for controlling this behavior.
- MODIFIED: HTTP connection will now switch over to HTTP POST if the SPARQL query is very long to avoid any potential URL overflow issues.
- ADDED: CLI now includes an 'export' function to dump the contents of part or all of a Stardog database to RDF.
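The switch from GET to POST for long queries, noted above, can be sketched as follows. The SPARQL Protocol allows a query to be sent either as a `query` URL parameter or as a form-encoded POST body. The 2,000-character threshold, the `build_request` helper, and the endpoint URL are assumptions for illustration, since the release note does not state the limit Stardog uses.

```python
from urllib.parse import urlencode
from urllib.request import Request

MAX_URL_LENGTH = 2000  # assumed threshold, not taken from Stardog


def build_request(endpoint: str, sparql: str) -> Request:
    """Build a GET request for short queries and a POST for long ones."""
    params = urlencode({"query": sparql})
    url = f"{endpoint}?{params}"
    if len(url) <= MAX_URL_LENGTH:
        return Request(url, method="GET")
    # Too long for a URL: send the same parameters as a form-encoded body.
    return Request(
        endpoint,
        data=params.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical endpoint; the request is built but never sent.
request = build_request("http://example.org/sparql", "SELECT * WHERE { ?s ?p ?o }")
print(request.get_method())  # prints "GET"
```

Because the parameters are identical in both cases, the server sees the same query either way; only the transport changes.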
The release notes for 0.6.8:
- ADDED: Updated POM to include Avro.
- ADDED: Avro schema included in distribution.
- FIXED: Fixed a bug in the caching of the current list of databases; deleted
databases could still be accessed in some circumstances. (ticket #42)
- FIXED: Fixed a bug in the handling of unions in the QL reasoner. Nested
unions were occasionally getting duplicated, leading to incorrect results
without the distinct modifier. (ticket #43, #46)
- MODIFIED: Better handling of constant values in construct queries decreasing the time it
takes to create and return the Statements constructed by the query.
- MODIFIED: Increased the speed of some index scans by optimizing the number of page reads
- MODIFIED: Better execution plan re-use & caching, similar to Prepared Statements, in structurally equivalent plans.
- MODIFIED: Significantly improved join-join cardinality estimations.
- MODIFIED: Query optimizer does a better job with join selection and can often pick a better join for the execution.
- MODIFIED: Avro & HTTP connections are eagerly validated by the driver so invalid connections will fail on connect rather than on first action.
- MODIFIED: Transaction ID generation changed to reduce chance of ID collision
- MODIFIED: Internal connection management, both with raw database connections and HTTP based connections to address deadlock issues in highly concurrent environments
The release notes for 0.6.7:
- FIXED: HTTP error in construct queries via the Jena bindings
- ADDED: New CLI command 'server' which can start any of the installed Stardog servers
- ADDED: New Avro-based RPC protocol for Stardog Connections
- ADDED: Maven POM file included in distribution
The release notes for 0.6.6:
- FIXED: Incorrect query evaluation results in some corner cases involving OPTIONAL
- MODIFIED: Improved query evaluation performance with a new query planner
- MODIFIED: Command-line interface to accept global args: --home, --change-buffer-size, --disable-security
- MODIFIED: command-line add, remove, create commands to include --strict-parsing option
The release notes for 0.6.5:
- FIXED: Bug in query evaluation where we were missing results when the first
entry in an iterator in a disk index occurs on a page boundary.
- ADDED: Support for full-text search via our semantic search engine Waldo.
- ADDED: We included the online documentation in the release in docs/manual.
This is best viewed through a web server rather than as raw files; otherwise the links will not be correct.
The release notes for 0.6.2:
- FIXED: Typo in HTTP protocol; TriG & TriX mimetypes were switched.
- FIXED: Bug in left outer joins when the right hand operator was initially
empty causing the first solution set to be missed. (ticket #39)
- FIXED: Resolved IllegalArgumentException which was incorrectly thrown
from the Jena integration when a query w/ optionals was executed against
a remote Stardog database returning null values in the response.
- FIXED: Bug in detection of special BNode syntax in queries for ensuring
BNode stability in queries.
The release notes for 0.6.1:
- FIXED: Debugging exception thrown when using log4j bindings for SLF4j with
the logging level set to debug. (ticket #36)
- FIXED: Issue with calculation of optionals in joins when the optional binding
is at the beginning or end of an iteration. (ticket #34)
- FIXED: Resolved synchronization issue for connections to an index that
are fetched from the internal connection pool. Resolves getting already
closed connections from the pool. (ticket #37)
- FIXED: Datatyped literals used in filters were getting munged during
QL re-writes resulting in an exception. (ticket #35)
- MODIFIED: Iteration over disk indices modified to more efficiently
calculate which pages are actually required to answer the query and avoid
loading intermediate pages.
The release notes for 0.6:
- FIXED: Bug in loading datasets which use literals larger than 8k.
- FIXED: Bug in the writer locks that could allow multiple writers
access to an index in a multithreaded environment, which usually led to
corrupted indexes (ticket #29).
- FIXED: Issue with the CLI's database create command, multiple
calls in rapid succession to create databases caused some of the new
databases to be lost (ticket #32).
- FIXED: QL query rewriter was mangling the variables used within
the query in certain situations causing query results to be incorrect
(ticket #30).
- FIXED: QL query rewriter was not correctly respecting variable
substitutions with datatyped literals and was subsequently losing the
variable binding specified by the user (ticket #31).
- FIXED: FROM/FROM NAMED clauses were incorrectly sent to the HTTP server
- MODIFIED: Improved index page load speed.
- MODIFIED: Reduced memory requirements in the indexes with respect to cache
overhead
- MODIFIED: Persistence of memory indexes to disk is now done
in parallel as much as possible to yield a significant increase in save
(and load) performance. This required changing the on-disk format of
memory indexes. Old versions of memory indexes are unfortunately backwards
incompatible with these updates. We apologize for the inconvenience.
- MODIFIED: On-disk format of disk-based indexes. This helps us resolve some
issues we had with the previous format and increases load speed, but the
new index format is backwards incompatible with the previous indexes. However,
we provide a migration utility in the command line program for changing
old disk indexes to the new format. See the CLI for more information on
migrating old indexes (or ping the mailing list).
- MODIFIED: Decreased memory usage in the query engine
- MODIFIED: Added a change buffer in the layer between the SNARL API and
the underlying index it is connected to. This allows most changes within a
transaction to be pushed into the index at once, increasing the performance
of commits because the changes can mostly be applied in bulk.
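The change buffer in the last item can be sketched as a small class that collects a transaction's changes and hands them to the index in bulk. The `ChangeBuffer` and `InMemoryIndex` classes, the `apply_bulk` method, and the capacity value are hypothetical; this illustrates the batching idea under those assumptions and is not the SNARL implementation.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Change = Tuple[str, Triple]  # ("add" or "remove", triple)


class InMemoryIndex:
    """Stand-in for the underlying index; counts each bulk application."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.bulk_calls = 0

    def apply_bulk(self, changes: List[Change]) -> None:
        self.bulk_calls += 1
        for op, triple in changes:
            if op == "add":
                self.triples.add(triple)
            else:
                self.triples.discard(triple)


class ChangeBuffer:
    """Buffers changes and pushes them to the index in batches."""

    def __init__(self, index: InMemoryIndex, capacity: int = 10_000) -> None:
        self._index = index
        self._capacity = capacity
        self._pending: List[Change] = []

    def add(self, triple: Triple) -> None:
        self._record("add", triple)

    def remove(self, triple: Triple) -> None:
        self._record("remove", triple)

    def commit(self) -> None:
        # Push whatever is still buffered in one bulk call.
        self._flush()

    def _record(self, op: str, triple: Triple) -> None:
        self._pending.append((op, triple))
        if len(self._pending) >= self._capacity:
            # Buffer is full: flush early so memory use stays bounded.
            self._flush()

    def _flush(self) -> None:
        if self._pending:
            self._index.apply_bulk(self._pending)
            self._pending = []
```

A transaction smaller than the buffer's capacity reaches the index in a single bulk call at commit time; a larger one is split into a few batches, which matches the "most changes" wording above.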