In man-machine symbiosis, it is man who must adjust: The machines can't. Alan Perlis, Epigrams in Programming
In this chapter we discuss how to administer a Stardog Server and Stardog databases, chiefly by way of discussing how to use the various Stardog command-line programs. An important part of Stardog administration is security, which is discussed in a separate, dedicated chapter.
Stardog's command-line interface (CLI) comes in two parts: stardog-admin, for administrative tasks, and stardog, the user's client.
The admin and user's tools operate on local or remote databases, using either HTTP or SNARL Protocols. Both CLI tools are Unix-only and self-documenting; their help output is their canonical documentation. This chapter goes into more detail and offers background, but if this documentation conflicts with the output of the CLI tools' "help" command, the CLI tools' output is correct.
To use the Stardog CLI tools, you can start by asking them to display help:
$ stardog help
Or:
$ stardog-admin help
The rationale for separating functionality into two CLI programs is largely based on security, since stardog-admin will need to have considerably tighter access restrictions than stardog.
SECURITY NOTICE: For usability before Stardog 1.0 release, we automatically supply user "admin" and password "admin" in stardog-admin commands if no user or password are given. This is obviously not secure; before any serious use of Stardog is contemplated, read the security chapter at least twice, and then—minimally—change the administrative password to something we haven't published on the interwebs!
Some CLI subcommands require a Stardog connection string as an argument to identify the server and database upon which operations are to be performed. Connection strings are URLs and may either be local to the machine where the CLI is run or they may be on some other remote machine. There are two URL schemes recognized by Stardog: http:// and snarl://. The former uses Stardog's (extended) version of SPARQL Protocol; the latter uses Stardog's native data access protocol, called SNARL, which is based on Google's Protocol Buffers.
Note: stardog-admin uses SNARL Protocol only; it will not work with or recognize HTTP connection strings. stardog user's client works with HTTP or SNARL Protocol, interchangeably.
To make a connection string, you need to know the URL scheme; the machine name and port the Stardog Server is running on; any (optional) URL path to the database (it's very unlikely you'll need this); and the name of the database:
{scheme}{machineName}:{port}/{databaseName};{connectionOptions}
snarl://server/billion-triples-punk
http://localhost:5000/myDatabase
http://169.175.100.5:1111/myOtherDatabase;reasoning=QL
snarl://stardog:8888/the_database
snarl://localhost:1024/db1;reasoning=NONE
Using the default ports for SNARL and HTTP protocols simplifies connection strings. connectionOptions are a series of ;-delimited key-value pairs, which are themselves =-delimited. Key names must be lowercase, and their values are case-sensitive.
Stardog Server supports multiple protocols: both SNARL and HTTP. The default port for SNARL is 5820; the default port for HTTP is 5822. All administrative functions work over SNARL protocol only: creating or dropping databases; setting databases to online or offline status; and creating or modifying users, roles, or permissions all work over SNARL, not HTTP.
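As a rough illustration of how the pieces of a connection string fit together, here is a minimal Python sketch that splits one into scheme, host, port, database name, and options. It is not Stardog's own parser: the names `ConnectionString` and `parse_connection_string` are hypothetical, the default ports are the ones stated in this chapter, and any optional URL path before the database name is simply discarded.

```python
from dataclasses import dataclass, field

# Default ports as stated in this chapter.
DEFAULT_PORTS = {"snarl": 5820, "http": 5822}


@dataclass
class ConnectionString:
    scheme: str
    host: str
    port: int
    database: str
    options: dict = field(default_factory=dict)


def parse_connection_string(text: str) -> ConnectionString:
    """Split a connection string into its parts (illustrative only)."""
    scheme, sep, rest = text.partition("://")
    if not sep or scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported or missing scheme in {text!r}")

    # Everything after the first ';' is the list of connection options.
    location, *raw_options = rest.split(";")
    authority, _, path = location.partition("/")
    # Assumption: the database name is the last path segment.
    database = path.rsplit("/", 1)[-1]
    if not database:
        raise ValueError(f"no database name in {text!r}")

    host, _, port_text = authority.partition(":")
    port = int(port_text) if port_text else DEFAULT_PORTS[scheme]

    options = {}
    for pair in raw_options:
        key, eq, value = pair.partition("=")
        if not eq or key != key.lower():
            raise ValueError(f"bad connection option {pair!r}")
        options[key] = value  # values are case-sensitive, so keep them as-is
    return ConnectionString(scheme, host, port, database, options)
```

For example, `parse_connection_string("snarl://server/billion-triples-punk")` falls back to port 5820 because no port is given.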
To use any of these commands against a remote server, pass a --server argument with a SNARL URL.
Note: If you are running stardog-admin on the same machine where Stardog Server is running, and you're using the default protocol ports, then you can omit the --server argument and simply pass a database name via -n option. Most of the following commands assume this for simplicity.
Note: Unlike the other stardog-admin subcommands, starting and stopping the server may only be run locally, i.e., on the same machine that Stardog Server runs on.
The simplest way to start the server—running on the default ports, detaching to run as a daemon, and writing stardog.pid and stardog.log to the current working directory— is
$ stardog-admin server start
To specify parameters:
$ stardog-admin --pidfile=stardog.pid --logfile=stardog.log server start --http=80
To shut down the server:
$ stardog-admin server stop
To stop a server and pass a specific PID file:
$ stardog-admin --pidfile=stardog.pid server stop
Note: ports can be specified using the properties --snarl and --http. The HTTP interface can be disabled by using the flag --no-http; the SNARL interface may not be disabled.
Stardog Server will lock STARDOG_HOME when it starts to prevent synchronization errors and other nasties if you start more than one Stardog Server with the same STARDOG_HOME. If you need to run more than one Stardog Server instance, choose a different STARDOG_HOME or pass a different value to --home.
Stardog Server's behavior can be configured via the JVM argument stardog.home, which sets Stardog Home, overriding the value of STARDOG_HOME set as an environment variable.
Stardog Server's behavior can also be configured via a stardog.properties file (a Java Properties file) in STARDOG_HOME. To change the behavior of a running Stardog Server, it is necessary to shut it down and restart it.
The following properties are available in stardog.properties:
To administer a Stardog database, some config options must be set at creation time; others may be changed subsequently, while yet others may never be changed. All of the config options have sensible defaults (except, obviously, for database name), so you don't have to twiddle any of the knobs till you really need to.
As of Stardog 0.9.5, a database must be set to offline status before most configuration parameters may be changed. So the routine is to set the database offline, change the parameters, and then set the database back online. All of these operations may be done programmatically from CLI tools, such that they can be scripted in advance to minimize downtime.
The following table summarizes the options:
| Config Option | Mutability | Default | API |
|---|---|---|---|
| database.name | false | {NO DEFAULT} | DatabaseOptions.NAME |
| database.online | false | true | DatabaseOptions.ONLINE |
| icv.active.graphs | false | default | DatabaseOptions.ICV_ACTIVE_GRAPHS |
| icv.enabled | true | false | DatabaseOptions.ICV_ENABLED |
| icv.reasoning.type | true | NONE | DatabaseOptions.ICV_REASONING_TYPE |
| index.differential.enable.limit | true | 1000000 | IndexOptions.DIFF_INDEX_MIN_LIMIT |
| index.differential.merge.limit | true | 10000 | IndexOptions.DIFF_INDEX_MAX_LIMIT |
| index.literals.canonical | false | true | IndexOptions.CANONICAL_LITERALS |
| index.named.graphs | false | true | IndexOptions.INDEX_NAMED_GRAPHS |
| index.persist | true | false | IndexOptions.PERSIST |
| index.persist.sync | true | true | IndexOptions.SYNC |
| index.statistics.update.automatic | true | true | IndexOptions.AUTO_STATS_UPDATE |
| index.type | false | Disk | IndexOptions.INDEX_TYPE |
| reasoning.consistency.automatic | true | false | DatabaseOptions.CONSISTENCY_AUTOMATIC |
| reasoning.punning.enabled | false | false | DatabaseOptions.PUNNING_ENABLED |
| reasoning.schema.graphs | true | default | DatabaseOptions.SCHEMA_GRAPHS |
| search.enabled | false | false | DatabaseOptions.SEARCHABLE |
| search.reindex.mode | false | wait | DatabaseOptions.SEARCH_REINDEX_MODE |
| transactions.durable | true | false | DatabaseOptions.TRANSACTIONS_DURABLE |
The following options take a boolean value: database.online, icv.enabled, index.literals.canonical, index.named.graphs, index.persist, index.persist.sync, index.statistics.update.automatic, reasoning.consistency.automatic, reasoning.punning.enabled, search.enabled, transactions.durable.
The legal value of database.name is given by the regular expression [A-Za-z]{1}[A-Za-z0-9_-]*.
The legal value of icv.active.graphs is a list of named graph identifiers.
The legal value of icv.reasoning.type is one of the reasoning levels (i.e., one of the following strings): NONE, RDFS, QL, RL, EL, DL.
The legal value of index.differential.* is an integer.
The legal value of index.type is the string "disk" or "memory" (case-insensitive).
The legal value of reasoning.schema.graphs is a list of named graph identifiers, including (optionally) the special names, tag:stardog:api:context:default and tag:stardog:api:context:all, which represent the default graph and the union of all named graphs and the default graph, respectively. In the context of database configurations only, Stardog will recognize default and * as shorter forms of those URIs, respectively.
The legal value of search.reindex.mode is one of the strings sync or async (case insensitive) or a legal Quartz cron expression.
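The value rules above can be checked before a database is created. The following Python sketch covers a handful of the options; it is not part of Stardog, and `validate_option` is a hypothetical helper. It assumes that the database name pattern is meant to allow names of any length (a trailing `*`), that boolean values are the strings "true" and "false" in any case, and that the differential index limits are non-negative integers.

```python
import re

REASONING_TYPES = {"NONE", "RDFS", "QL", "RL", "EL", "DL"}

BOOLEAN_OPTIONS = {
    "database.online", "icv.enabled", "index.literals.canonical",
    "index.named.graphs", "index.persist", "index.persist.sync",
    "index.statistics.update.automatic", "reasoning.consistency.automatic",
    "reasoning.punning.enabled", "search.enabled", "transactions.durable",
}

# Assumption: the trailing '*' lets names be longer than two characters.
DATABASE_NAME = re.compile(r"[A-Za-z]{1}[A-Za-z0-9_-]*")


def validate_option(name: str, value: str) -> bool:
    """Return True if value looks legal for the named config option."""
    if name == "database.name":
        return DATABASE_NAME.fullmatch(value) is not None
    if name in BOOLEAN_OPTIONS:
        # Assumption: booleans are spelled "true"/"false", in any case.
        return value.lower() in ("true", "false")
    if name == "icv.reasoning.type":
        return value in REASONING_TYPES
    if name.startswith("index.differential."):
        # Assumption: limits are non-negative decimal integers.
        return value.isascii() and value.isdigit()
    if name == "index.type":
        return value.lower() in ("disk", "memory")  # case-insensitive
    raise ValueError(f"option not covered by this sketch: {name}")
```

Options that take lists of named graphs or cron expressions are deliberately left out, since checking them properly needs more than a one-line test.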
Databases are either online or offline; this allows database maintenance to be decoupled from server maintenance.
Databases may be put into online or offline status via one of two strategies.
For moving from offline to online:
For moving from online to offline:
By default, the online strategy for a database is wait; you can set a database from offline to online via SOMETHING.
To set a database from online to offline:
$ stardog-admin offline -n my_db_name
To set the database offline without waiting:
$ stardog-admin offline --nowait -n my_db_name
To set the database online:
$ stardog-admin online -n my_db_name
If Stardog Server is shut down while a database is offline, the database will be offline the next time the server starts.
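The offline and online commands above can be chained into a scripted maintenance window. The Python sketch below only assembles and runs the command lines; `maintenance_plan` and `run_plan` are hypothetical names, and `change_commands` is a placeholder for whatever commands perform the actual configuration change, which this chapter does not show.

```python
import subprocess


def maintenance_plan(db_name, change_commands):
    """Return the argv lists for: go offline, caller's changes, go online."""
    offline = ["stardog-admin", "offline", "-n", db_name]
    online = ["stardog-admin", "online", "-n", db_name]
    # change_commands is supplied by the caller; nothing is assumed about it.
    return [offline, *change_commands, online]


def run_plan(plan):
    """Run each command in order, stopping at the first failure."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

Because `run_plan` stops at the first failing command, a failed change leaves the database offline, which may or may not be what you want; adjust to taste.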
Stardog databases may be created locally or remotely; but, of course, performance is better if the data files don't have to be transferred over a network.
Minimally, the only thing you must know to create a Stardog database is a database name; alternately, you may customize some other database parameters and options depending on anticipated workloads, data modeling, and other factors.
As a boon to the overworked admin or devops peeps, Stardog Server supports a feature we call "database creation templates", which is to say that you can pass a Java Properties file with config values set and with the values (typically just the database name) that are unique to a specific database passed in CLI parameters.
To create a new database with the default options, simply provide a name and a set of initial datasets to load:
$ stardog-admin create -n my-database input.ttl another_file.rdf moredata.rdf.gz
Datasets can be loaded later as well, though bulk loading at creation time is the fastest way to load data.
To create (in this case, an empty) database from a template file:
$ stardog-admin create -c database.properties
At a minimum, the configuration file must have a value for database.name option.
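A template file of this kind is plain `key=value` text, so it is easy to generate. The Python sketch below renders one from a dictionary; `render_template` is a hypothetical helper, the option names come from the table earlier in this chapter, and it assumes values need no Java Properties escaping (no backslashes, colons, or line breaks).

```python
def render_template(options: dict) -> str:
    """Render config options as the text of a Java Properties file."""
    if not options.get("database.name"):
        raise ValueError("database.name is required")
    lines = []
    for key, value in sorted(options.items()):
        # Python booleans become the lowercase strings true/false.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        lines.append(f"{key}={text}")
    return "\n".join(lines) + "\n"
```

The result could be written to database.properties and passed to the create subcommand with -c, as shown above.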
Finally, if you want to change only a few configuration options, you can provide the values for these options directly in the CLI args as follows:
$ stardog-admin create -n db1 -o icv.enabled=true icv.reasoning.type=QL -- input.ttl
Note that '--' is used in this case when -o is the last option to delimit the value for -o from the files to be bulk loaded.
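When create commands are generated by a script, the `--` rule is easy to forget. This Python sketch builds the argument list in the same shape as the example above; `build_create_command` is a hypothetical helper, and it uses only the -n and -o options shown in this chapter.

```python
def build_create_command(name, options=None, files=()):
    """Build the argv for 'stardog-admin create' (illustrative only)."""
    argv = ["stardog-admin", "create", "-n", name]
    if options:
        argv.append("-o")
        argv.extend(f"{key}={value}" for key, value in options.items())
        if files:
            # '--' separates the values of -o from the files to bulk load.
            argv.append("--")
    argv.extend(files)
    return argv
```

The returned list can be handed to `subprocess.run` or joined with spaces for display.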
Please refer to the CLI help for more details of the create subcommand.
| Name | Description | Arg values | Default |
|---|---|---|---|
| --durable, -d | If present, sets all mutation operations to database as transactionally durable; durability increases the cost of all mutation operations. | | False |
| --guard, -g arg | Specifies that ICV guard mode should be enabled for this database. Transactional writes to database that are invalid with respect to constraints will fail. | OFF disables guard mode; a reasoning type | Disabled |
| --type, -t | Specifies the kind of database to be created: Memory or Disk. | M,D | Disk |
| --searchable, -e | Specifies this database should be searchable. | None | |
| --index-triples-only, -i | Specifies this database's indexes should be optimized for RDF triples (as opposed to quads) only | [it's a flag and takes no args] | |
By default, Stardog builds extra indexes for named graphs. These additional indexes are used when SPARQL queries specify datasets using FROM and FROM NAMED. With these additional indexes, better statistics about named graphs are also computed.
Stardog may also be configured to create and to use fewer indexes, if the database is only going to be used to store RDF triples, that is, will not be used to store named graph information. In this mode, Stardog will maintain fewer indexes, which will result in faster database creation and faster updates without compromising query answering performance. In such databases, quads (that is: triples with named graphs or contexts specified) may still be added to these databases at any time, but query performance may degrade in such cases.
To create a database which indexes only RDF triples, set the option index.named.graphs to false at database creation time. The CLI provides a shorthand option—-i or --index-triples-only—which is equivalent.
Please note that this option can only be set at database creation time and cannot be changed later without rebuilding the database, so use this option with caution.
While Stardog is generally biased in favor of read (i.e., query) performance, write performance is also important in many applications. In order to increase write performance, Stardog may be used, optionally, with a differential index.
Stardog's differential index is used to persist additions and removals separately from the main indexes, such that updates to the database can be performed faster. Query answering takes into consideration all the data stored in the main indexes and the differential index; hence, query answers are computed as if all the data is stored in the main indexes.
There is a slight overhead for query answering with differential indexes if the differential index size gets too large. For this reason, the differential index is merged into the main indexes when its size reaches the DIFF_INDEX_MAX_LIMIT. There is no benefit of differential indexes if the main index itself is small. For this reason, the differential index is not used until the main index size reaches DIFF_INDEX_MIN_LIMIT.
In most cases, the default value of the DIFF_INDEX_MAX_LIMIT parameter will work fine and doesn't need to be changed.
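The interplay of the two limits can be pictured with a toy model. The Python sketch below is not how Stardog implements its indexes; it only mimics the two thresholds described above, using the defaults from the configuration table. The class and function names are hypothetical, sizes are assumed to be counted in triples, "reaches" is read as "greater than or equal to", and removals are ignored.

```python
from dataclasses import dataclass


@dataclass
class DifferentialIndexPolicy:
    enable_limit: int = 1_000_000  # index.differential.enable.limit
    merge_limit: int = 10_000      # index.differential.merge.limit

    def use_differential(self, main_size: int) -> bool:
        """The differential index is used only once the main index is large."""
        return main_size >= self.enable_limit

    def should_merge(self, diff_size: int) -> bool:
        """The differential index is merged once it has grown too large."""
        return diff_size >= self.merge_limit


def apply_update(policy, main_size, diff_size, added):
    """Return the new (main_size, diff_size) after adding some triples."""
    if not policy.use_differential(main_size):
        return main_size + added, diff_size  # write straight to main index
    diff_size += added
    if policy.should_merge(diff_size):
        return main_size + diff_size, 0      # merge and reset
    return main_size, diff_size
```

With the defaults, a database holding two million triples absorbs small updates in the differential index until that index reaches ten thousand entries, at which point everything is folded back into the main indexes.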
Stardog supports loading data from compressed files directly, so there is no need to uncompress files before loading. Compressed input is typically faster to load since it minimizes disk access, so the recommended way to load large input files is in compressed format.
Stardog supports GZIP and ZIP compression natively. A file name passed to the create or add commands (through CLI or API) is interpreted as a gzip file if it ends with '.gz'. The RDF format of the file is determined by the extension that comes before '.gz'. So, if a file named 'test.ttl.gz' is used as input, Stardog will perform gzip decompression during loading and parse the file with the Turtle parser. All the formats supported by Stardog (RDF/XML, Turtle, Trig, etc.) can be used with gzip compression.
The zip support works differently since zipped files can contain multiple files inside. When an input file name ends with '.zip', Stardog performs zip decompression and tries to load all the files inside the zip file. The RDF format of the files inside the zip file is determined by their file names as usual. If there is an unrecognized file extension (e.g. '.txt') that file will be skipped.
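The file name rules above can be summarized in a few lines of Python. This is an illustration of the naming convention only, not Stardog's loader: `classify` and `loadable_zip_entries` are hypothetical names, the extension table is a small assumed subset (only the '.ttl' to Turtle mapping is stated in this chapter), and matching is assumed to be case-insensitive.

```python
import os

# Assumed extension table; only '.ttl' is confirmed by the text above.
RDF_FORMATS = {".ttl": "Turtle", ".rdf": "RDF/XML", ".trig": "TriG"}


def classify(file_name: str):
    """Return (compression, rdf_format); either part may be None."""
    lower = file_name.lower()
    if lower.endswith(".zip"):
        # Formats are decided per entry inside the archive.
        return ("zip", None)
    compression = None
    if lower.endswith(".gz"):
        compression = "gzip"
        lower = lower[: -len(".gz")]  # format comes from the inner extension
    extension = os.path.splitext(lower)[1]
    return (compression, RDF_FORMATS.get(extension))


def loadable_zip_entries(entry_names):
    """Keep only the zip entries whose extension maps to an RDF format."""
    return [name for name in entry_names if classify(name)[1] is not None]
```

Entries such as 'notes.txt' drop out of `loadable_zip_entries`, mirroring the skip behavior described above.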
The drop subcommand removes a database and all associated files and metadata. As with create, this command may work against a database directly or via a running Stardog server. drop operates without regard to pending writes or queries. Only use drop when you're certain!
It takes as its only argument a valid database name. For example,
$ stardog-admin drop -n my_db_name
Stardog supports integrity constraint validation as a data quality mechanism via closed world reasoning. Constraints can be specified in OWL, SWRL, and SPARQL.
Please see the ICV chapter for more about using ICV in Stardog.
The icv subcommand can be used to add, delete, or drop all constraints from an existing database. It may also be used to validate an existing database with constraints that are passed into the icv subcommand; that is, using different constraints than the ones already associated with the database.
For ICV in transacted mutations of Stardog databases, see the database creation section above.
The migrate subcommand migrates an older Stardog database to the latest version (0.9.5) of Stardog. Its only argument is the name of the database to migrate. migrate won't necessarily work between arbitrary Stardog versions, so before upgrading, check the release notes for a new version carefully to see whether migration is required or possible.
$ stardog-admin migrate -n myDatabase
will update myDatabase to the latest database format.
You can get some information about a database (online/offline status, creation time, last modification time, etc.) by running the following command:
$ stardog-admin get -n my_db_name
This will return all the metadata stored about the database, including the values of configuration options used for this database instance. If you want to get the value for a specific option then you can run the following command:
$ stardog-admin get -o index.named.graphs -n my_db_name
See the Security Chapter for information about Stardog's security system, secure deployment patterns, and more.
Stardog's search service is described in the Using Stardog chapter.
But managing the reindexing of search indexes is an administrative task.
There are three modes for reindexing indexes:
This is specified when creating a database by setting the property search.reindex.mode to "sync", "async", or to a valid cron expression. The default is "sync".
Stardog tries hard to do bulk loading at database creation time in the most efficient and scalable way possible. But data loading time can vary widely, depending on factors in the data to be loaded, including the number of unique resources, etc. We'll continue to improve bulk loading as we move to the 1.0 release; for now, here are some tuning tips that may work for you: