It is easier to write an incorrect program than understand a correct one. Alan Perlis, Epigrams in Programming
Using OWL as if it were a schema or constraint language for RDF and Linked Data has several advantages:
But the OWL 2 standard from W3C isn't designed for this usage; rather, it's primarily designed for inferencing.
In Stardog, you can use OWL as both a schema language for RDF and as a modeling or inference language. This chapter explains how.
If you are also interested in the theoretical background, please see the ICV specification, which has all the formal details.
In the sections below, we explain the operational usage of Integrity Constraint Validation (ICV) in Stardog: it may be easier for you to understand if you read this section on terminology first.
The use of high-level languages (OWL 2, SWRL, and SPARQL) to validate RDF data using closed world semantics is one of Stardog's unique capabilities.
An integrity constraint may be satisfied or violated in either of two ways: by an explicit statement in a Stardog database or by a statement that has been legally inferred by Stardog. This means that when ICV is enabled for a Stardog database, it has to be enabled relative to a reasoning type. The valid choices of reasoning type are any of the kinds of reasoning supported by Stardog. As of 0.9.5, those types include
Future releases of Stardog will include RDF Schema and OWL 2 DL reasoning types.
The important implication here is that Integrity Constraint Validation is performed with three inputs: a Stardog database, a set of constraints, and a reasoning type (which may be, of course, no reasoning). This is the case because domain modelers, ontology developers, or integrity constraint authors must consider the interactions between explicit and inferred statements and how these are accounted for in integrity constraints.
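The three inputs can be pictured with a small sketch in plain Python that is independent of Stardog. The names `validate`, `rdfs_subclass_closure`, and `ssn_domain_employee` are hypothetical and are not part of any Stardog API. The reasoning type is modeled, as an assumption, by an optional function that adds inferred triples before the constraints are checked.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def rdfs_subclass_closure(triples):
    """Toy 'reasoning type': add the types implied by rdfs:subClassOf statements."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(inferred):
            if p != TYPE:
                continue
            for (sub, q, sup) in list(inferred):
                if q == SUBCLASS and sub == o and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred

def ssn_domain_employee(triples):
    """Constraint 'SSN Domain: Employee': return the subjects that violate it."""
    return sorted({s for (s, p, _) in triples
                   if p == "SSN" and (s, TYPE, "Employee") not in triples})

def validate(database, constraints, reasoning=None):
    """Return True when no constraint reports a violation."""
    view = reasoning(database) if reasoning else set(database)
    return all(not constraint(view) for constraint in constraints)

database = {
    ("Supervisor", SUBCLASS, "Employee"),
    ("Heidi", TYPE, "Supervisor"),
    ("Heidi", "SSN", "123-45-6789"),
}

print(validate(database, [ssn_domain_employee]))                         # False
print(validate(database, [ssn_domain_employee], rdfs_subclass_closure))  # True
```

The same data and the same constraint give different answers depending on the third input, which is why a constraint author has to decide whether inferred statements are allowed to satisfy a constraint.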
This design choice has a security implication that may not be obvious: changing the reasoning type associated with a database and its integrity constraint validation affects which data the database accepts as valid, so it may only be performed by a user role with sufficient privileges for that action.
This section will describe how to use Stardog ICV via the SNARL APIs. For more information on using SNARL in general, please refer to the section on programming with Stardog in Java.
There is command-line interface support for many of the operations needed to use ICV with a Stardog database; please see the Administration chapter for that documentation.
To use ICV in Stardog, one must:
Constraints can be created using the ConstraintFactory, which provides methods for creating integrity constraints from OWL axioms or from SPARQL SELECT queries. If your constraints are defined as OWL axioms, ConstraintFactory expects them as RDF triples (or a graph). To aid in authoring constraints in OWL, ExpressionFactory is provided for building the RDF equivalent of the OWL axioms of your constraint.
You can also write your constraints in OWL in your favorite editor, such as Protege, and load them into the database from your OWL file.
We recommend defining your constraints as OWL axioms, but you are free to define them using SPARQL SELECT queries. If you choose to define a constraint using a SPARQL SELECT query, please keep in mind that any results the query returns are interpreted as violations of the integrity constraint; an empty result means the constraint is satisfied.
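To make the "results are violations" convention concrete, here is a sketch in plain Python. The SPARQL text is one plausible way to write the constraint `Manager SubClassOf: Employee` as a query; it is an assumption for illustration, not a query generated by Stardog. The function `manager_must_be_employee` is a hypothetical stand-in that evaluates the same pattern over an in-memory set of triples.

```python
# One way to phrase "every Manager must be an Employee" as a SELECT query.
# Illustrative only; the query text is not executed in this sketch.
VIOLATION_QUERY = """
PREFIX : <http://example.com/company.owl#>
SELECT ?x WHERE {
  ?x a :Manager .
  FILTER NOT EXISTS { ?x a :Employee }
}
"""

TYPE = "rdf:type"

def manager_must_be_employee(triples):
    """Hand-written equivalent of VIOLATION_QUERY; each result is a violation."""
    return sorted(s for (s, p, o) in triples
                  if p == TYPE and o == "Manager"
                  and (s, TYPE, "Employee") not in triples)

data = {
    ("Alice", TYPE, "Manager"),
    ("Bob", TYPE, "Manager"),
    ("Bob", TYPE, "Employee"),
}

violations = manager_must_be_employee(data)
print(violations)            # ['Alice']
print(len(violations) == 0)  # False: a non-empty result means the constraint is violated
```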
An example of creating a simple constraint using the ExpressionFactory.
The
This example shows how to add an integrity constraint to a Stardog database.
Here we show how to add a set of constraints as defined in a local OWL ontology.
Checking whether or not the contents of a database are valid is easy. Once you have an ICVConnection, you can simply call its isValid() method, which returns whether or not the contents of the database are valid with respect to the constraints associated with that database. Similarly, you can pass specific constraints to isValid() to see whether the data in the database is valid for those constraints; these can be a subset of the constraints associated with the database, or new constraints you are working on.
If the data is invalid for some constraints (either the explicit constraints in your database or a new set of constraints you have authored), you can get some information about what the violation was from the SNARL ICVConnection. ICVConnection.getViolationBindings() returns the constraints which are violated, and for each constraint, you can get the violations as the set of bindings that satisfied the constraint query. You can turn the bindings into the individuals which are in violation using ICV.asIndividuals.
In addition to using the ICVConnection as a data oracle to tell whether or not your data is valid with respect to some constraints, you can also use Stardog's ICV support to protect your database from invalid data by using ICV as a guard within transactions.
When guard mode for ICV is enabled in Stardog, each commit is inspected to ensure that the contents of the database are valid for the set of constraints that have been associated with it. Should someone attempt to commit data which violates one or more of the constraints defined for the database, the commit will fail and the data will not be added/removed from your database.
By default, reasoning is not used when you enable guard mode; however, you are free to specify any of the reasoning types supported by Stardog when enabling guard mode. If you have provided a specific reasoning type for guard mode, it will be used during validation of the integrity constraints. This means you can author your constraints with the expectation that inference results can satisfy a constraint.
This illustrates how to create a persistent disk database with ICV guard mode enabled at the QL reasoning type. Guard mode can also be enabled when the database is created on the command line.
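The commit-time behavior of guard mode described above can be sketched independently of Stardog's API. The class `GuardedStore`, the exception `ConstraintViolation`, and the constraint function are hypothetical names used only for this illustration; the sketch assumes a commit is a set of additions and removals that is applied only if the resulting data satisfies every constraint.

```python
TYPE = "rdf:type"

class ConstraintViolation(Exception):
    """Raised when a commit would leave the data in an invalid state."""

def manager_must_be_employee(triples):
    """Return every Manager that is not also an Employee."""
    return sorted(s for (s, p, o) in triples
                  if p == TYPE and o == "Manager"
                  and (s, TYPE, "Employee") not in triples)

class GuardedStore:
    def __init__(self, constraints):
        self.triples = set()
        self.constraints = list(constraints)

    def commit(self, additions=(), removals=()):
        # Validate the state the commit would produce, not the current state.
        candidate = (self.triples | set(additions)) - set(removals)
        for constraint in self.constraints:
            violations = constraint(candidate)
            if violations:
                raise ConstraintViolation(violations)
        self.triples = candidate  # only reached when every constraint holds

store = GuardedStore([manager_must_be_employee])
store.commit(additions=[("Alice", TYPE, "Manager"), ("Alice", TYPE, "Employee")])

try:
    # Removing the Employee type would make Alice an invalid Manager.
    store.commit(removals=[("Alice", TYPE, "Employee")])
    rejected = False
except ConstraintViolation:
    rejected = True

print(rejected)            # True
print(len(store.triples))  # 2
```

Note that the guard applies to removals as well as additions: deleting a statement that a constraint depends on is rejected in the same way as adding an invalid one.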
Rather than discuss the formal semantics of ICV here, we will look at some examples. The following examples use OWL 2's Manchester syntax and assume a simple data schema, which is available as an OWL ontology and as a UML diagram. The examples also assume that the default namespace is <http://example.com/company.owl#> and that xsd: is bound to the standard <http://www.w3.org/2001/XMLSchema#>.
We provide reference code for each of the following examples. This is also included in the examples directory in the Stardog distribution.
This kind of constraint guarantees certain subclass and superclass (i.e., subsumption) relationships exist between instances.
| Constraint | Class: Manager SubClassOf: Employee | |
| A | Individual: Alice Types: Manager | Invalid |
| B | Individual: Alice Types: Manager, Employee | Valid |
This constraint says that if an RDF individual is an instance of Manager, then it must also be an instance of Employee. In ontology A, the only instance of Manager, namely Alice, is not an instance of Employee; therefore, A is invalid. In B, Alice is an instance of both Manager and Employee; therefore, B is valid.
These constraints control the types of domain and range instances for properties.
| Constraint | ObjectProperty: is_responsible_for Domain: Project_Leader Range: Project | |
| A | Individual: Alice Facts: is_responsible_for MyProject Individual: MyProject Types: Project | Invalid |
| B | Individual: Alice Types: Project_Leader Facts: is_responsible_for MyProject Individual: MyProject | Invalid |
| C | Individual: Alice Types: Project_Leader Facts: is_responsible_for MyProject Individual: MyProject Types: Project | Valid |
This constraint says that if an RDF instance i is related to an RDF instance j via the property is_responsible_for, then i must be an instance of Project_Leader and j must be an instance of Project. In ontology A, there is only one pair of individuals related via is_responsible_for, namely (Alice, MyProject), and MyProject is an instance of Project; but Alice is not an instance of Project_Leader. Therefore, A is invalid. In B, Alice is an instance of Project_Leader, but MyProject is not an instance of Project; therefore, B is not valid. In C, Alice is an instance of Project_Leader, and MyProject is an instance of Project; therefore, C is valid.
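As a rough illustration of this reading, the following plain-Python sketch checks a domain and range constraint over a set of triples. The function `domain_range_violations` and the triple encoding are assumptions made for this example and do not reflect how Stardog evaluates constraints internally.

```python
TYPE = "rdf:type"

def domain_range_violations(triples, prop, domain, range_):
    """Return (subject, object, reason) for each offending use of `prop`."""
    found = []
    for (s, p, o) in sorted(triples):
        if p != prop:
            continue
        if (s, TYPE, domain) not in triples:
            found.append((s, o, "subject is not a " + domain))
        if (o, TYPE, range_) not in triples:
            found.append((s, o, "object is not a " + range_))
    return found

# Ontology A from the table: MyProject is typed, Alice is not.
ontology_a = {
    ("Alice", "is_responsible_for", "MyProject"),
    ("MyProject", TYPE, "Project"),
}
# Ontology C adds the missing type for Alice.
ontology_c = ontology_a | {("Alice", TYPE, "Project_Leader")}

result_a = domain_range_violations(ontology_a, "is_responsible_for", "Project_Leader", "Project")
result_c = domain_range_violations(ontology_c, "is_responsible_for", "Project_Leader", "Project")
print(result_a)  # [('Alice', 'MyProject', 'subject is not a Project_Leader')]
print(result_c)  # []
```

Under ordinary OWL inference the same axioms would instead let a reasoner conclude that Alice is a Project_Leader; treating the missing type as an error is what distinguishes the constraint reading.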
| Constraint | DataProperty: SSN Domain: Employee | |
| A | Individual: Bob Facts: SSN "123-45-6789" | Invalid |
| B | Individual: Bob Types: Employee Facts: SSN "123-45-6789" | Valid |
This constraint says that if an RDF instance i has a data assertion via the property SSN, then i must be an instance of Employee. In ontology A, Bob is not known to be an instance of Employee but has an SSN; therefore, A is invalid. In B, Bob is defined to be an instance of Employee; therefore, the ontology is valid.
| Constraint | DataProperty: DOB Range: xsd:date | |
| A | Individual: Bob Facts: DOB "1970-01-01" | Invalid |
| B | Individual: Bob Facts: DOB "1970-01-01"^^xsd:date | Valid |
This constraint says that if an RDF instance i is related to a literal l via the data property DOB, then l must have the XML Schema type xsd:date. In ontology A, Bob is related to the untyped literal "1970-01-01" via DOB so A is invalid. In B, the literal "1970-01-01" is properly typed so the ontology is valid.
These constraints control whether (or not) an RDF instance participates in some specified relationship.
| Constraint | Class: Supervisor SubClassOf: supervises some Employee | |
| A | Individual: Alice | Valid |
| B | Individual: Alice Types: Supervisor | Invalid |
| C | Individual: Alice Types: Supervisor Facts: supervises Bob Individual: Bob | Invalid |
| D | Individual: Alice Types: Supervisor Facts: supervises Bob Individual: Bob Types: Employee | Valid |
This constraint says that if an RDF instance i is of type Supervisor, then i must be related to an individual j via the property supervises, and j must be an instance of Employee. In ontology A, Supervisor has no instances; therefore, A is vacuously valid. In B, the only instance of Supervisor, namely Alice, is related to no individual; therefore, B is invalid. In C, Alice is related to Bob via supervises, but Bob is not an instance of Employee; therefore, C is invalid. In D, Alice is related to Bob via supervises, and Bob is an instance of Employee; therefore, D is valid.
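A participation constraint of this kind can be sketched as a search for instances that lack a suitable, explicitly known partner. The function `some_values_violations` is a hypothetical helper written for this example, and the triple encoding is an assumption.

```python
TYPE = "rdf:type"

def some_values_violations(triples, cls, prop, filler):
    """Instances of `cls` lacking a `prop` link to a known instance of `filler`."""
    bad = []
    for (s, p, o) in triples:
        if p == TYPE and o == cls:
            satisfied = any(
                s2 == s and p2 == prop and (o2, TYPE, filler) in triples
                for (s2, p2, o2) in triples)
            if not satisfied:
                bad.append(s)
    return sorted(bad)

# Ontology C: Alice supervises Bob, but Bob has no known type.
ontology_c = {("Alice", TYPE, "Supervisor"), ("Alice", "supervises", "Bob")}
# Ontology D: Bob is additionally known to be an Employee.
ontology_d = ontology_c | {("Bob", TYPE, "Employee")}

result_c = some_values_violations(ontology_c, "Supervisor", "supervises", "Employee")
result_d = some_values_violations(ontology_d, "Supervisor", "supervises", "Employee")
print(result_c)  # ['Alice']
print(result_d)  # []
```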
| Constraint | Class: Project SubClassOf: number some integer[> 0, < 5000] | |
| A | Individual: MyProject | Valid |
| B | Individual: MyProject Types: Project | Invalid |
| C | Individual: MyProject Types: Project Facts: number "23" | Invalid |
| D | Individual: MyProject Types: Project Facts: number "6000"^^integer | Invalid |
| E | Individual: MyProject Types: Project Facts: number "23"^^integer | Valid |
This constraint says that if an RDF instance i is of type Project, then i must be related via the property number to an integer greater than 0 and less than 5000 (both bounds exclusive). In ontology A, the individual MyProject is not known to be an instance of Project, so the constraint does not apply and the ontology is valid. In B, MyProject is an instance of Project but is not known to have any data assertions via number, so B is invalid. In C, MyProject does have a data property assertion via number, but the literal "23" is untyped (not an integer); therefore, the ontology is invalid. In D, MyProject is related to an integer via number, but it is out of range, so the ontology is invalid. Finally, in E, MyProject is related to the integer 23, which lies between 0 and 5000, so this ontology is valid.
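The interplay of datatype and facet checks can be sketched as follows. Literals are modeled, as an assumption of this example, as `(lexical form, datatype)` pairs with `None` standing for an untyped literal, and the datatype written `integer` in the table is taken to be `xsd:integer`. The helper names are hypothetical.

```python
TYPE = "rdf:type"
XSD_INTEGER = "xsd:integer"

def in_open_interval(literal, low, high):
    """True only for an integer-typed literal strictly between low and high."""
    lexical, datatype = literal
    if datatype != XSD_INTEGER:
        return False  # untyped or differently typed literals do not count
    return low < int(lexical) < high

def number_violations(triples):
    """Projects with no `number` value that is an integer in (0, 5000)."""
    bad = []
    for (s, p, o) in triples:
        if p == TYPE and o == "Project":
            values = [v for (s2, p2, v) in triples if s2 == s and p2 == "number"]
            if not any(in_open_interval(v, 0, 5000) for v in values):
                bad.append(s)
    return sorted(bad)

def project_with(number_literal):
    """Build a one-project data set with the given `number` literal."""
    return {("MyProject", TYPE, "Project"), ("MyProject", "number", number_literal)}

result_c = number_violations(project_with(("23", None)))           # untyped literal
result_d = number_violations(project_with(("6000", XSD_INTEGER)))  # out of range
result_e = number_violations(project_with(("23", XSD_INTEGER)))    # typed and in range
print(result_c, result_d, result_e)  # ['MyProject'] ['MyProject'] []
```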
These constraints control the number of various relationships or property values.
| Constraint | Class: Employee SubClassOf: works_on max 3 Project | |
| A | Individual: Bob | Valid |
| B | Individual: Bob Types: Employee Facts: works_on MyProject Individual: MyProject Types: Project | Valid |
| C | Individual: Bob Types: Employee Facts: works_on MyProject, works_on MyProjectFoo, works_on MyProjectBar, works_on MyProjectBaz Individual: MyProject Types: Project Individual: MyProjectFoo Types: Project Individual: MyProjectBar Types: Project Individual: MyProjectBaz Types: Project | Invalid |
This constraint says that if an RDF instance i is an Employee, then i must not be related via the property works_on to more than 3 named individuals of class Project. In ontology A, Bob is not known to be an instance of Employee, so the constraint does not apply and the ontology is valid. In B, Bob is an instance of Employee but is known to work on only a single project, namely MyProject, so the ontology is valid. In C, Bob is related to 4 named individuals of class Project (namely, MyProject, MyProjectFoo, MyProjectBar, and MyProjectBaz) via works_on. Due to the weak Unique Name Assumption (UNA) that Stardog adopts for ICV, these individuals are considered distinct, so the ontology is invalid.
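A cardinality check under this assumption amounts to counting distinct names. In the sketch below, the function `max_cardinality_violations` is hypothetical; it simply treats every differently named individual as a different individual, which is how we read the weak UNA for the purposes of this example.

```python
TYPE = "rdf:type"

def max_cardinality_violations(triples, cls, prop, filler, limit):
    """Return (instance, count) for instances of `cls` exceeding `limit`."""
    bad = []
    for (s, p, o) in triples:
        if p == TYPE and o == cls:
            # Different names are counted as different individuals.
            targets = {o2 for (s2, p2, o2) in triples
                       if s2 == s and p2 == prop and (o2, TYPE, filler) in triples}
            if len(targets) > limit:
                bad.append((s, len(targets)))
    return sorted(bad)

# Ontology C from the table: Bob works on four named projects.
projects = ["MyProject", "MyProjectFoo", "MyProjectBar", "MyProjectBaz"]
ontology_c = {("Bob", TYPE, "Employee")}
for name in projects:
    ontology_c |= {("Bob", "works_on", name), (name, TYPE, "Project")}

result_c = max_cardinality_violations(ontology_c, "Employee", "works_on", "Project", 3)
print(result_c)  # [('Bob', 4)]
```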
| Constraint | Class: Department SubClassOf: inverse(works_in) min 2 Employee | |
| A | Individual: MyDepartment | Valid |
| B | Individual: MyDepartment Types: Department Individual: Bob Types: Employee Facts: works_in MyDepartment | Invalid |
| C | Individual: MyDepartment Types: Department Individual: Bob Types: Employee Facts: works_in MyDepartment Individual: Alice Types: Employee Facts: works_in MyDepartment | Valid |
This constraint says that if an RDF instance i is a Department, then there should exist at least 2 instances j and k of class Employee which are related to i via the property works_in (or, equivalently, i should be related to them via the inverse of works_in). In ontology A, the individual MyDepartment is not known to be an instance of Department, so the constraint does not apply and the ontology is valid. In B, MyDepartment is an instance of Department but only one instance of Employee, namely Bob, is known to work in it, so the ontology is invalid. In C, MyDepartment is related to the individuals Bob and Alice, which are both instances of Employee and, due to the weak Unique Name Assumption that Stardog adopts for ICV, are distinct, so the ontology is valid.
| Constraint | Class: Manager SubClassOf: manages exactly 1 Department | |
| A | Individual: Isabella | Valid |
| B | Individual: Isabella Types: Manager | Invalid |
| C | Individual: Isabella Types: Manager Facts: manages MyDepartment | Invalid |
| D | Individual: Isabella Types: Manager Facts: manages MyDepartment Individual: MyDepartment Types: Department | Valid |
| E | Individual: Isabella Types: Manager Facts: manages MyDepartment, MyDepartment1 Individual: MyDepartment Types: Department Individual: MyDepartment1 Types: Department | Invalid |
This constraint says that if an RDF instance i is a Manager, then it must be related to exactly 1 instance of Department via the property manages. In ontology A, the individual Isabella is not known to be an instance of Manager so the constraint does not apply and the ontology is valid. In B, Isabella is an instance of Manager but is not related to any instances of Department, so the ontology is invalid. In C, Isabella is related to the individual MyDepartment via the property manages but MyDepartment is not known to be an instance of Department, so the ontology is invalid. In D, Isabella is related to exactly one instance of Department, namely MyDepartment, so the ontology is valid. Finally, in E, Isabella is related to 2 distinct (again, because of weak UNA) instances of Department, namely MyDepartment and MyDepartment1, so the ontology is invalid.
| Constraint | DataProperty: name Characteristics: Functional | |
| A | Individual: MyDepartment | Valid |
| B | Individual: MyDepartment Facts: name "Human Resources" | Valid |
| C | Individual: MyDepartment Facts: name "Human Resources", name "Legal" | Invalid |
This constraint says that no RDF instance i can have more than 1 assertion via the data property name. In ontology A, the individual MyDepartment does not have any data property assertions, so A is valid. In B, MyDepartment has a single assertion via name, so the ontology is also valid. In C, MyDepartment is related to 2 literals, namely "Human Resources" and "Legal", via name, so the ontology is invalid.
These constraints control how instances are related to one another via properties.
| Constraint | ObjectProperty: manages SubPropertyOf: works_in | |
| A | Individual: Bob Facts: manages MyDepartment | Invalid |
| B | Individual: Bob Facts: manages MyDepartment, works_in MyDepartment | Valid |
This constraint says that if an RDF instance i is related to j via the property manages, then i must also be related to j via the property works_in. In ontology A, Bob is related to MyDepartment via manages, but not via works_in, so the ontology is invalid. In B, Bob is related to MyDepartment via both manages and works_in, so the ontology is valid.
| Constraint | ObjectProperty: is_supervisor_of SubPropertyChain: manages o inverse(works_in) | |
| A | Individual: Jose Facts: manages MyDepartment, is_supervisor_of Maria Individual: Maria Facts: works_in MyDepartment Individual: Diego Facts: works_in MyDepartment | Invalid |
| B | Individual: Jose Facts: manages MyDepartment, is_supervisor_of Maria, is_supervisor_of Diego Individual: Maria Facts: works_in MyDepartment Individual: Diego Facts: works_in MyDepartment | Valid |
This constraint says that if an RDF instance i is related to j via the property manages and k is related to j via the property works_in, then i must be related to k via the property is_supervisor_of. In ontology A, Jose is related to MyDepartment via manages, Diego is related to MyDepartment via works_in, but Jose is not related to Diego via any property, so the ontology is invalid. In B, Jose is related via is_supervisor_of to both Maria and Diego, each of whom is related to MyDepartment via works_in, so the ontology is valid.
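The chain can be read as a two-step join: follow manages forward, then works_in backward, and require a direct is_supervisor_of link between the endpoints. The function `chain_violations` below is a hypothetical helper that hard-codes this one chain for illustration.

```python
def chain_violations(triples):
    """Pairs (i, k) where i manages j and k works_in j, but i is not is_supervisor_of k."""
    missing = set()
    for (i, p, j) in triples:
        if p != "manages":
            continue
        for (k, q, j2) in triples:
            # inverse(works_in): match triples whose object is the managed department.
            if q == "works_in" and j2 == j and (i, "is_supervisor_of", k) not in triples:
                missing.add((i, k))
    return sorted(missing)

# Ontology A from the table: the link from Jose to Diego is missing.
ontology_a = {
    ("Jose", "manages", "MyDepartment"),
    ("Jose", "is_supervisor_of", "Maria"),
    ("Maria", "works_in", "MyDepartment"),
    ("Diego", "works_in", "MyDepartment"),
}
# Ontology B adds the missing link.
ontology_b = ontology_a | {("Jose", "is_supervisor_of", "Diego")}

result_a = chain_violations(ontology_a)
result_b = chain_violations(ontology_b)
print(result_a)  # [('Jose', 'Diego')]
print(result_b)  # []
```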
These constraints are more complex, often including multiple conditions, etc.
Each employee either works on at least one project, supervises at least one employee that works on at least one project, or manages at least one department.
| Constraint | Class: Employee SubClassOf: works_on some Project or supervises some (Employee and works_on some Project) or manages some Department | |
| A | Individual: Esteban Types: Employee | Invalid |
| B | Individual: Esteban Types: Employee Facts: supervises Lucinda Individual: Lucinda Types: Employee | Invalid |
| C | Individual: Esteban Types: Employee Facts: supervises Lucinda Individual: Lucinda Types: Employee Facts: works_on MyProject Individual: MyProject Types: Project | Valid |
| D | Individual: Esteban Types: Employee Facts: manages MyDepartment Individual: MyDepartment Types: Department | Valid |
| E | Individual: Esteban Facts: manages MyDepartment, works_on MyProject Individual: MyDepartment Types: Department Individual: MyProject Types: Project | Valid |
This constraint says that if an individual i is an instance of Employee, then at least one of three conditions must be met: First, it is related to an instance of Project via the property works_on. Second, it is related to an instance j via the property supervises and j is an instance of Employee and also related to some instance of Project via the property works_on. Third, it is related to an instance of Department via the property manages.
Ontologies A and B are invalid because none of the conditions are met. C meets the second condition: Esteban (who is an Employee) is related to Lucinda via the property supervises, and Lucinda is both an Employee and related to MyProject, which is a Project, via the property works_on. D meets the third condition: Esteban is related to an instance of Department, namely MyDepartment, via the property manages. Finally, E meets the first and the third conditions because, in addition to managing a department, Esteban is also related to an instance of Project, namely MyProject, via the property works_on.
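The three-way disjunction can be sketched as three boolean tests per Employee, any one of which suffices. The helper names below are hypothetical and the encoding of the data as triples is an assumption of this example.

```python
TYPE = "rdf:type"

def related_to_instance(triples, subject, prop, cls):
    """True if `subject` has a `prop` link to a known instance of `cls`."""
    return any(s == subject and p == prop and (o, TYPE, cls) in triples
               for (s, p, o) in triples)

def employee_violations(triples):
    """Employees that satisfy none of the three conditions."""
    bad = []
    for (s, p, o) in triples:
        if not (p == TYPE and o == "Employee"):
            continue
        works = related_to_instance(triples, s, "works_on", "Project")
        manages = related_to_instance(triples, s, "manages", "Department")
        supervises = any(
            s2 == s and p2 == "supervises"
            and (o2, TYPE, "Employee") in triples
            and related_to_instance(triples, o2, "works_on", "Project")
            for (s2, p2, o2) in triples)
        if not (works or supervises or manages):
            bad.append(s)
    return sorted(bad)

ontology_b = {
    ("Esteban", TYPE, "Employee"),
    ("Esteban", "supervises", "Lucinda"),
    ("Lucinda", TYPE, "Employee"),
}
ontology_c = ontology_b | {
    ("Lucinda", "works_on", "MyProject"),
    ("MyProject", TYPE, "Project"),
}

result_b = employee_violations(ontology_b)
result_c = employee_violations(ontology_c)
print(result_b)  # ['Esteban', 'Lucinda']
print(result_c)  # []
```

In this sketch Lucinda is reported alongside Esteban for ontology B, because the constraint applies to every Employee and she meets none of the conditions either.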
Only employees who are American citizens can work on a project that receives funds from a US government agency.
| Constraint | Class: Project and receives_funds_from some US_Government_Agency SubClassOf: inverse(works_on) only (Employee and nationality value "US") | |
| A | Individual: MyProject Types: Project Facts: receives_funds_from NASA Individual: NASA Types: US_Government_Agency | Valid |
| B | Individual: MyProject Types: Project Facts: receives_funds_from NASA Individual: NASA Types: US_Government_Agency Individual: Andy Types: Employee Facts: works_on MyProject | Invalid |
| C | Individual: MyProject Types: Project Facts: receives_funds_from NASA Individual: NASA Types: US_Government_Agency Individual: Andy Types: Employee Facts: works_on MyProject, nationality "US" | Valid |
| D | Individual: MyProject Types: Project Facts: receives_funds_from NASA Individual: NASA Types: US_Government_Agency Individual: Andy Types: Employee Facts: works_on MyProject, nationality "US" Individual: Heidi Types: Supervisor Facts: works_on MyProject, nationality "US" | Invalid |
| E | Individual: MyProject Types: Project Facts: receives_funds_from NASA Individual: NASA Types: US_Government_Agency Individual: Andy Types: Employee Facts: works_on MyProject, nationality "US" Individual: Heidi Types: Supervisor Facts: works_on MyProject, nationality "US" Class: Supervisor SubClassOf: Employee | Valid |
This constraint says that if an individual i is an instance of Project and is related to an instance of US_Government_Agency via the property receives_funds_from, then any individual j which is related to i via the property works_on must satisfy two conditions: First, it must be an instance of Employee. Second, it must be related to the literal "US" via the data property nationality.
Ontology A is valid because there is no individual related to MyProject via works_on, so the constraint is vacuously satisfied. Ontology B is invalid since Andy is related to MyProject via works_on, MyProject is an instance of Project and is related to an instance of US_Government_Agency, that is, NASA, via receives_funds_from, but Andy does not have any data property assertions. C is valid because both conditions are met. D is not valid because Heidi violates the first condition: she is related to MyProject via works_on but is not known to be an instance of Employee. Finally, this is fixed in ontology E, which states that every instance of Supervisor is an instance of Employee, so Heidi is inferred to be an instance of Employee and, consequently, the ontology is valid.
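The difference between D and E depends on whether inferred types count during validation. The sketch below makes that switch explicit with an `infer` flag; the functions `types_of` and `funding_violations` are hypothetical, and the subclass lookup is a deliberately minimal stand-in for a real reasoning type.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def types_of(triples, x, infer):
    """Asserted types of x, plus their superclasses when `infer` is True."""
    types = {o for (s, p, o) in triples if s == x and p == TYPE}
    if infer:
        frontier = list(types)
        while frontier:
            cls = frontier.pop()
            for (sub, p, sup) in triples:
                if p == SUBCLASS and sub == cls and sup not in types:
                    types.add(sup)
                    frontier.append(sup)
    return types

def funding_violations(triples, infer=False):
    """Workers on US-funded projects who are not known US employees."""
    bad = set()
    for (proj, p, agency) in triples:
        if p != "receives_funds_from":
            continue
        if "Project" not in types_of(triples, proj, infer):
            continue
        if "US_Government_Agency" not in types_of(triples, agency, infer):
            continue
        for (worker, q, target) in triples:
            if q == "works_on" and target == proj:
                is_employee = "Employee" in types_of(triples, worker, infer)
                is_us = (worker, "nationality", "US") in triples
                if not (is_employee and is_us):
                    bad.add(worker)
    return sorted(bad)

ontology_d = {
    ("MyProject", TYPE, "Project"), ("MyProject", "receives_funds_from", "NASA"),
    ("NASA", TYPE, "US_Government_Agency"),
    ("Andy", TYPE, "Employee"), ("Andy", "works_on", "MyProject"),
    ("Andy", "nationality", "US"),
    ("Heidi", TYPE, "Supervisor"), ("Heidi", "works_on", "MyProject"),
    ("Heidi", "nationality", "US"),
}
ontology_e = ontology_d | {("Supervisor", SUBCLASS, "Employee")}

print(funding_violations(ontology_d))              # ['Heidi']
print(funding_violations(ontology_e))              # ['Heidi'] (axiom present, inference off)
print(funding_violations(ontology_e, infer=True))  # []
```

The middle call shows that adding the subclass axiom is not enough by itself: E is valid only when validation runs with a reasoning type that draws the inference.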