The proof of a system's value is its existence. Alan Perlis, Epigrams in Programming

Stardog OWL 2

Stardog may perform OWL reasoning during SPARQL query answering. It supports the OWL 2 profiles and OWL 2 DL via an internal integration of Pellet 3. Stardog supports OWL 2 QL, EL and RL profile reasoning for data; it supports schema-only reasoning for OWL 2 DL.

This chapter explains the OWL 2 reasoning support in Stardog in detail, including relevant background information and reasoning terminology.

In this section we give a brief overview of Stardog's approach to query answering. Based on the specific characteristics of this approach, we derive a set of guidelines that contribute towards efficient query answering. If you are not familiar with the terminology, you can read our section on background and terminology.

ReasoningConnection API

Currently, this API has only two methods:

  - isConsistent(), which can be used to check whether the current KB is consistent with respect to the current reasoning level.
  - isSatisfiable(URI theURIClass), which can be used to check whether the given class is satisfiable with respect to the current KB and reasoning level.

TBox extraction

We can specify where the TBox is extracted from by setting the reasoning.schema.graphs property to a set of named graph URIs. To take the default graph into account, use the built-in URI tag:stardog:api:context:default; to use all named graphs, use tag:stardog:api:context:all. By default, only the default graph is used.

Query Answering and Reasoning

Query answering with reasoning in Stardog is based on query rewriting. Intuitively, the idea is to expand or rewrite the original query with respect to the knowledge represented in the Schema, and then to execute the resulting expanded query (EQ) over the Data only.

Blackout, Stardog's internal reasoner, implements a highly optimized version of the query rewriting algorithm of Pérez-Urbina et al. As can be seen in Figure 2, the rewriting process involves 5 different phases. For the sake of simplicity we do not treat them separately here; interested readers can refer to the provided paper for a detailed presentation.

Figure 1. Query Answering

Figure 2. Query Rewriting

We illustrate the query answering process by means of an example. Consider an OWL 2 EL DB MyDB1 containing the following Schema axioms:


  :SeniorManager rdfs:subClassOf :manages some :Manager
  :manages some :Employee rdfs:subClassOf :Manager
  :Manager rdfs:subClassOf :Employee
  

stating that a senior manager manages at least one manager, that every person that manages an employee is a manager, and that every manager is also an employee. Moreover, let us assume MyDB1 also contains the following Data assertions:


  :Bill rdf:type :SeniorManager
  :Robert rdf:type :Manager
  :Ana :manages :Lucy
  :Lucy rdf:type :Employee
  

Finally, let us assume that we want to retrieve the set of all employees. We do this by posing the following query over MyDB1:


  SELECT ?employee WHERE { ?employee rdf:type :Employee }
  

Given the knowledge captured in the Schema, we expect all individuals occurring in the Data to be part of the answer.

In order to answer this query, Stardog first rewrites it using the relevant knowledge in the Schema. In this case, it can be shown that the EQ produced by Stardog contains the following queries:


  SELECT ?employee WHERE { ?employee rdf:type :Employee }
  SELECT ?employee WHERE { ?employee rdf:type :Manager }
  SELECT ?employee WHERE { ?employee rdf:type :SeniorManager }
  SELECT ?employee WHERE { ?employee :manages ?x. ?x rdf:type :Employee }
  

The second and final step consists of executing the EQ over the Data by computing the union of the results of the produced queries.
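
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Data is held in an in-memory set of triples, the EQ is written out by hand rather than computed by a rewriting algorithm, and the helper names (instances_of, manages_some, evaluate_eq) are hypothetical and not part of any Stardog API.

```python
RDF_TYPE = "rdf:type"

# The Data assertions of MyDB1 as a toy in-memory triple set.
DATA = {
    (":Bill", RDF_TYPE, ":SeniorManager"),
    (":Robert", RDF_TYPE, ":Manager"),
    (":Ana", ":manages", ":Lucy"),
    (":Lucy", RDF_TYPE, ":Employee"),
}


def instances_of(data, cls):
    """Answers to: SELECT ?e WHERE { ?e rdf:type cls }"""
    return {s for (s, p, o) in data if p == RDF_TYPE and o == cls}


def manages_some(data, cls):
    """Answers to: SELECT ?e WHERE { ?e :manages ?x. ?x rdf:type cls }"""
    members = instances_of(data, cls)
    return {s for (s, p, o) in data if p == ":manages" and o in members}


# The four queries of the EQ from the text, written by hand (not derived).
EXPANDED_QUERY = [
    lambda data: instances_of(data, ":Employee"),
    lambda data: instances_of(data, ":Manager"),
    lambda data: instances_of(data, ":SeniorManager"),
    lambda data: manages_some(data, ":Employee"),
]


def evaluate_eq(eq, data):
    """Evaluate every query of the EQ over the Data and union the answers."""
    answers = set()
    for query in eq:
        answers |= query(data)
    return answers


employees = evaluate_eq(EXPANDED_QUERY, DATA)
print(sorted(employees))  # [':Ana', ':Bill', ':Lucy', ':Robert']
```

Note that the Schema is not consulted during evaluation: all of its relevant knowledge is already encoded in the four queries.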

The form of the EQ depends on the OWL 2 profile in which the DB is expressed. If the DB is within OWL 2 QL, then every EQ produced by Stardog is guaranteed to be expanded into a set of queries. If the DB is within OWL 2 RL or EL, then the EQ might include a recursive rule; it is important to understand, however, that this is not always the case as demonstrated by the above example.

Why Query Rewriting?

As explained previously, the query rewriting approach deals with query answering in two separate phases: first, the query is rewritten with respect to the Schema in order to get an EQ, and then the EQ is evaluated over the Data. Notably, given this separation, it can be shown that the EQ is independent from the Data. This independence is important on several levels:

  1. The time it takes to compute the EQ depends on the size of the Schema and the original query only, and not on the size of the Data. Typically, the size of the Schema and the query are much smaller than that of the Data.
  2. Given the form of the EQ, it can be evaluated over secondary storage. That is, Data does not need to be loaded into memory, which makes the query rewriting approach to query answering a suitable reasoning technique for scenarios where the Data is too big to fit in memory.
  3. The same EQ can be evaluated over different instances of Data without having to be recomputed.

An alternative is a technique known as materialization. In this approach, it is the Data that gets expanded with respect to the Schema, not the query. That is, the axioms in the Schema are used as rules to generate new triples. For example we could use the axiom :manages some :Employee rdfs:subClassOf :Manager and the Data assertions


  :Ana :manages :Lucy
  :Lucy rdf:type :Employee
  

to derive the new triple:


  :Ana rdf:type :Manager
  

which was only implicit before.

The materialization phase typically consists in applying all the axioms in the Schema to the original Data and the inferred triples until no more triples can be generated. After materialization, we can evaluate queries over the new Data disregarding the Schema altogether. In our example, it can be shown that materializing the Data with respect to the Schema would eventually produce the following triples:


  :Bill rdf:type :Employee
  :Robert rdf:type :Employee
  :Ana rdf:type :Employee
  :Lucy rdf:type :Employee
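
For comparison, the materialization of this example can be sketched as a naive fixpoint loop in Python. This is a toy under stated assumptions, not how any particular materializing system is implemented: the three Schema axioms are hard-coded as rules, and the unnamed manager required by the :SeniorManager axiom is modeled with a hypothetical placeholder node whose name starts with "_:".

```python
RDF_TYPE = "rdf:type"

DATA = {
    (":Bill", RDF_TYPE, ":SeniorManager"),
    (":Robert", RDF_TYPE, ":Manager"),
    (":Ana", ":manages", ":Lucy"),
    (":Lucy", RDF_TYPE, ":Employee"),
}


def materialize(data):
    """Apply the three MyDB1 axioms as rules until no new triple appears."""
    triples = set(data)
    while True:
        new = set()
        for (s, p, o) in triples:
            # :Manager rdfs:subClassOf :Employee
            if p == RDF_TYPE and o == ":Manager":
                new.add((s, RDF_TYPE, ":Employee"))
            # :SeniorManager rdfs:subClassOf :manages some :Manager
            # Assumption: one placeholder node stands for the unnamed manager.
            if p == RDF_TYPE and o == ":SeniorManager":
                placeholder = "_:managedBy" + s
                new.add((s, ":manages", placeholder))
                new.add((placeholder, RDF_TYPE, ":Manager"))
            # :manages some :Employee rdfs:subClassOf :Manager
            if p == ":manages" and (o, RDF_TYPE, ":Employee") in triples:
                new.add((s, RDF_TYPE, ":Manager"))
        if new <= triples:  # fixpoint reached
            return triples
        triples |= new


closure = materialize(DATA)

# Keep only named individuals; placeholder nodes are an artifact of the sketch.
named_employees = {
    s for (s, p, o) in closure
    if p == RDF_TYPE and o == ":Employee" and not s.startswith("_:")
}
print(sorted(named_employees))  # [':Ana', ':Bill', ':Lucy', ':Robert']
```

After the loop, the original query can be answered by a plain lookup over the expanded triple set, with no reference to the Schema.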
  

Since query answering over a materialized database does not need to take the Schema into account, it is typically faster than query answering via query rewriting. However, materialization introduces several issues:

Usage and Guidelines

Loading and Importing Ontologies

Blackout requires the Schema to be present in the Stardog database. Since schemas are serialized as RDF, they are loaded into a Stardog database in the same way that any RDF is loaded into a database. The point to note here, however, is that the Schema must be present in the database to be taken into account in the reasoning process.

A related point is that Stardog will not follow owl:imports statements automatically; any imported OWL ontologies that are required for reasoning must be loaded into a Stardog database in the normal way, per the preceding paragraph.

Guidelines for Efficient Query Answering

The query rewriting approach presented previously suggests some guidelines that might contribute to smaller EQs and, therefore, more efficient query answering.

Hierarchies and Queries

Avoid unnecessarily deep class/property hierarchies. If you do not need to model several different types of a given class or property in your Schema, then do not. The reason shallow hierarchies are desirable is that the maximal hierarchy depth in the ontology partly determines the maximal size of the EQs produced by Stardog. The larger the EQ, the more difficult it is to evaluate over the Data.

For example, suppose we add to MyDB1 a very thorough and detailed set of subclasses of the class :Employee:


  :Manager rdfs:subClassOf :Employee
  :SeniorManager rdfs:subClassOf :Manager
  ...
  
  :Supervisor rdfs:subClassOf :Employee
  :DepartmentSupervisor rdfs:subClassOf :Supervisor
  ...
  
  :Secretary rdfs:subClassOf :Employee
  ...
  

If we wanted to retrieve the set of all employees as before, Blackout would produce an EQ containing a query of the following form for every subclass :Ci of :Employee:


  SELECT ?employee WHERE { ?employee rdf:type :Ci }
  

At this point, it is easy to see that more specific queries are preferable: general queries, that is, queries that contain concepts high up in the class hierarchy defined by the Schema, such as the one above, will typically yield larger EQs.
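
The effect of the hierarchy on EQ size can be sketched as follows. This is a simplified illustration that handles only named subclass axioms: it assumes the hierarchy is given as a list of (subclass, superclass) pairs, it covers just the classes named in the example (the elided "..." entries are left out), and the helper names are hypothetical.

```python
# Named subclass axioms from the example, as (subclass, superclass) pairs.
SUBCLASS_OF = [
    (":Manager", ":Employee"),
    (":SeniorManager", ":Manager"),
    (":Supervisor", ":Employee"),
    (":DepartmentSupervisor", ":Supervisor"),
    (":Secretary", ":Employee"),
]


def subclasses(cls, edges):
    """Return cls together with all of its direct and indirect subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in edges:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def expand(cls, edges):
    """Produce one query per class in the hierarchy rooted at cls."""
    template = "SELECT ?employee WHERE { ?employee rdf:type %s }"
    return [template % c for c in sorted(subclasses(cls, edges))]


print(len(expand(":Employee", SUBCLASS_OF)))    # 6
print(len(expand(":Supervisor", SUBCLASS_OF)))  # 2
```

The query about :Employee expands into six queries, whereas the more specific query about :Supervisor expands into two.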

Domains and Ranges

Specify domain and range of the properties in the Schema. These types of axiom can help reduce the size of the EQs significantly due to an optimization technique implemented in Blackout called query subsumption. In order to grasp the intuition behind it, let us consider the following query asking for people and the employees they manage:


  SELECT ?manager ?employee WHERE { ?manager :manages ?employee. ?employee rdf:type :Employee }
  

We know that this query would cause a large EQ given the deep hierarchy of :Employee in MyDB1. However, if we added the following single range axiom:


  :manages rdfs:range :Employee
  

then the EQ would collapse to:


  SELECT ?manager ?employee WHERE { ?manager :manages ?employee }
  

which is considerably less difficult to evaluate.
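
The intuition can be illustrated with a much simpler stand-in for the optimization: dropping a type atom that a domain or range axiom already implies. The sketch below is not Blackout's query subsumption procedure; it assumes a query is a list of (subject, predicate, object) atoms and that domain and range axioms are given as plain dictionaries, and the function name is hypothetical.

```python
RDF_TYPE = "rdf:type"


def drop_implied_type_atoms(bgp, ranges, domains):
    """Remove type atoms that follow from a domain or range axiom."""
    kept = []
    for atom in bgp:
        subject, predicate, cls = atom
        # A type atom is implied if the same term occurs as the object of a
        # property whose range is cls, or as the subject of a property whose
        # domain is cls.
        implied = predicate == RDF_TYPE and any(
            (ranges.get(p) == cls and o == subject)
            or (domains.get(p) == cls and s == subject)
            for (s, p, o) in bgp
            if (s, p, o) != atom
        )
        if not implied:
            kept.append(atom)
    return kept


QUERY = [
    ("?manager", ":manages", "?employee"),
    ("?employee", RDF_TYPE, ":Employee"),
]

# With the range axiom, only the :manages atom remains.
print(drop_implied_type_atoms(QUERY, {":manages": ":Employee"}, {}))

# Without any axiom, the query is left unchanged.
print(drop_implied_type_atoms(QUERY, {}, {}))
```

Once the type atom is gone, there is nothing left in the query for the deep :Employee hierarchy to expand.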

Background and Terminology

Databases

A database (DB), a.k.a. ontology, is composed of two different parts: the Schema or Terminological Box (TBox) and the Data or Assertional Box (ABox). Analogous to relational databases, the TBox can be thought of as the schema, and the ABox as the data. In other words, the TBox is a set of axioms, whereas the ABox is a set of assertions.

As we explain in Section OWL 2 Profiles, the kinds of assertion and axiom that one might use for a particular database are determined by the fragment of OWL 2 to which one would like to adhere. In general, you should choose the OWL 2 profile that most closely fits the data modeling needs of your application.

The most common data assertions are class and property assertions. Class assertions are used to state that a particular individual is an instance of a given class. Property assertions are used to state that two particular individuals (or an individual and a literal) are related via a given property. For example, suppose we have a DB MyDB2 that contains the following data assertions (we use the usual standard prefixes for RDF(S) and OWL):


  :clark_and_parsia rdf:type :Company
  :clark_and_parsia :maintains :Stardog
  

stating that :clark_and_parsia is a company, and that :clark_and_parsia maintains :Stardog.

The most common schema axioms are subclass axioms. Subclass axioms are used to state that every instance of a particular class is also an instance of another class. For example, suppose that MyDB2 contains the following TBox axiom:


  :Company rdfs:subClassOf :Organization
  

stating that companies are a type of organization.

Queries

When reasoning is enabled, Stardog executes SPARQL queries (simply queries from now on) depending on the type of Basic Graph Patterns they contain.

A BGP is said to be an ABox BGP if it is of one of the following forms:

A BGP is said to be a TBox BGP if it is of one of the following forms:

A BGP is said to be a Hybrid BGP if it is of one of the following forms:

where term (possibly with subscripts) is either a URI or a variable; uri is a URI; and ?var is a variable.

When executing a query, ABox BGPs are handled by Blackout, TBox BGPs are executed by Pellet, and Hybrid BGPs by a combination of both.

Reasoning

Intuitively, reasoning with a DB means making implicit knowledge explicit. There are two main use cases for reasoning: inferring implicit knowledge and discovering modeling errors.

With respect to the first use case, recall that MyDB2 contains the following assertion and axiom:


  :clark_and_parsia rdf:type :Company
  :Company rdfs:subClassOf :Organization
  

From this DB, we can use Stardog in order to infer that :clark_and_parsia is an organization:


  :clark_and_parsia rdf:type :Organization
  

Using reasoning to infer implicit knowledge in the context of an enterprise application can lead to simpler queries. Let us suppose, for example, that MyDB2 contains a complex class hierarchy including several types of organization (including company). Let us further suppose that our application needs to use Stardog to get the list of all organizations under consideration. If Stardog were used with reasoning, then we would need to issue only the following simple query:


  SELECT ?org WHERE { ?org rdf:type :Organization}
  

In contrast, if we were using Stardog with no reasoning, then we would have to issue the following considerably more complex query that considers all possible types of organization:


  SELECT ?org WHERE { { ?org rdf:type :Organization } UNION 
                      { ?org rdf:type :Company } UNION
                      ... 
                    }
  

Stardog can also be used in order to discover modeling errors in a DB. The most common modeling errors are unsatisfiable classes and inconsistent DBs.

An unsatisfiable class is simply a class that cannot have any instances. Say, for example, that we added the following axioms to MyDB2:


  :Company owl:disjointWith :Organization
  :LLC owl:equivalentClass :Company and :Organization  
  

stating that companies cannot be organizations and vice versa, and that an LLC is a company and an organization. The disjointness axiom causes the class :LLC to be unsatisfiable because, for the DB to be contradiction-free, there can be no instances of :LLC.

Asserting (or inferring) that an unsatisfiable class has an instance causes the DB to be inconsistent. In the particular case of MyDB2, we know that :clark_and_parsia is a company and an organization (see above); therefore, we also know that it is an instance of :LLC, and since :LLC is known to be unsatisfiable, MyDB2 is inconsistent.
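
Both checks can be sketched for this small example in Python. This is a toy under stated assumptions and not the Stardog or Pellet implementation: it handles only subclass and disjointness axioms, it encodes the equivalence axiom for :LLC as two subclass pairs, and the function names merely echo the isSatisfiable and isConsistent methods mentioned earlier.

```python
RDF_TYPE = "rdf:type"

# Schema of MyDB2 as (subclass, superclass) pairs. The equivalence
# ":LLC = :Company and :Organization" implies that :LLC is a subclass of both.
SUBCLASS_OF = {
    (":Company", ":Organization"),
    (":LLC", ":Company"),
    (":LLC", ":Organization"),
}
DISJOINT = {frozenset({":Company", ":Organization"})}
DATA = {(":clark_and_parsia", RDF_TYPE, ":Company")}


def superclasses(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def is_satisfiable(cls):
    """A class is unsatisfiable here if it falls under two disjoint classes."""
    supers = superclasses(cls)
    return not any(pair <= supers for pair in DISJOINT)


def is_consistent(data):
    """The DB is inconsistent if an individual belongs to two disjoint classes."""
    types = {}
    for (s, p, o) in data:
        if p == RDF_TYPE:
            types.setdefault(s, set()).update(superclasses(o))
    return not any(
        pair <= inferred for inferred in types.values() for pair in DISJOINT
    )


print(is_satisfiable(":LLC"))           # False
print(is_satisfiable(":Organization"))  # True
print(is_consistent(DATA))              # False
```

The inconsistency is found by inferring that :clark_and_parsia is both a company and an organization, which is exactly the membership that the disjointness axiom forbids.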

Using reasoning in order to discover modeling errors in the context of an enterprise application is useful in order to maintain a correct contradiction-free model of the domain. In our example, we discovered that :LLC is unsatisfiable and MyDB2 is inconsistent, which leads us to believe that there is a modeling error in our DB. In this case, it is easy to see that the problem is the disjointness axiom between :Company and :Organization.

OWL 2 Profiles

As explained in the OWL 2 Web Ontology Language Profiles Specification of the W3C, an OWL 2 profile is a trimmed down version of OWL 2 that trades some expressive power for the efficiency of reasoning. There are three OWL 2 profiles, each of which achieves efficiency differently.

Each profile restricts the kinds of axiom and assertion that can be used in a DB. Intuitively, QL is the least expressive of the profiles, followed by RL and EL; however, strictly speaking, no profile is more expressive than any other as they provide incomparable sets of constructs.

Stardog supports the three profiles of OWL 2 by making use of Blackout and Pellet. Notably, since TBox BGPs are handled completely by Pellet, Stardog supports reasoning for the whole of OWL 2 for queries containing TBox BGPs only.

Known Issues

Version 0.9.5 of Stardog does not support the following features:

Notes
