Easy Graph is Good Graph

5 minute read

Stardog 5.2.2 makes it drop-dead easy to use named entity recognition and linking and to map RDBMS silos into the graph.

At Stardog we recognize the importance of ease of use. It’s always been a priority and remains a major factor when we design the interfaces to new features. Along with many minor improvements, Stardog 5.2.2 improves usability in two significant ways: first, named entity recognition and linking can now be applied to any string in the graph; second, RDBMS sources can be mapped automatically, without user intervention.

NER and Entity Linking in SPARQL

We previously announced the named entity recognition and linking feature as part of Stardog’s BITES framework for ingesting unstructured data. We’ve now extended this by providing a SPARQL interface to the named entity recognition and linking algorithms. This makes it applicable to any data available in the graph, whether it is stored directly in Stardog or accessed remotely via SPARQL endpoints or virtual graphs.

The SPARQL interface is based on our continued extension of the SPARQL SERVICE facility. Here’s an example query using the named entity recognizer (NER):

prefix docs: <tag:stardog:api:docs:>

select * {
  # imagine a SPARQL query over a set of blog posts
  ?post :postText ?text

  service docs:entityExtractor {
    [] docs:text ?text;
       docs:mention ?m
  }
}

Looks pretty easy, eh? Let’s dive in.

We start with a SPARQL query over the text segments we want to pass through the NER extractor—in this case, the bodies of blog postings. But it could be anything.

The graph patterns in the SERVICE block represent arguments to the extraction procedure. The subject is a blank node (bnode) whose only purpose is to tie the arguments together. The predicate-object pair docs:text ?text names the variable we want to run the extractor over. This argument receives special treatment: it is an input to the service, so ?text must be bound outside the service. The pair docs:mention ?m asks the service to bind each mention it finds to the variable ?m. The mention is an output of the service and, in most circumstances, can be joined to other graph patterns.
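To make the input/output roles concrete, here is a minimal Python sketch of the binding semantics only. It is not Stardog’s implementation: the names `entity_extractor_service`, `toy_extractor` and `KNOWN_NAMES` are hypothetical, and the "extractor" is a toy dictionary lookup standing in for a real NER model. Each incoming solution must already bind ?text, and every mention found yields one output row that extends that solution with ?m.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Solution = Dict[str, str]

# Hypothetical stand-in for an NER model: it only "recognizes" these names.
KNOWN_NAMES = ["Watergate", "George Clooney", "Matt Charman"]


def toy_extractor(text: str) -> List[str]:
    """Return the known names found in text, in order of first appearance."""
    hits = [(text.find(name), name) for name in KNOWN_NAMES if name in text]
    return [name for _, name in sorted(hits)]


def entity_extractor_service(solutions: Iterable[Solution],
                             extract: Callable[[str], List[str]],
                             text_var: str = "text",
                             mention_var: str = "m") -> Iterator[Solution]:
    """Sketch of the SERVICE contract: ?text is an input that must be bound
    before the call; each mention becomes one row extending the solution."""
    for solution in solutions:
        if text_var not in solution:
            raise ValueError(f"?{text_var} must be bound outside the service")
        for mention in extract(solution[text_var]):
            yield {**solution, mention_var: mention}


posts = [{
    "post": ":MovieArticle",
    "text": ("A drama titled Watergate is being developed by George Clooney "
             "and Bridge of Spies writer Matt Charman."),
}]
rows = list(entity_extractor_service(posts, toy_extractor))
for row in rows:
    print(row["post"], row["m"])
```

As in the result table that follows, one input text produces three rows; row order carries no meaning in SPARQL results.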

A result from this query might look something like this:

+----------------------------------------------------------------------------------+------------------+---------------+
|                                       text                                       |        m         |     post      |
+----------------------------------------------------------------------------------+------------------+---------------+
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "Matt Charman"   | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "Watergate"      | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "George Clooney" | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
+----------------------------------------------------------------------------------+------------------+---------------+

We have a text string with three different entities mentioned in it, and the NER extractor returns each of them. Going further, we can use the same service to link these mentions to entities in the graph. We add two more arguments to the service, docs:entity ?entity and docs:type ?type, which indicate that we want to link entities as well as retrieve their NER type:

prefix docs: <tag:stardog:api:docs:>

select * {
  ?post :postText ?text

  service docs:entityExtractor {
    [] docs:text ?text;
       docs:mention ?m ;
       docs:entity ?entity ;
       docs:type ?type
  }
}

Assuming the graph contains an entity only for George Clooney, we get a smaller result when requesting linked entities, since only that mention can be linked:

+------------------------+------------------+----------------+---------------------------------+---------------+
|         text           |        m         |     entity     |              type               |     post      |
+------------------------+------------------+----------------+---------------------------------+---------------+
| "A drama titled Wa..." | "George Clooney" | :GeorgeClooney | tag:stardog:api:docs:ner:person | :MovieArticle |
+------------------------+------------------+----------------+---------------------------------+---------------+
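Why the result shrinks can be sketched in a few lines of Python. This assumes, for illustration only, that linking behaves like an inner join: mentions that resolve to an entity gain ?entity and ?type, and mentions that do not resolve produce no row. `ENTITY_INDEX` and `link_mentions` are hypothetical, and the exact-label lookup is a deliberate simplification; the post does not describe how Stardog’s linker actually matches mentions to entities.

```python
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, str]

# Hypothetical index of entities already in the graph:
# mention label -> (entity IRI, NER type). Only George Clooney exists here.
ENTITY_INDEX: Dict[str, Tuple[str, str]] = {
    "George Clooney": (":GeorgeClooney", "tag:stardog:api:docs:ner:person"),
}


def link_mentions(rows: Iterable[Row],
                  index: Dict[str, Tuple[str, str]]) -> Iterator[Row]:
    """Add ?entity and ?type to mentions found in the index; drop the rest,
    mirroring inner-join semantics."""
    for row in rows:
        hit = index.get(row["m"])  # simplification: exact label match
        if hit is not None:
            entity, ner_type = hit
            yield {**row, "entity": entity, "type": ner_type}


mentions = [
    {"post": ":MovieArticle", "m": "Matt Charman"},
    {"post": ":MovieArticle", "m": "Watergate"},
    {"post": ":MovieArticle", "m": "George Clooney"},
]
linked = list(link_mentions(mentions, ENTITY_INDEX))
print(linked)
```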

This is a powerful new addition to Stardog, and one that can be applied to any data in the graph, including virtual graphs, which can now be mapped automatically! Let’s look at that next.

Automatic RDBMS Mappings

In our ongoing efforts to unify all the data, we’ve removed a major barrier to entry when integrating RDBMS sources: writing mappings.

VEGA, Stardog 5’s virtual graph engine, is a powerful and flexible way to map relational data into the graph, but it needs some direction to do so. This direction comes in the form of mappings that follow the R2RML standard or, alternatively, the more intuitive Stardog Mapping Syntax.

With Stardog 5.2.2, mappings need not be written manually. When a virtual graph is created without mappings, Stardog generates a default set of mappings by introspecting the database schema.

Without going through the tedium of looking at the generated mappings, let’s see how this works. We first create the sample tables from the docs. Next, we use a basic emp.properties file to point to our SQL database:

base=http://example.com/emp/
jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://172.31.90.169/emp
jdbc.username=user
jdbc.password=pass

We specified base here, which is used as the prefix for the generated IRIs: classes, predicates, and the identifiers of individual rows all start with it. Finally, we create a virtual graph without supplying any mappings:

$ stardog-admin virtual add emp.properties

We can now query the emp virtual graph. Each table is used to generate a class and each column a predicate. Foreign keys are used to generate IRIs that represent relationships between entities in different tables; in the sample query below, emp:ref-deptno links an employee to its department.
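The IRIs used in the sample query and its result (`EMP/empno=7369`, `EMP#ename`, `EMP#ref-deptno`, `DEPT/deptno=10`) look like the patterns of the W3C Direct Mapping of relational data to RDF. The Python sketch below reproduces those patterns for a single row. It is an illustration inferred from the IRIs in this post, not Stardog’s generator: `row_to_triples` is hypothetical, IRI escaping and composite keys are ignored, and emitting a plain `deptno` literal alongside the `ref-deptno` link (as the Direct Mapping does) is an assumption about Stardog’s output.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(base: str, table: str, pk: str, row: Dict[str, object],
                   foreign_keys: Optional[Dict[str, Tuple[str, str]]] = None
                   ) -> List[Triple]:
    """Map one row to triples using Direct-Mapping-style IRIs (hypothetical).
    foreign_keys maps a column to (referenced table, referenced column)."""
    foreign_keys = foreign_keys or {}
    subject = f"{base}{table}/{pk}={row[pk]}"          # row identifier
    triples: List[Triple] = [(subject, RDF_TYPE, f"{base}{table}")]  # table -> class
    for column, value in row.items():
        if value is None:
            continue  # NULL columns yield no triple
        # Column -> predicate; the value becomes a quoted literal.
        triples.append((subject, f"{base}{table}#{column}", f'"{value}"'))
        if column in foreign_keys:
            # Foreign key -> IRI link to the referenced row.
            ref_table, ref_column = foreign_keys[column]
            triples.append((subject, f"{base}{table}#ref-{column}",
                            f"{base}{ref_table}/{ref_column}={value}"))
    return triples


BASE = "http://example.com/emp/"  # the base from emp.properties
smith = row_to_triples(BASE, "EMP", "empno",
                       {"empno": 7369, "ename": "SMITH", "job": "CLERK", "deptno": 10},
                       foreign_keys={"deptno": ("DEPT", "deptno")})
for triple in smith:
    print(triple)
```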

Here’s a sample query:

prefix emp: <http://example.com/emp/EMP#>
prefix dept: <http://example.com/emp/DEPT#>

select * from <virtual://emp> {
  ?emp a <http://example.com/emp/EMP> ;
    emp:ename ?name ;
    emp:job ?job ;
    emp:ref-deptno ?dept .
  ?dept dept:loc "NEW YORK"
}

The query finds all employees in New York departments, along with their names and jobs:

+---------------------------------------+---------+---------+---------------------------------------+
|                  emp                  |  name   |   job   |                 dept                  |
+---------------------------------------+---------+---------+---------------------------------------+
| http://example.com/emp/EMP/empno=7369 | "SMITH" | "CLERK" | http://example.com/emp/DEPT/deptno=10 |
+---------------------------------------+---------+---------+---------------------------------------+
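For readers coming from the relational side, the sketch below runs a rough SQL equivalent of that SPARQL query against an in-memory SQLite database: the emp:ref-deptno hop becomes a join on the foreign key, and dept:loc "NEW YORK" becomes a WHERE condition. The rows are hypothetical, chosen so that the first employee and department reproduce the result above and the second pair is filtered out; this is not the SQL that Stardog actually sends to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPT (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
CREATE TABLE EMP (
  empno  INTEGER PRIMARY KEY,
  ename  TEXT,
  job    TEXT,
  deptno INTEGER REFERENCES DEPT(deptno)
);
-- Hypothetical sample rows; only department 10 is in New York.
INSERT INTO DEPT VALUES (10, 'APPSERVER', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS');
INSERT INTO EMP VALUES (7369, 'SMITH', 'CLERK', 10), (7499, 'ALLEN', 'SALESMAN', 20);
""")

# Foreign-key join plus a filter on the department's location.
rows = conn.execute("""
SELECT e.empno, e.ename, e.job, d.deptno
FROM EMP e JOIN DEPT d ON e.deptno = d.deptno
WHERE d.loc = 'NEW YORK'
""").fetchall()
print(rows)
conn.close()
```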

Bootstrapping a virtual graph, or even a bunch of them, really is this easy now. The generated mappings can be exported and modified to align with other parts of the graph schema or to comply with naming conventions. Of course, the full power of custom mappings is still available when needed.

TL;DR

Stardog’s mission is to provide you with tools to bring your siloed enterprise data into the graph, whether physically or virtually, and to make sense of that data. Virtual graphs can now be created more quickly and easily, and the extended NLP features help you dig into your data and extract value.

