Easy Graph is Good Graph
Stardog 5.2.2 makes it drop-dead easy to use named entity recognition and linking and to map RDBMS silos into the graph.
At Stardog we recognize the importance of ease of use. It’s always been a priority and remains a major factor when designing interfaces to features. Along with many minor improvements, Stardog 5.2.2 extends usability in two significant ways: first, named entity recognition and linking applied to any strings in the graph; and, second, automatic mapping of RDBMS sources without user intervention.
We previously announced the named entity recognition and linking feature as part of Stardog’s BITES framework for ingesting unstructured data. We’ve now extended this by providing a SPARQL interface to the named entity recognition and linking algorithms. This makes it applicable to any data available in the graph, whether stored directly in Stardog or accessed remotely on SPARQL endpoints or virtual graphs.
The SPARQL interface is based on our continued extension of the SPARQL SERVICE facility. Here’s an example query using the named entity recognizer (NER):
prefix docs: <tag:stardog:api:docs:>

select * {
  # imagine a SPARQL query over a set of blog posts
  ?post :postText ?text
  service docs:entityExtractor {
    [] docs:text ?text ;
       docs:mention ?m
  }
}
Looks pretty easy, eh? Let’s dive in.
We start with a SPARQL query over the text segments we want to pass through the NER extractor—in this case, the bodies of blog postings. But it could be anything.
The graph patterns in the SERVICE block represent arguments to the extraction procedure. The subject is a bnode and its purpose here is to correlate the arguments together. The predicate-object pair docs:text ?text indicates the variable we want to run the extractor over. This argument receives special treatment in that it’s considered an input to the service. The ?text variable must be bound outside the service. The pair docs:mention ?m indicates that we want to bind each mention to the variable ?m. The mention is considered an output from the service. It can be joined to other graph patterns in most circumstances.
A result from this query might look something like this:
+----------------------------------------------------------------------------------+------------------+---------------+
| text                                                                             | m                | post          |
+----------------------------------------------------------------------------------+------------------+---------------+
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "Matt Charman"   | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "Watergate"      | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
| "A drama titled Watergate is being developed by George Clooney and Bridge of     | "George Clooney" | :MovieArticle |
| Spies writer Matt Charman."                                                      |                  |               |
+----------------------------------------------------------------------------------+------------------+---------------+
We have a text string with three different entities mentioned in it and the NER extractor returns each of them. Going further, we can use the same service to link these mentions to entities in the graph. We add two additional arguments to the service: docs:entity ?entity and docs:type ?type. These arguments indicate that we want to link entities as well as retrieve their NER type:
prefix docs: <tag:stardog:api:docs:>

select * {
  ?post :postText ?text
  service docs:entityExtractor {
    [] docs:text ?text ;
       docs:mention ?m ;
       docs:entity ?entity ;
       docs:type ?type
  }
}
Assuming we only have an entity for George Clooney in the graph, we get a smaller result when requesting linked entities:
+------------------------+------------------+----------------+---------------------------------+---------------+
| text                   | m                | entity         | type                            | post          |
+------------------------+------------------+----------------+---------------------------------+---------------+
| "A drama titled Wa..." | "George Clooney" | :GeorgeClooney | tag:stardog:api:docs:ner:person | :MovieArticle |
+------------------------+------------------+----------------+---------------------------------+---------------+
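Once the service returns, ?entity and ?type are ordinary bindings, so they can be filtered and aggregated like any other variable. As a rough sketch, assuming the NER type comes back as the IRI shown in the table above, the query below keeps only linked people and counts how many posts mention each one. The ?posts variable is our own name, and :postText is the same illustrative predicate used earlier.

```sparql
prefix docs: <tag:stardog:api:docs:>

select ?entity (count(distinct ?post) as ?posts) {
  ?post :postText ?text
  service docs:entityExtractor {
    [] docs:text ?text ;
       docs:mention ?m ;
       docs:entity ?entity ;
       docs:type ?type
  }
  # assumption: NER types are returned as IRIs, as the result table suggests
  filter(?type = <tag:stardog:api:docs:ner:person>)
}
group by ?entity
order by desc(?posts)
```

The filter sits outside the SERVICE block on purpose: the block only carries arguments to the extractor, and everything else is plain SPARQL.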
This is a powerful new addition to Stardog and one that can be applied to any data in the graph, including virtual graphs which can be mapped automatically! Let’s look at that next.
In our ongoing efforts to unify all the data, we’ve removed a major barrier to entry when integrating RDBMS sources: writing mappings.
Stardog 5’s virtual graph engine, VEGA, is a powerful and flexible engine for mapping relational data into the graph, but it needs some direction to do so. This direction comes in the form of mappings following the R2RML standard, alternatively expressed using the more intuitive Stardog Mapping Syntax.
With Stardog 5.2.2, mappings need not be written manually. When a virtual graph is created without mappings, Stardog generates a default set by introspecting the database schema.
Without going through the tedium of looking at the generated mappings, let’s see how this works. We first create the sample tables from the docs. Next, we use a basic emp.properties file to point to our SQL database:
base=http://example.com/emp/
jdbc.driver=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://172.31.90.169/emp
jdbc.username=user
jdbc.password=pass
We specified base here, which is used as the IRI prefix for mapped predicates. Finally, we create a virtual graph without supplying any mappings:
$ stardog-admin virtual add emp.properties
We can now query the emp virtual graph. Each table is used to generate a class and each column a predicate. Foreign keys are used to generate IRIs which represent relationships between entities in different tables.
Here’s a sample query:
prefix emp: <http://example.com/emp/EMP#>
prefix dept: <http://example.com/emp/DEPT#>

select * from <virtual://emp> {
  ?emp a <http://example.com/emp/EMP> ;
       emp:ename ?name ;
       emp:job ?job ;
       emp:ref-deptno ?dept .
  ?dept dept:loc "NEW YORK"
}
We find all employees in departments in New York including their names and jobs:
+---------------------------------------+---------+---------+---------------------------------------+
| emp                                   | name    | job     | dept                                  |
+---------------------------------------+---------+---------+---------------------------------------+
| http://example.com/emp/EMP/empno=7369 | "SMITH" | "CLERK" | http://example.com/emp/DEPT/deptno=10 |
+---------------------------------------+---------+---------+---------------------------------------+
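If you’d rather not read the generated mappings, one way to get a feel for them is to ask the virtual graph which classes and predicates it exposes. The query below is only a sketch: it assumes the virtual graph engine accepts a variable in the predicate position, and on a large source it may be slow, so treat it as an exploration aid.

```sparql
select distinct ?class ?predicate from <virtual://emp> {
  # every mapped row has a class (from its table) and predicates (from its columns)
  ?s a ?class ;
     ?predicate ?o
}
order by ?class ?predicate
```

For the sample tables we would expect the EMP class to show up alongside predicates such as emp:ename and emp:ref-deptno from the query above, plus rdf:type itself.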
Bootstrapping a virtual graph—or even a bunch of them—is really this easy now. The generated mappings can be exported and modified to align with other parts of the graph schema or comply with naming conventions. Of course the full power of custom mappings is still available when needed.
Stardog’s mission is to provide you with tools to bring your siloed enterprise data into the graph, whether physically or virtually, and to make sense of that data. Virtual graphs can now be created more quickly and easily, and the extended NLP features help you dig into data and extract value.