Entity Linking in the Knowledge Graph

Pedro Oliveira

Jan 24, 2018, 4 minute read

Stardog 5.2 introduces a new information extraction module for BITES, our unstructured data unification and processing engine.

Stardog now lets you extract named entity mentions from text documents and link those mentions to existing entities in a knowledge graph.

Entity Recognition and Linking

Named entity recognition is one of the best-known NLP tasks. The main idea is simple: given some text, can we locate the words that identify entities of certain categories? For example,

Stardog is the world’s leading Knowledge Graph platform for the enterprise.

An entity recognizer might notice that Stardog and Knowledge Graph are entities in some knowledge base.

There is an extensive body of research in this area, and most NLP libraries implement some technique for identifying named entities of the most common categories (e.g., person or organization).
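To make the idea concrete, here is a minimal sketch of the simplest possible recognizer: a dictionary lookup that reports where known surface forms occur in a sentence. The GAZETTEER contents and the find_mentions helper are made up for illustration. Real recognizers, including the OpenNLP models used later in this post, rely on trained statistical models rather than a fixed word list.

```python
import re

# Hypothetical gazetteer: surface forms mapped to a category label.
GAZETTEER = {
    "Stardog": "Product",
    "Knowledge Graph": "Concept",
}

def find_mentions(text, gazetteer=GAZETTEER):
    """Return (start, end, surface form, category) for each gazetteer hit."""
    mentions = []
    for surface, category in gazetteer.items():
        # Word boundaries avoid matching inside longer words.
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            mentions.append((match.start(), match.end(), surface, category))
    # Sort by position so mentions come back in reading order.
    return sorted(mentions)

sentence = "Stardog is the world's leading Knowledge Graph platform for the enterprise."
for start, end, surface, category in find_mentions(sentence):
    print(start, end, surface, category)
```

The character offsets are what matter here: a mention is a span of text plus a category, and nothing yet ties it to a specific item in a knowledge base.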

When you have a knowledge graph, with its rich structure and detailed information, simply extracting named entity mentions falls short of what is possible. Why can’t we go a step further and assert that those mentions refer to actual entities in the knowledge graph? This task is commonly called entity linking, and Stardog supports a simple but effective pipeline to perform entity linking on any kind of text.

Stardog is the world’s leading Knowledge Graph platform for the enterprise.

With entity linking in Stardog, all mentions or occurrences of an entity are linked to the knowledge base item (a node or edge, likely) that they represent.

In this blog post, we will show you how to use this new capability.

Finding Celebrities in Movie News Articles

Surprise! You now have access to a knowledge graph about movies.

t:tt1454468 a :Movie ;
    rdfs:label "Gravity" ;
    :description "Two astronauts work together to survive after an accident which leaves them alone in space." ;
    :actor n:nm0000123 , n:nm0000113 , n:nm0000438 , n:nm1241511 ;
    :director n:nm0190859 ;
    :author n:nm0190859 , n:nm0190861 ;
    :genre "Sci-Fi" , "Adventure" ;
    :copyrightYear 2013 .

n:nm0000123 a :Person ;
    rdfs:label "George Clooney" .

There are many amazing things you could do with this data. I personally like the idea of being able to find which celebrities are being talked about in all the juicy news articles about upcoming TV shows.

A drama titled Watergate is being developed by George Clooney and Bridge of Spies writer Matt Charman. Clooney’s Smokehouse Pictures will produce the eight-episode limited series, with the film star and his partner Grant Heslov serving as executive producers.

Let’s find out how to do this with Stardog.

Preprocessing

Named entity recognition in Stardog is based on OpenNLP, a well-known NLP library. To configure it, we need to tell Stardog which categories of entities we want to extract.

OpenNLP provides several basic models for different languages. In this case we are interested in finding people’s names in English-language documents, so we download en-ner-person.bin to a folder. Two extra models are always required: a sentence detector and a tokenizer. In this case, we will also download en-sent.bin and en-token.bin to the same place.
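Before moving on, it can be worth checking that all three files really ended up in the same folder. The sketch below is only a convenience check and not part of Stardog: missing_models is a hypothetical helper, and /path/to/folder is a placeholder for wherever you saved the models.

```python
from pathlib import Path

# The three model files named in this post.
REQUIRED_MODELS = ("en-ner-person.bin", "en-sent.bin", "en-token.bin")

def missing_models(folder, required=REQUIRED_MODELS):
    """Return the required model file names that are absent from folder."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]

# Placeholder path: replace with your own models folder.
print(missing_models("/path/to/folder"))
```

An empty list means every model is in place. Any names that are printed still need to be downloaded.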

Next we need to tell Stardog where this folder of stuff is located. This is done through a configuration option, docs.opennlp.models.path, which can be set during database creation.

./stardog-admin db create -o search.enabled=true docs.opennlp.models.path=/path/to/folder -n movies person_movie.ttl

And that’s it! No extra configuration is required.

Extracting Entities

As an introduction, let’s simply extract named entity mentions, without actually linking them to the knowledge graph. This can be done by setting the RDF extractor to entities, giving the text content of the news article as an argument.

./stardog doc put movies -r entities article.txt

The document is added to the database and the extracted entities can be queried with SPARQL.

select ?mention where {
  graph <tag:stardog:api:docs:movies:article.txt> {
    ?s rdfs:label ?mention .
  }
}

+------------------+
|     mention      |
+------------------+
| "Matt Charman"   |
| "Grant Heslov"   |
| "George Clooney" |
+------------------+

Entity Linking

By setting the RDF extractor to linker, entities are not only extracted but also, whenever possible, automatically linked to entities in the knowledge graph.

./stardog doc put movies -r linker article.txt

select ?mention ?entity where {
  graph <tag:stardog:api:docs:movies:article.txt> {
    ?s rdfs:label ?mention ;
       <http://purl.org/dc/terms/references> ?entity .
  }
}

+------------------+--------------------------------------+
|     mention      |                entity                |
+------------------+--------------------------------------+
| "George Clooney" | <http://www.imdb.com/name/nm0000123> |
| "Matt Charman"   | <http://www.imdb.com/name/nm4131020> |
| "Grant Heslov"   | <http://www.imdb.com/name/nm0381416> |
+------------------+--------------------------------------+

All three named entity mentions were found to be already present in the knowledge graph. Stardog makes this determination by heuristically matching the mention against the expected string representation of a resource. To do so, it looks at the similarity of the mention to things such as label properties (e.g., rdfs:label, foaf:name) and an IRI’s local name.
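To give a feel for this kind of matching, here is a rough sketch of a similarity-based linker. It is not Stardog's actual algorithm: the similarity measure (Python's difflib), the 0.85 threshold, the link and local_name helpers, and the in-memory CANDIDATES table are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical slice of the graph: IRI -> label strings (e.g. rdfs:label values).
CANDIDATES = {
    "http://www.imdb.com/name/nm0000123": ["George Clooney"],
    "http://www.imdb.com/name/nm4131020": ["Matt Charman"],
    "http://www.imdb.com/name/nm0381416": ["Grant Heslov"],
}

def local_name(iri):
    """Text after the last '/' or '#' in an IRI."""
    return iri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]

def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link(mention, candidates=CANDIDATES, threshold=0.85):
    """Return the best-matching IRI, or None if nothing clears the threshold."""
    best_iri, best_score = None, 0.0
    for iri, labels in candidates.items():
        # Compare against every label and against the IRI's local name.
        strings = list(labels) + [local_name(iri)]
        score = max(similarity(mention, s) for s in strings)
        if score > best_score:
            best_iri, best_score = iri, score
    return best_iri if best_score >= threshold else None

print(link("George Clooney"))
print(link("Watergate"))
```

The threshold is the interesting design choice: set it too low and unrelated mentions get linked, set it too high and small spelling variations are missed. In this dataset the local names are opaque IMDb identifiers, so the labels do all the work.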

What’s Next

Our off-the-shelf entity linker is just the beginning. In a follow-up blog post, we will show how to build domain-specific entity linking pipelines. Future Stardog releases will add new features and extend our system’s information extraction capabilities.

In the meantime, give entity linking a go by downloading our trial now, and let us know what you think!