Augmenting Search
Give your Knowledge Graph search results a makeover.
You’ve mapped and loaded a few sets of data into Stardog. Now what? Depending on your use case, you may be building reports based on SPARQL queries or a search-oriented front-end to a unified view of some unstructured data. Stardog provides a capable full-text index (FTS) to support searching as well as several other features which can significantly add value to search results. This post explores some of these feature combinations to inspire some ideas for your own applications.
Let’s sketch out an example scenario based around a corpus of documents loaded into Stardog’s BITES system. BITES provides document storage and indexing as well as some general NLP services. BITES isn’t intended to replace any other document management systems such as SharePoint, although it’s certainly capable of functioning as the backend of such an application. BITES shines when employed as a document search and processing system used to connect document contents to the rest of your Knowledge Graph.
BITES can index and process documents from your current document storage solution, including SharePoint, Dropbox, Confluence, etc. BITES is completely general and includes pluggable extension points to configure ingest of any type of file. Additionally, BITES allows customizable extraction processing and ships with several NLP modules including entity extraction.
So let’s assume that you’ve loaded some documents into BITES, potentially from several different parts of your organization. You’re now equipped with a searchable view of these documents as well as structured data extracted from the corpus.
The other ingredient is an existing Knowledge Graph, whether materialized into Stardog, federated as a set of virtual graphs, or some combination of these access patterns. Remember: a key value proposition of a Knowledge Graph is that data location doesn’t matter. Data is invariably linked; hence, creating a unified view over disparate sources is the challenge that Stardog addresses.
Here’s what we’re working with in terms of data:
Stardog’s builtin full-text index provides search capabilities over the graph and the BITES document set. SPARQL queries can use the <tag:stardog:api:property:textMatch> predicate to perform these search queries.
If we extract entities with BITES, we can augment search results with other entities found in documents matching the search. This is where Knowledge Graph unification shines. What if we searched for “George Clooney” and found a review of Ocean’s Eleven mentioning other actors in the film? These can be shown alongside the search results, correlated with each document.
A similar approach can be used to add relevant product results to a recipe search. A dictionary-based linker provides recognition of entities in the graph. Product details such as price and availability can be retrieved from external sources. Another possibility is extracting publisher and publication dates from documents. Combined with a source of publisher locations, we can improve search relevance by prioritizing recent and nearby results. A user in New York searching for “events” likely wouldn’t have much interest in results from a local Mexican newspaper.
We could even pass the search query through the entity extraction service. This would give us the entities used in the query, allowing us to combine the text search result with a query over entity mentions in the BITES index. A search for “Will Smith” might also match documents containing the words “will” and “Smith” individually. If we discover that “Will Smith” is a named entity, we can filter out results which don’t explicitly mention “Will Smith”.
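The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Stardog's API: `filter_by_entity`, `text_hits`, and `mentions_by_doc` are hypothetical names, and the sample data stands in for the full-text search results and for the mention labels that BITES extracted from each document.

```python
def filter_by_entity(text_hits, mentions_by_doc, query_entities):
    """Keep only documents that mention every entity recognized in the query.

    text_hits: document IDs returned by the full-text search (assumed input).
    mentions_by_doc: maps a document ID to the set of entity labels found in it.
    query_entities: entity labels recognized in the user's search string.
    """
    if not query_entities:
        # No named entity in the query: fall back to the plain text results.
        return list(text_hits)
    wanted = set(query_entities)
    return [doc for doc in text_hits
            if wanted <= mentions_by_doc.get(doc, set())]


# Hypothetical data: both documents contain the words "will" and "Smith".
text_hits = ["review:IAmLegend.pdf", "memo:SmithEstate.pdf"]
mentions_by_doc = {
    "review:IAmLegend.pdf": {"Will Smith"},
    "memo:SmithEstate.pdf": {"John Smith"},
}

filtered = filter_by_entity(text_hits, mentions_by_doc, ["Will Smith"])
print(filtered)
```

Only the document whose extracted mentions include the recognized entity survives. When the query contains no recognized entity, the text results pass through unchanged.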
Using the builtin entity linker, we extract a set of RDF triples from each document. These triples represent “mentions” in the document. A mention is a reference to a known entity in the graph. The entity linking process is completely independent of use case and searches the graph for known entities. A movie review mentioning George Clooney and Bernie Mac might add the following triples to the BITES document named graph:
review:Oceans11Review.pdf {
    entity:0d25b4ed rdfs:label "George Clooney" ;
        dc:references name:nm0000123 .
    entity:9811ac8c rdfs:label "Bernie Mac" ;
        dc:references name:nm0005170 .
}
The IRIs name:nm0000123 and name:nm0005170 here identify George Clooney and Bernie Mac, respectively, as nodes in the graph. Using the dc:references predicate, we can query the graph for documents referring to named entities. Combining this with a search query, we can retrieve a list of named entities for each document in the search result:
select ?doc ?mention ?type ?label where {
    # Full-text query
    ?doc <tag:stardog:api:property:textMatch> "George Clooney"
    # Mentions in matched docs; each mention hangs off an extracted entity node
    graph ?doc {
        ?entity dc:references ?mention
    }
    # Class and label of mentioned entities
    ?mention a ?type ; rdfs:label ?label
}
Executing this query would return a result including matching documents, their mentions (IRIs), and classes and labels of the mentions. It might look like so:
+---------------------------+----------------+-----------+----------------+
| doc                       | mention        | type      | label          |
+---------------------------+----------------+-----------+----------------+
| review:Oceans11Review.pdf | name:nm0000123 | :Director | George Clooney |
| review:Oceans11Review.pdf | name:nm0005170 | :Actor    | Bernie Mac     |
| review:Oceans11Review.pdf | name:nm0005170 | :Comedian | Bernie Mac     |
+---------------------------+----------------+-----------+----------------+
In addition to the matched documents, we can use mentions, including their type and label, to augment individual search results. Search results become significantly more useful when linked with relevant data. This type of linking is trivial when data is unified in a Knowledge Graph. We can adjust the SPARQL query in many ways to make use of the connected nature of the graph.
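On the application side, the flat result rows usually need to be regrouped before they can decorate individual search hits, since an entity with several classes appears once per class. The following Python sketch shows one way to do that. The function name `group_mentions` and the output shape are illustrative assumptions, and the rows are copied from the sample result above.

```python
def group_mentions(rows):
    """Collapse (doc, mention, type, label) rows into one entry per document.

    Returns {doc: {mention_iri: {"label": str, "types": [str, ...]}}}.
    """
    results = {}
    for doc, mention, type_, label in rows:
        mentions = results.setdefault(doc, {})
        entry = mentions.setdefault(mention, {"label": label, "types": []})
        if type_ not in entry["types"]:
            entry["types"].append(type_)
    return results


rows = [
    ("review:Oceans11Review.pdf", "name:nm0000123", ":Director", "George Clooney"),
    ("review:Oceans11Review.pdf", "name:nm0005170", ":Actor", "Bernie Mac"),
    ("review:Oceans11Review.pdf", "name:nm0005170", ":Comedian", "Bernie Mac"),
]

grouped = group_mentions(rows)
for doc, mentions in grouped.items():
    print(doc)
    for iri, info in mentions.items():
        print(f"  {info['label']} ({', '.join(info['types'])}) <{iri}>")
```

Each document now carries a list of distinct entities with their merged types, which is the shape a search UI typically wants for rendering "also mentioned" links next to a hit.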
As demonstrated, we can combine our text queries with arbitrary SPARQL queries over the unified graph. The recipes example can be expressed in SPARQL like so:
select ?recipe ?product ?productName ?productPrice {
    # Full-text query
    ?recipe <tag:stardog:api:property:textMatch> "potato salad"
    # Product mentions in matched recipes; each mention hangs off an extracted entity node
    graph ?recipe {
        ?entity dc:references ?product
    }
    # Virtual graph with product details and availability
    graph <virtual://product> {
        ?product a :Product ;
            :name ?productName ;
            :price ?productPrice ;
            :availableQty ?productQty
        filter(?productQty > 0)
    }
}
Entity references to products are stored for each document. This data is combined with an external data source mapped into the graph providing product details and availability.
In the same vein, given a set of documents pertaining to local events, we could combine it with publisher addresses stored in the graph to increase result relevancy. The text search query is over a set of documents for which we extracted the publisher and publication date (using BITES but not the entity extractor). The publisher is then linked to the graph to find its location. A [geospatial query](https://www.stardog.com/blog/geospatial-a-primer/) allows us to compute the distance between two points and order results by relevance:
select ?event ?pubDate ?publisher ?dist ?age {
    # Full-text query
    ?event <tag:stardog:api:property:textMatch> "concert"
    # Document graph with extracted details
    graph ?event {
        ?event :publishedOn ?pubDate ;
            :publishedBy ?publisher
    }
    # Graph (potentially virtual) with publisher data
    graph <publishers> {
        ?publisher geo:hasGeometry ?publisherLocation
    }
    # Compute the distance between the publisher and the location of the user
    bind(geof:distance(?publisherLocation, :UserLocation, unit:MileUSStatute) as ?dist)
    # Compute the amount of time since the article was published
    bind(now() - ?pubDate as ?age)
}
# Nearest publishers first, then most recent articles
order by ?dist ?age
This query finds concerts using the text search and then orders them first by the shortest distance from the user location and then by the age of the publication date (more recent entries first).
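The ordering logic is easy to check outside the database. The Python sketch below reproduces it on hypothetical data: the event records, field names, and `rank_events` are assumptions made for illustration, and a haversine great-circle formula stands in for the geospatial distance function used in the query.

```python
import math
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def rank_events(events, user_location, now):
    """Sort nearest publisher first; break ties with the most recent article."""
    def key(event):
        dist = haversine_miles(event["publisher_location"], user_location)
        age = now - event["published_on"]  # smaller timedelta = more recent
        return (dist, age)
    return sorted(events, key=key)


# Hypothetical search hits with publisher coordinates and publication dates.
new_york = (40.7128, -74.0060)
mexico_city = (19.4326, -99.1332)
now = datetime(2024, 5, 2)
events = [
    {"id": "mx-concert", "publisher_location": mexico_city,
     "published_on": datetime(2024, 5, 1)},
    {"id": "nyc-old", "publisher_location": new_york,
     "published_on": datetime(2024, 1, 10)},
    {"id": "nyc-new", "publisher_location": new_york,
     "published_on": datetime(2024, 4, 20)},
]

ranked = rank_events(events, new_york, now)
print([e["id"] for e in ranked])
```

For a user in New York, both local articles rank ahead of the Mexico City one even though it is the newest, and the more recent local article comes first. Sorting on the tuple `(dist, age)` in ascending order is what gives distance priority over recency.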
This post offers a glimpse of the ways that searching a Knowledge Graph is awesome. Combining search with the rest of the graph lets you do significantly more than a simple full-text index allows on its own. Feel free to use these ideas directly or experiment with other Stardog features, such as machine learning and path queries, to improve search results.