Augmenting Search

Jess Balint

Jul 17, 2018, 6 minute read

Give your Knowledge Graph search results a makeover.

You’ve mapped and loaded a few sets of data into Stardog. Now what? Depending on your use case, you may be building reports based on SPARQL queries or a search-oriented front-end to a unified view of some unstructured data. Stardog provides a capable full-text search (FTS) index as well as several other features that can add significant value to search results. This post explores some of these feature combinations to inspire ideas for your own applications.

Document Indexing with BITES

Let’s sketch out an example scenario based around a corpus of documents loaded into Stardog’s BITES system. BITES provides document storage and indexing as well as some general NLP services. BITES isn’t intended to replace document management systems such as SharePoint, although it’s certainly capable of functioning as the backend of such an application. BITES shines when employed as a document search and processing system used to connect document contents to the rest of your Knowledge Graph.

BITES can index and process documents from your current document storage solution, including SharePoint, Dropbox, Confluence, etc. BITES is completely general and includes pluggable extension points to configure ingest of any type of file. Additionally, BITES allows customizable extraction processing and ships with several NLP modules including entity extraction.

So let’s assume that you’ve loaded some documents into BITES, potentially from several different parts of your organization. You’re now equipped with a searchable view of these documents as well as structured data extracted from the corpus.

The other ingredient is an existing Knowledge Graph, whether materialized into Stardog, federated as a set of virtual graphs, or some combination of these access patterns. Remember: a key value proposition of a Knowledge Graph is that data location doesn’t matter. Data is invariably linked; hence, creating a unified view over disparate sources is the challenge that Stardog addresses.

Here’s what we’re working with in terms of data:

Searching the Document Store

Stardog’s built-in full-text index provides search capabilities over the graph and the BITES document set. SPARQL queries can use the <tag:stardog:api:property:textMatch> predicate to perform these search queries.

If we extract entities with BITES, we can augment search results with other entities found in documents matching the search. This is where Knowledge Graph unification shines. What if we searched for “George Clooney” and found a review of Ocean’s Eleven mentioning other actors in the film? These can be shown alongside the search results, correlated with each document.

A similar approach can be used to add relevant product results to a recipe search. A dictionary-based linker provides recognition of entities in the graph. Product details such as price and availability can be retrieved from external sources. Another possibility is extracting publisher and publication dates from documents. Combined with a source of publisher locations, we can improve search relevance by prioritizing recent and nearby results. A user in New York searching for “events” likely wouldn’t have much interest in results from a local Mexican newspaper.

We could even pass the search query through the entity extraction service. This would provide us with the entities used in the query, allowing us to combine the text search result with a query over entity mentions in the BITES index. A search for “Will Smith” might also match documents containing the words “will” and “Smith” individually. If we discover that “Will Smith” is a named entity, we can filter out results which don’t explicitly mention “Will Smith”.
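As a rough sketch of that last idea, the query below keeps only documents whose extracted mentions include the entity recognized in the search string. It assumes mentions are stored in each document's named graph with dc:references (the shape described in the next section). The IRI name:WillSmith is a hypothetical placeholder for whatever IRI the entity extractor returns for the query text, and prefix declarations are omitted, as in the other queries in this post.

```sparql
select distinct ?doc where {
  # Full-text query
  ?doc <tag:stardog:api:property:textMatch> "Will Smith"

  # Keep only documents with a mention linked to the recognized entity
  # (name:WillSmith is a hypothetical placeholder IRI)
  graph ?doc {
    ?entity dc:references name:WillSmith
  }
}
```

Documents that merely contain the words “will” and “Smith” would still match the text search, but they would be dropped by the graph pattern because no mention in them links to the entity.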

Extending Search Results with Entity Extraction

Using the built-in entity linker, we extract a set of RDF triples from each document. These triples represent “mentions” in the document. A mention is a reference to a known entity in the graph. The entity linking process is completely independent of use case and searches the graph for known entities. A movie review mentioning George Clooney and Bernie Mac might add the following triples to the BITES document named graph:

review:Oceans11Review.pdf {
	entity:0d25b4ed rdfs:label "George Clooney" ;
		dc:references name:nm0000123 .

	entity:9811ac8c rdfs:label "Bernie Mac" ;
		dc:references name:nm0005170 .
}

The IRIs name:nm0000123 and name:nm0005170 identify George Clooney and Bernie Mac, respectively, as nodes in the graph. Using the dc:references predicate, we can query the graph for documents referring to named entities. Combining this with a search query, we can retrieve a list of named entities for each document in the search result:

select ?doc ?mention ?type ?label where {
  # Full-text query
  ?doc <tag:stardog:api:property:textMatch> "George Clooney"

  # Mentions in matched docs
  graph ?doc {
    ?entity dc:references ?mention
  }

  # Class of mentioned entities
  ?mention a ?type ; rdfs:label ?label
}

Executing this query would return a result including matching documents, their mentions (IRIs), and classes and labels of the mentions. It might look like so:

+---------------------------+----------------+-----------+----------------+
| doc                       | mention        | type      | label          |
+---------------------------+----------------+-----------+----------------+
| review:Oceans11Review.pdf | name:nm0000123 | :Director | George Clooney |
| review:Oceans11Review.pdf | name:nm0005170 | :Actor    | Bernie Mac     |
| review:Oceans11Review.pdf | name:nm0005170 | :Comedian | Bernie Mac     |
+---------------------------+----------------+-----------+----------------+

In addition to the matched documents, we can use mentions, including their type and label, to augment individual search results. Search results become significantly more useful when linked with relevant data. This type of linking is trivial when data is unified in a Knowledge Graph. We can adjust the SPARQL query in many ways to make use of the connected nature of the graph.
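One such adjustment, sketched here on the assumption that mentions are stored in the shape of the sample triples above, collapses the mentions into a single row per document so a front-end can render them next to each hit. GROUP_CONCAT is standard SPARQL 1.1; the ?mentions variable name is only illustrative.

```sparql
select ?doc (group_concat(distinct ?label; separator=", ") as ?mentions) where {
  # Full-text query
  ?doc <tag:stardog:api:property:textMatch> "George Clooney"

  # Mentions in matched docs
  graph ?doc {
    ?entity dc:references ?mention
  }

  # Human-readable label of each mentioned entity
  ?mention rdfs:label ?label
}
group by ?doc
```

Using distinct inside the aggregate avoids repeating a label when the same entity is mentioned more than once in a document.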

Extending Search Results with External Data Sources

As demonstrated, we can combine our text queries with arbitrary SPARQL queries over the unified graph. The recipes example can be expressed in SPARQL like so:

select ?recipe ?product ?productName ?productPrice {
  # Full-text query
  ?recipe <tag:stardog:api:property:textMatch> "potato salad"

  # Product mentions in matched recipes
  graph ?recipe {
    ?recipe dc:references ?product
  }

  # Virtual graph with product details and availability
  graph <virtual://product> {
    ?product a :Product ;
      :name ?productName ;
      :price ?productPrice ;
      :availableQty ?productQty
    filter(?productQty > 0)
  }
}

Entity references to products are stored for each document. This data is combined with an external data source mapped into the graph providing product details and availability.

In the same vein, given a set of documents pertaining to local events, we could combine it with publisher addresses stored in the graph to increase result relevancy. The text search query is over a set of documents for which we extracted the publisher and publication date (using BITES but not the entity extractor). The publisher is then linked to the graph to find its location. A [geospatial query](https://www.stardog.com/blog/geospatial-a-primer/) allows us to compute the distance between two points and order results by relevance:

select ?event ?pubDate ?publisher ?dist ?age {
  # Full-text query
  ?event <tag:stardog:api:property:textMatch> "concert"

  # Document graph with extracted details
  graph ?event {
    ?event :publishedOn ?pubDate ;
      :publishedBy ?publisher
  }

  # Graph (potentially virtual) with publisher data
  graph <publishers> {
    ?publisher geo:hasGeometry ?publisherLocation
  }

  # Compute the distance between the publisher and the location of the user
  bind(geof:distance(?publisherLocation, :UserLocation, unit:MileUSStatute) as ?dist)
  # Compute the amount of time since the article was published
  bind(now() - ?pubDate as ?age)
}
order by ?dist ?age

This query finds concerts using the text search and then orders them first by the shortest distance from the user location and then by the age of the publication date (more recent entries first).
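If ranking alone is not enough, the same bindings can also be used to prune results. The variant below is only a sketch: the 50-mile radius and the cutoff date are arbitrary illustrative values, the date comparison assumes ?pubDate is stored as an xsd:dateTime, and prefix declarations are omitted as in the query above.

```sparql
select ?event ?pubDate ?publisher ?dist {
  # Full-text query
  ?event <tag:stardog:api:property:textMatch> "concert"

  # Document graph with extracted details
  graph ?event {
    ?event :publishedOn ?pubDate ;
      :publishedBy ?publisher
  }

  # Graph (potentially virtual) with publisher data
  graph <publishers> {
    ?publisher geo:hasGeometry ?publisherLocation
  }

  bind(geof:distance(?publisherLocation, :UserLocation, unit:MileUSStatute) as ?dist)

  # Drop publishers more than 50 miles away (illustrative threshold)
  filter(?dist < 50)
  # Drop articles published before an illustrative cutoff date
  filter(?pubDate >= "2018-06-01T00:00:00Z"^^xsd:dateTime)
}
order by ?dist desc(?pubDate)
```

Sorting by desc(?pubDate) puts the most recent articles first among results at the same distance, which has the same effect as sorting by ascending age.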

Use Your Data in Searches

This post offers a glimpse of what searching a Knowledge Graph makes possible, which is significantly more than a simple full-text index allows on its own. Feel free to use these ideas directly or experiment with other Stardog features, such as machine learning and path queries, to improve search results.

Read more about how Stardog unifies all types of data.