How to Build a Semantic Search Engine Using a Knowledge Graph

Chris Hall

Aug 30, 2021, 6 minute read

Having the right information at your fingertips is critical. However, in most organizations, information is spread across different systems and lives in a variety of formats. As a result, people wind up manually hunting down information across various systems — wasting significant time — or making critical errors because their decisions were not properly informed.

Search applications have emerged as the de facto solution to this problem, promising to unlock access to information. For internal search applications, arming employees with the right information saves time and lets experts do their job better by reducing blind spots and preventing duplicate work. For products with built-in search, providing the right information to customers increases user satisfaction and engagement.

“An enterprise of 1,000 knowledge workers wastes $5.7 million each year searching for information, but not finding it.” - International Data Corp.

But as search tools have become prominent, user expectations have also increased. There is increasing pressure to return the single correct answer; it is no longer sufficient to allow users to scan through pages and pages of results. From chatbots to voice assistants like Alexa, the rise of AI is changing how everyone wants to interact with information. Now, the fewer the results, the better.

Additionally, search is a more complex problem than ever before. Source material is split between internal systems and external vendors. Voice search is leading to more conversational requests, which can confuse even an advanced search engine.

While there are plenty of enterprise search options on the market, they’re failing in the face of this increasing complexity and these rising performance demands. Traditional enterprise search solutions face a tradeoff between precision (the share of returned results that are relevant) and recall (the share of relevant material that is actually returned). Increasing precision may leave out potentially relevant materials. Prioritizing recall often returns an overwhelming number of results, many of them irrelevant. A new solution is required to keep pace with enterprise search demands.
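To make the tradeoff concrete, here is a minimal Python sketch that computes both measures for a single query. The document IDs and the "strict" and "loose" result lists are made up for illustration; they do not come from any real search system.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query's result list."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}
strict = ["d1", "d2"]                                      # narrow matching
loose = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]   # broad matching

print(precision_recall(strict, relevant))  # (1.0, 0.5): nothing irrelevant, half missed
print(precision_recall(loose, relevant))   # (0.5, 1.0): nothing missed, half irrelevant
```

The strict engine never shows a wrong result but misses half of what matters; the loose engine finds everything but buries it in noise.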

Signs you need an upgrade

You don’t get the answers you need

Irrelevant results happen when the search engine can’t accurately capture user intent. Humans think associatively, but conventional search applications index information hierarchically or rank results based on term frequency. While enterprise search offerings are proficient at fuzzy matches for misspellings or recognizing synonyms, they are not designed to interpret the complex mental associations humans naturally create between various concepts. Even a data dictionary cannot process second- and third-level associations, so relevant results are left out. Bottom line: if the search engine can’t understand the question you’re asking, it can’t return the best results.
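As a rough illustration of what second- and third-level associations look like, the Python sketch below walks a tiny concept graph breadth-first and reports how many hops away each related concept is. The concepts and links in ASSOCIATIONS are invented for this example, and a real knowledge graph would use typed relationships rather than a plain adjacency list.

```python
from collections import deque

# Hypothetical concept associations, for illustration only.
ASSOCIATIONS = {
    "aspirin": ["pain relief", "blood thinner"],
    "blood thinner": ["stroke prevention"],
    "stroke prevention": ["cardiology guidelines"],
    "pain relief": [],
}


def related_concepts(graph, start, max_hops):
    """Return {concept: hop_distance} for concepts within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the start concept is not "related to itself"
    return distances


# A synonym list or data dictionary roughly corresponds to one hop.
print(related_concepts(ASSOCIATIONS, "aspirin", 1))
# Following associations further surfaces concepts a one-hop lookup misses.
print(related_concepts(ASSOCIATIONS, "aspirin", 3))
```

With a limit of one hop, "cardiology guidelines" never appears for an "aspirin" query; with three hops it does.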

Answers you know should be there are left out

Subject matter experts are often the first to notice that results they know to be correct are missing. A common culprit is that the correct source material hasn’t been indexed by the search engine. More and more, that source material doesn’t just live in databases, but also in images, videos, PDFs, and other unstructured data. Modern search solutions must be able to return results across various file types.

It takes too long to find the right answer

You might find the answer you want, but it’s not the top result. Users waste time in “secondary searches,” scanning pages of often irrelevant results. With newer search formats like voice assistants and chatbots (essentially any AI app), the pressure to return just one correct result is even greater.

Building a smarter search engine

In contrast to typical enterprise search solutions, knowledge graph-powered search returns fewer, more relevant results, reducing time spent searching by up to 90%. A knowledge graph improves search by capturing the meaning of the search terms. For this reason, knowledge graph-powered search is often called “semantic search”: search enriched with meaning. Essentially, semantic search operates by representing the layers of connections between various data sources. By representing these myriad relationships, the search engine is able to operate similarly to how humans think, organizing information through associations and hierarchies.

If you’re unfamiliar with knowledge graphs, you might be surprised to learn you probably use them every day. Google Search is powered by a knowledge graph that contains 500 billion facts about five billion entities. Amazon’s Alexa uses a knowledge graph to help return a single perfect answer. Uber Eats’ knowledge graph helps people find the exact food they want as effortlessly as possible. The list goes on: Pinterest, LinkedIn, eBay, Airbnb, and more all use knowledge graphs.

By capturing relevant meaning, knowledge graphs serve as the basis for better product exploration and personalized recommendations. Stardog’s Enterprise Knowledge Graph platform powers semantic search experiences at the world’s top banks, manufacturers, pharmaceutical companies, publishers, and more.

Let’s get into how you can use a knowledge graph to build a better search experience.

Step 1: Ask a better question

Before searching for answers, Stardog makes sure it’s asking the right question. Humans often don’t ask the right question, so this is a critical step. Stardog rewrites the query based on the context of related terms to ask a better question of the data sources unified in the knowledge graph.

Stardog is able to do this because of how it stores information. At its core, Stardog is based on a semantic graph data structure, which links related information together in a network.

This expressive format is what allows Stardog to interpret real-world meaning, and effectively rewrite queries for better interpretation. This ensures that the search application accurately reflects user intent by interpreting the context of the search terms.
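The general idea of graph-based query rewriting can be sketched in a few lines of Python. This is not Stardog’s implementation or API: the triples, the "synonym" and "broader" predicate names, and the OR/AND output syntax are all assumptions chosen for illustration.

```python
# Hypothetical (subject, predicate, object) triples; not real Stardog data.
TRIPLES = [
    ("laptop", "broader", "computer"),     # a laptop is a kind of computer
    ("notebook", "synonym", "laptop"),
    ("ultrabook", "broader", "laptop"),    # an ultrabook is a kind of laptop
    ("computer", "broader", "hardware"),
]


def expand_term(term, triples):
    """Return the term plus its synonyms and directly narrower concepts."""
    expanded = {term}
    for subject, predicate, obj in triples:
        if predicate == "synonym" and term in (subject, obj):
            expanded.update((subject, obj))
        elif predicate == "broader" and obj == term:
            expanded.add(subject)  # subject is a narrower concept of term
    return expanded


def rewrite_query(query, triples):
    """Rewrite each query term as an OR-group of related terms."""
    groups = []
    for term in query.lower().split():
        alternatives = sorted(expand_term(term, triples))
        if len(alternatives) > 1:
            groups.append("(" + " OR ".join(alternatives) + ")")
        else:
            groups.append(term)
    return " AND ".join(groups)


print(rewrite_query("laptop battery", TRIPLES))
# (laptop OR notebook OR ultrabook) AND battery
```

Because the related terms come from the graph rather than from a hard-coded list, adding a new triple changes how queries are rewritten without touching the code.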

Step 2: Search across sources

Once the query is rewritten to accurately reflect the user’s context, it is evaluated in two phases. First, the basic full-text search is executed. Fundamentally, this is just like Solr or Lucene; fuzzy matches, case inconsistencies, and stop words are accounted for and the relevant results from the full-text index are ranked and returned.
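For readers who have not worked with a full-text index, the following Python sketch imitates this first phase on a toy scale using only the standard library. It is a simplified stand-in, not how Solr, Lucene, or Stardog score results: the stop-word list, the 0.8 similarity cutoff, and the sample documents are assumptions, and ranking is a plain count of matched query terms.

```python
import difflib

# A deliberately tiny, hypothetical stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (word.strip(".,;:!?\"'()").lower() for word in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def full_text_search(query, documents, cutoff=0.8):
    """Rank documents by how many query terms they match, allowing fuzzy matches."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        doc_terms = sorted(set(tokenize(text)))
        score = sum(
            1
            for term in query_terms
            if difflib.get_close_matches(term, doc_terms, n=1, cutoff=cutoff)
        )
        if score:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical documents, for illustration only.
DOCS = {
    "policy-1": "The Travel Reimbursement Policy for employees",
    "memo-7": "Reimbursement of relocation costs",
    "faq-3": "Parking permits and badges",
}

# The misspelled "reimbursment" still matches "reimbursement".
print(full_text_search("travel reimbursment", DOCS))  # ['policy-1', 'memo-7']
```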

Then, Stardog lets you evaluate those results against all other information in the knowledge graph (structured, semi-structured, and unstructured data) to ensure that the search occurs across the full breadth of relevant information. Stardog works with numerous specialized NLP partners to accelerate extracting and tagging data from unstructured sources like research, regulations, and other texts.
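A minimal sketch of this second phase, assuming the full-text phase has already produced a ranked list of document IDs: each hit is checked against facts held in the graph, and only hits that satisfy the required facts are kept. The DOC_FACTS triples, the predicate names, and the graph_filter helper are hypothetical; in a real deployment this check would typically be a graph query rather than Python set lookups.

```python
# Hypothetical facts linking documents to other entities; illustration only.
DOC_FACTS = {
    ("policy-1", "appliesTo", "EMEA"),
    ("policy-1", "status", "current"),
    ("memo-7", "appliesTo", "APAC"),
    ("memo-7", "status", "archived"),
    ("guide-2", "appliesTo", "EMEA"),
    ("guide-2", "status", "current"),
}


def graph_filter(hits, facts, required):
    """Keep hits, in their original rank order, that satisfy every
    required (predicate, object) pair according to the graph facts."""
    return [
        doc
        for doc in hits
        if all((doc, predicate, obj) in facts for predicate, obj in required)
    ]


text_hits = ["policy-1", "memo-7", "guide-2"]  # ranked output of the text phase
context = [("appliesTo", "EMEA"), ("status", "current")]

print(graph_filter(text_hits, DOC_FACTS, context))  # ['policy-1', 'guide-2']
```

The text index alone cannot know that memo-7 is archived or applies to another region; that knowledge lives in the connected data.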

Virtualization is key for scalable search solutions. Virtualization accesses source data directly, cutting down on what would otherwise be a complex and cumbersome ETL system migrating data from dozens or even hundreds of systems and external vendors into a single repository. Working with copies of data instead of source data both introduces risk (human error) and degrades data quality (data is not current or comprehensive). By contrast, virtualization ensures the most up-to-date information is always readily available to users. Stardog’s third-generation Virtual Graphs offer dozens of certified Connectors to popular enterprise data sources.
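The difference between working from a copy and reading the source directly can be shown with a toy Python example. The two classes below are hypothetical and only model the concept; they do not represent Stardog’s Virtual Graphs or any connector, and a plain dictionary stands in for the source system.

```python
class MaterializedCopy:
    """Models an ETL-style copy: data is snapshotted once at load time."""

    def __init__(self, source):
        self._snapshot = dict(source)  # copy made when the ETL job runs

    def lookup(self, key):
        return self._snapshot.get(key)


class VirtualView:
    """Models virtualization: every lookup reads the live source."""

    def __init__(self, source):
        self._source = source  # a reference to the source, not a copy

    def lookup(self, key):
        return self._source.get(key)


# A dictionary standing in for a hypothetical inventory system.
source_system = {"sku-1": "in stock"}
copied = MaterializedCopy(source_system)
virtual = VirtualView(source_system)

source_system["sku-1"] = "discontinued"  # the source changes after the copy

print(copied.lookup("sku-1"))   # in stock (stale until the next ETL run)
print(virtual.lookup("sku-1"))  # discontinued (current)
```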

Step 3: Refine results

At this point, Stardog has ensured that all the possibly relevant results are included in the search and has ranked them by relevance. Now, to ensure that users don’t waste time scanning through results, Stardog further refines results using business logic.

Stardog’s best-in-class Inference Engine is responsible for intelligently applying business logic to the underlying data at query time in order to provide situationally optimized results. This business logic is stored centrally in a data model as modular rules, rather than being hand-coded in application code on top of the database that stores the data. This is what makes the knowledge graph a low-code approach.
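To show what “rules applied at query time” means in general terms, here is a small Python sketch of forward-chaining inference over triples. It is an illustration under simplifying assumptions, not Stardog’s Inference Engine: the Rule shape (one condition, one conclusion), the RULES and STORED_FACTS contents, and the query helper are all hypothetical.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """If a subject has (if_pred, if_obj), infer (then_pred, then_obj) for it."""
    if_pred: str
    if_obj: str
    then_pred: str
    then_obj: str


# Hypothetical business rules, kept as data rather than application code.
RULES = [
    Rule("type", "ClinicalTrialReport", "type", "RegulatedDocument"),
    Rule("type", "RegulatedDocument", "requires", "ComplianceReview"),
]

# Hypothetical stored facts; note that nothing here mentions ComplianceReview.
STORED_FACTS = {
    ("doc-42", "type", "ClinicalTrialReport"),
    ("doc-43", "type", "BlogPost"),
}


def infer(facts, rules):
    """Return the stored facts plus everything the rules derive from them.
    The stored facts themselves are not modified."""
    derived = set(facts)
    changed = True
    while changed:  # repeat until no rule produces a new fact
        changed = False
        for subject, pred, obj in list(derived):
            for rule in rules:
                if (pred, obj) == (rule.if_pred, rule.if_obj):
                    new_fact = (subject, rule.then_pred, rule.then_obj)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


def query(facts, rules, pred, obj):
    """Which subjects have (pred, obj), with rules applied at query time?"""
    return sorted(s for s, p, o in infer(facts, rules) if (p, o) == (pred, obj))


print(query(STORED_FACTS, RULES, "requires", "ComplianceReview"))  # ['doc-42']
```

Because the rules are applied when the question is asked, changing a rule changes the answers immediately, with no need to recompute or rewrite the stored data.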

That’s all you need to get started with a semantic search application powered by a knowledge graph. Ready to get started?