Case Study

Leading pharmaceutical company Boehringer Ingelheim drives faster research through Stardog.


Cost savings as a result of virtualization

Stardog’s virtualization capabilities eliminated the need for expensive ETL processes, redundant data storage, and data conversion.

Increased analyst efficiency

Boehringer Ingelheim’s knowledge graph allows analysts to reuse past research and find answers more quickly.

Increased bioinformatician output

Scientists can now quickly answer questions that link data from one domain to another without spending time cleaning data or creating additional local databases.

Boehringer Ingelheim recognized the need to connect data from disparate parts of the company to increase research and operational efficiency, increase output, and ultimately accelerate drug research. Using Stardog to build an enterprise knowledge graph has allowed bioinformaticians and analysts to quickly and easily access the full body of institutional knowledge, all while providing cost savings.

The Challenge

Boehringer Ingelheim had many teams of researchers working independently to develop new treatments. But data was often siloed within teams, making it difficult to link targets, genes, and disease data across different parts of the company. 

The company tried several different technology approaches. Some teams had built data lakes, but inadequate virtualization capabilities meant they still needed ETL pipelines to move data. Others had tried to define all requirements up front in an RDBMS, but that approach couldn’t support the necessary levels of complexity or flexibility.

Ultimately, they realized that they needed a broader approach that would establish a technical foundation for data sharing across the entire company. This approach needed to link data from across teams, support ontologies that capture how terms relate to one another, and be flexible enough to connect internal experimental results with external, publicly available data of varying quality and formats.

"Users can now search for a particular disease, study, or gene, and then explore the results 'Wikipedia-style.'"

The Solution

A knowledge graph built using Stardog was the clear solution to Boehringer Ingelheim’s challenges. The first step was building a semantic layer on top of the existing data lake to provide a consolidated one-stop shop for 90% of their R&D data.

The knowledge graph allows them to connect metadata from across workflow systems, integrating information about how samples were generated and stored, which studies are currently underway or completed, and how specific data points were created and stored.

The semantic layer lets bioinformaticians access and work with the data without cleaning it first, and the data arrives already linked to the proper entities. Users can now search for a particular disease, study, or gene and then explore the results “Wikipedia-style.” Analysts can see directly in the data model how one piece of data relates to the rest of the R&D data, and they can use a lightweight query builder to pull reports from the knowledge graph with no SPARQL knowledge required.

"With Stardog’s virtualization capabilities, the organization is able to save money on redundant data storage and costly, time-consuming ETL processes."

The Results

The knowledge graph has allowed bioinformaticians to more easily identify useful signals within large sets of noisy data and to answer highly specific questions. This is possible because analysts no longer need to spend large amounts of time integrating and cleaning the data or creating new local databases to work from. They can simply query directly using the linked data dictionary and move immediately into analysis.

Analysts can also work more efficiently because R&D data is accessible through a standardized protocol. They no longer need to consult data catalogs to find out where data is located, or spend time learning how various datasets are organized in order to integrate them. Instead, they can simply reference the one-stop knowledge graph and ask questions using a natural language interface.

With Stardog’s virtualization capabilities, the organization can save money on redundant data storage and costly, time-consuming ETL processes. Virtualization creates a single, centralized access point for data scientists to work from while allowing the data to remain in the relational databases and other environments where it already exists. The data models accompanying this integration also make the organization more efficient by helping it avoid redundant research: teams can reuse past answers and focus on new opportunities to build on existing knowledge.
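
To make the idea of virtualization concrete, here is a minimal Python sketch of the general pattern: data stays in its source databases, and a thin mapping layer exposes it as graph-style triples only at query time. This is an illustration under stated assumptions, not Stardog's API or implementation. The table names, predicates (`investigatesGene`, `associatedWithDisease`), and sample rows are all hypothetical placeholders.

```python
import sqlite3

# Two hypothetical source systems; each keeps its own data and schema.
studies_db = sqlite3.connect(":memory:")
studies_db.execute("CREATE TABLE study (study_id TEXT, gene TEXT)")
studies_db.executemany(
    "INSERT INTO study VALUES (?, ?)",
    [("STUDY-1", "GENE_A"), ("STUDY-2", "GENE_B")],
)

disease_db = sqlite3.connect(":memory:")
disease_db.execute("CREATE TABLE gene_disease (gene_symbol TEXT, disease TEXT)")
disease_db.executemany(
    "INSERT INTO gene_disease VALUES (?, ?)",
    [("GENE_A", "disease X"), ("GENE_B", "disease Y")],
)

# Hypothetical mappings: each source query is exposed under one predicate.
# Nothing is copied; the SQL runs against the source when a question is asked.
MAPPINGS = [
    (studies_db, "SELECT study_id, gene FROM study", "investigatesGene"),
    (disease_db, "SELECT gene_symbol, disease FROM gene_disease", "associatedWithDisease"),
]


def virtual_triples():
    """Yield (subject, predicate, object) triples read from the sources on demand."""
    for conn, sql, predicate in MAPPINGS:
        for subject, obj in conn.execute(sql):
            yield (subject, predicate, obj)


def diseases_for_study(study_id):
    """Answer a cross-domain question: which diseases relate to a study's genes?"""
    triples = list(virtual_triples())
    genes = {o for s, p, o in triples if s == study_id and p == "investigatesGene"}
    return sorted(
        o for s, p, o in triples if s in genes and p == "associatedWithDisease"
    )


print(diseases_for_study("STUDY-1"))
```

Because the sources are read at query time, a row added to either database shows up in the next answer without any ETL step or second copy of the data. A production system would push filters down to the sources rather than scanning every mapping, which this sketch does not attempt.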

The Future

Now that there is a strong data foundation in place, Boehringer Ingelheim continues to look for opportunities to expand the reach and use cases of the enterprise knowledge graph. Over time, data from additional teams across the organization will be added. This will create more opportunities for users to explore different domains and be connected with datasets from a broad set of departments. With these capabilities in place, the knowledge graph will serve as the infrastructure for all data within Boehringer Ingelheim.

Contact us to learn more about Stardog's solutions
