Data Lake

Bring actionable meaning to your data

Build a semantic layer on your data lake

What is a data lake?

Data lakes serve as vast repositories of enterprise data, typically a combination of raw structured, semi-structured and unstructured data alongside data that has been transformed for specific purposes such as analytics, visualization and reporting. In theory, data lakes should make it easier to ingest, combine, analyze and use diverse data, and to apply machine learning and other AI techniques to discover predictive patterns and insights. In reality, these benefits are often elusive.

Where do data lakes fall short?

Data lakes centralize data physically, but they have not been successful at driving insight within the business. Colocation solves the problem of data access, but it doesn’t improve data salience or usability. In fact, data lakes can sometimes hinder usability: as they take in more and more data, it becomes harder for users to know what’s in them and how the data interrelates. Data lakes can become swamps of unused information. If a big shared file system were the answer to the problem of data silos, finding data on the file systems of our own personal computers would be much less frustrating than it is.

While data catalogs have emerged as a way to see what’s in the data lake, Stardog can be used to map and model the complex relationships contained within that data. Different business units may have different definitions for the same data. Critically, unlike systems that require a canonical definition of the data before it can be modeled, Stardog can support multiple versions of the truth. This is because Stardog links related data via a unique ID, leaving source data unchanged.
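The idea of linking by ID while leaving sources untouched can be illustrated with a minimal Python sketch. This is not Stardog's API or implementation: the source records, the link table, the entity IDs and the two "customer" definitions are all hypothetical, chosen only to show how two business units can hold different definitions over the same linked data.

```python
# Hypothetical source records from two systems. They are only read, never modified.
SALES = {
    "s-17": {"name": "Acme Corp", "status": "signed contract"},
    "s-18": {"name": "Globex", "status": "prospect"},
}
SUPPORT = {
    "t-904": {"company": "ACME Corporation", "open_tickets": 0},
    "t-905": {"company": "Globex Inc", "open_tickets": 2},
}
SOURCES = {"sales": SALES, "support": SUPPORT}

# Hypothetical link table: (source, local key) -> shared entity ID.
# The relationship lives here, outside the source systems.
LINKS = {
    ("sales", "s-17"): "entity:acme",
    ("support", "t-904"): "entity:acme",
    ("sales", "s-18"): "entity:globex",
    ("support", "t-905"): "entity:globex",
}


def records_for(entity_id):
    """Collect every source record linked to one shared entity ID."""
    return {
        source: SOURCES[source][key]
        for (source, key), linked in LINKS.items()
        if linked == entity_id
    }


# Each business unit keeps its own definition of "customer" (assumed rules).
DEFINITIONS = {
    "sales": lambda recs: recs.get("sales", {}).get("status") == "signed contract",
    "support": lambda recs: recs.get("support", {}).get("open_tickets", 0) > 0,
}


def is_customer(entity_id, unit):
    """Evaluate one business unit's definition against the linked records."""
    return DEFINITIONS[unit](records_for(entity_id))
```

In this sketch, `is_customer("entity:globex", "sales")` and `is_customer("entity:globex", "support")` give different answers for the same entity, and neither answer requires rewriting the sales or support records.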

How are customers using Stardog to modernize their data lake?

Frequently, a semantic layer refers to a virtualization platform sitting atop a data lake. Stardog is similar to a semantic layer but offers additional functionality, including inference and a semantic graph. Critically, Stardog’s flexibility in modeling extends to external sources. And because Stardog does not force you to conform to source systems, it’s easy to relate internal and external data.

Stardog’s virtualization capability also extends to other data management systems: in addition to the data lake, data from the data catalog and MDM systems can be unified. This allows for an enriched model of your enterprise data, informed by the full metadata from all relevant systems.

Customers turn to Stardog to create a data layer that connects data regardless of its source or type, and then make that data available to everyone in their organization. An Enterprise Knowledge Graph can also help you recommend the most appropriate data for a particular scenario, cutting down on the time spent corralling data.
