How Data Reusability Accelerates your Enterprise

Oct 26, 2020, 6 minute read

Data reusability holds promise for enterprises facing increasing market pressure to innovate. How can a data strategy stand up to this pressure? The key is building a reusable data foundation. Instead of starting from scratch for each new project, what if you could stand on the shoulders of giants? Read on to learn how a data fabric can create a reusable foundation to power iterative innovation.

Why are enterprises stuck with reactive data strategies?

Multiple schemas, or data models, are required to manage an enterprise. The data model establishes the meaning of the data as well as the relationship of various entities to one another. At enterprise scale it is inevitable that different business units define terms differently; it may even be required by regulation or law. And, given the increasing relevance of third-party data, it’s simply impossible to impose one definition upon all data producers. 

Today, enterprises work around these issues by copying data for each new use case, creating a new and distinct data model in the process. However, this practice leads to a proliferation of data copies within an organization, degrading data quality and causing uncertainty over which copy is the source of truth. Then, when a new project requires existing applications to speak to one another, teams waste effort on patchworks of otherwise unnecessary code.

The result is slow answers. When unanticipated questions or needs arise, work grinds to a halt as data preparation starts anew. This reactive data strategy leaves teams flat-footed whenever the market shifts. Enterprises require a more responsive data strategy, one that keeps pace with the needs of the business.

Data reusability is key for enterprise data management

Stardog is designed to let different use cases, organizations, lines of business, and applications share and reuse connected data simultaneously. A data fabric creates a single, reusable data foundation that can power multiple applications, even when data has to be defined differently across use cases. A key enabler of Stardog's data reusability is a unique feature called schema multi-tenancy, which allows multiple data models to act on the connected data in Stardog at the same time. Stardog can provide schema multi-tenancy because it manages data at the compute layer, not only at the storage layer.

Only Stardog supports schema multi-tenancy with performant graph-based virtualization, allowing enterprises to manage the full breadth of complex, connected enterprise data. Because Stardog supports inference over virtualized data, you never need to migrate or copy data in order to connect it and understand what it means. The result is faster time to insight and no rework when new data, new definitions, or new requirements arise.

What is schema multi-tenancy?

Schema multi-tenancy is supported by Stardog’s Inference Engine, which harmonizes conflicting data definitions without changing or copying the underlying data. The Inference Engine performs inferencing, also called reasoning, by interpreting your source data against your data model.

Formally, schema multi-tenancy means reasoning with multiple schemas, where each query specifies which schema to use. Each schema has a name and a set of named graphs. When a schema is selected for a query, the inference rules stored in its associated graphs are used to connect the data and answer the query.
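To make the definition concrete, here is a toy Python model of the idea. It is not Stardog's API or implementation: the schema names (sales, finance), the named-graph IRIs, and the subclass-style rules are hypothetical. One shared set of triples is never copied; each schema is a name mapped to a set of named graphs, and a query applies only the rules from the schema it selects.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[str, str]  # (subclass, superclass): x type sub  =>  x type super

# Shared data: one copy, read by every schema and never rewritten per tenant.
DATA: Set[Triple] = {
    ("acme", "type", "Reseller"),
    ("globex", "type", "DirectBuyer"),
}

# Hypothetical named graphs, each holding one team's inference rules.
NAMED_GRAPHS: Dict[str, List[Rule]] = {
    "urn:rules:sales": [("Reseller", "Customer"), ("DirectBuyer", "Customer")],
    "urn:rules:finance": [("Reseller", "Partner"), ("DirectBuyer", "Customer")],
}

# A schema is a name plus a set of named graphs.
SCHEMAS: Dict[str, Set[str]] = {
    "sales": {"urn:rules:sales"},
    "finance": {"urn:rules:finance"},
}


def instances_of(cls: str, schema: str) -> Set[str]:
    """Return subjects of type `cls` using only the rules of the selected schema."""
    rules = [rule for graph in SCHEMAS[schema] for rule in NAMED_GRAPHS[graph]]
    types = {(s, o) for (s, p, o) in DATA if p == "type"}
    changed = True
    while changed:  # apply the schema's rules until no new type facts appear
        changed = False
        for (x, t) in list(types):
            for (sub, sup) in rules:
                if t == sub and (x, sup) not in types:
                    types.add((x, sup))
                    changed = True
    return {x for (x, t) in types if t == cls}


print(sorted(instances_of("Customer", "sales")))    # ['acme', 'globex']
print(sorted(instances_of("Customer", "finance")))  # ['globex']
print(sorted(instances_of("Partner", "finance")))   # ['acme']
```

The same data answers "who is a customer?" differently depending on the schema chosen. Adding a new version of a data model is just another entry in SCHEMAS pointing at a different named graph.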

In addition to supporting multiple use cases simultaneously, schema multi-tenancy can be used for version management. Data models and application requirements inevitably change over time, but typically at different paces in different parts of the organization. As new versions of the data model are created, the structure of your data and its validation constraints change with them. Schema multi-tenancy lets applications built against different model versions access the same data through different lenses, without requiring new copies of the data.

Schema multi-tenancy is only possible due to the unique way that we’ve designed and implemented the Stardog platform. To recap, Stardog’s platform offers a unique combination of semantic graph, inference, and virtualization. Virtualization enables enterprises to operate at the pace of business: accessing and interpreting information via the model in near real-time. 

Schema multi-tenancy requires query-time reasoning

To fully support virtualization, Stardog pioneered what is known as query-time reasoning: the Inference Engine performs reasoning just in time, as each query is executed. Query-time reasoning is necessary for virtualized data; because that data is always changing, the Inference Engine must be able to interpret new facts at query time.
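A minimal sketch of query-time reasoning, assuming a simple class hierarchy and a Python list standing in for a virtualized source (remote_rows, fetch_virtual_triples, and SUBCLASS_OF are hypothetical names). The reasoner uses backward chaining, working from the query goal back to the facts, and it reads the source each time a query runs, so a change in the source shows up in the next answer with no reload or re-materialization step. This illustrates the general technique, not how Stardog's Inference Engine is built.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Stand-in for a remote source the reasoner does not own; rows can change at any time.
remote_rows: List[Dict[str, str]] = [{"id": "acme", "kind": "Reseller"}]

# Hypothetical class hierarchy: subclass -> superclass.
SUBCLASS_OF: Dict[str, str] = {"Reseller": "Customer", "DirectBuyer": "Customer"}


def fetch_virtual_triples() -> Set[Triple]:
    """Map whatever the source holds right now into triples (no local copy is kept)."""
    return {(row["id"], "type", row["kind"]) for row in remote_rows}


def holds(x: str, cls: str, facts: Set[Triple], seen: Optional[Set[str]] = None) -> bool:
    """Backward chaining: prove (x, type, cls) by working back from the goal."""
    seen = set() if seen is None else seen
    if cls in seen:  # guard against cycles in the hierarchy
        return False
    seen.add(cls)
    if (x, "type", cls) in facts:
        return True
    # Otherwise try to prove x belongs to any subclass of cls.
    return any(holds(x, sub, facts, seen) for sub, sup in SUBCLASS_OF.items() if sup == cls)


def query_customers() -> Set[str]:
    facts = fetch_virtual_triples()  # read the source at query time
    return {s for (s, _, _) in facts if holds(s, "Customer", facts)}


print(sorted(query_customers()))  # ['acme']
remote_rows.append({"id": "globex", "kind": "DirectBuyer"})  # source changes; nothing is reloaded
print(sorted(query_customers()))  # ['acme', 'globex']
```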

This capability also allows Stardog to embrace schema multi-tenancy. By interpreting business rules, data, and new facts at query time across multiple data models, Stardog allows different parts of the enterprise to have different, even incompatible, views or lenses onto the common, underlying data fabric. Similarly, the data quality constraint capability in Stardog can work with different sets of constraints. In this case, constraints are not used to restrict what data is being stored directly in Stardog or remotely in virtual data sources, but instead allow each application or business unit to check that the data adheres to their requirements.
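The following sketch, assuming hypothetical business units and a hand-rolled constraint check (loosely in the spirit of a shape constraint, but not SHACL or Stardog's constraint syntax), shows several constraint sets applied to one shared dataset. Validation only reports violations for the unit that asks; it never rejects or changes stored data.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Constraint = Callable[[Set[Triple]], List[str]]

# One shared dataset; the constraints below never block or alter what is stored.
DATA: Set[Triple] = {
    ("acme", "type", "Customer"),
    ("acme", "taxId", "12-345"),
    ("globex", "type", "Customer"),
    ("globex", "email", "ap@globex.example"),
}


def require_property(cls: str, prop: str) -> Constraint:
    """Build a check that every instance of `cls` has at least one `prop` value."""
    def check(data: Set[Triple]) -> List[str]:
        members = sorted(s for (s, p, o) in data if p == "type" and o == cls)
        return [f"{s} is missing {prop}" for s in members
                if not any(t[0] == s and t[1] == prop for t in data)]
    return check


# Hypothetical per-unit constraint sets over the same data.
CONSTRAINT_SETS: Dict[str, List[Constraint]] = {
    "billing": [require_property("Customer", "taxId")],
    "marketing": [require_property("Customer", "email")],
}


def validate(unit: str) -> List[str]:
    """Report violations of one unit's requirements without touching DATA."""
    return [msg for check in CONSTRAINT_SETS[unit] for msg in check(DATA)]


print(validate("billing"))    # ['globex is missing taxId']
print(validate("marketing"))  # ['acme is missing email']
```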

Other platforms support reasoning, but they implement it in a way that prevents schema multi-tenancy. Their technique is called forward-chaining reasoning: every fact that can be inferred from the data and the data model is calculated once, when the data is loaded into the platform. Because forward chaining materializes all inferences up front, the system spends no time reasoning at query time. This is appealing for use cases where the data changes infrequently or not at all.
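For contrast, a minimal sketch of forward chaining over the same kind of toy hierarchy (SUBCLASS_OF, materialize, and store are hypothetical names, not any platform's API). Every inferred triple is computed once at load time and stored, so queries become plain lookups.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical hierarchy: subclass -> superclass.
SUBCLASS_OF: Dict[str, str] = {"Reseller": "Customer", "Customer": "Party"}


def materialize(data: Set[Triple]) -> Set[Triple]:
    """Forward chaining: apply rules to a fixpoint and keep every inferred triple."""
    closure = set(data)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closure):
            if p == "type" and o in SUBCLASS_OF:
                inferred = (s, "type", SUBCLASS_OF[o])
                if inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure


# All reasoning work happens once, when the data is loaded.
store = materialize({("acme", "type", "Reseller")})


def instances_of(cls: str) -> Set[str]:
    """Query time is a plain lookup; no rules are evaluated here."""
    return {s for (s, p, o) in store if p == "type" and o == cls}


print(len(store))                     # 3
print(sorted(instances_of("Party")))  # ['acme']
```

The trade-off is visible in the code: if the underlying data changes, store is stale until materialize runs again.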

However, forward chaining makes it practically infeasible to support virtualization or schema multi-tenancy, especially with external data sources, let alone the combination of the two. Each additional schema would need its own set of materialized inferences, increasing both computational and storage requirements, which would in turn degrade query performance. It would also multiply the work required for every update, since the inferences for each schema would have to be updated separately. For Virtual Graphs, where update events might not be visible to outside applications, forward chaining would complicate this process even further.
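To illustrate the cost argument, the sketch below materializes the same two facts under three hypothetical schemas (sales, finance, and a second version of sales). Each schema needs its own stored closure, so storage grows with the number of schemas, and a single insert triggers one recomputation per schema. Real forward-chaining systems can maintain inferences incrementally rather than recomputing from scratch as this toy does, but the per-schema work remains.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical schemas, each a subclass -> superclass map over the same data.
SCHEMAS: Dict[str, Dict[str, str]] = {
    "sales": {"Reseller": "Customer", "DirectBuyer": "Customer"},
    "finance": {"Reseller": "Partner", "DirectBuyer": "Customer"},
    "sales_v2": {"Reseller": "Channel", "Channel": "Customer", "DirectBuyer": "Customer"},
}


def materialize(data: Set[Triple], subclass_of: Dict[str, str]) -> Set[Triple]:
    """Forward chaining for one schema: base triples plus every inferred type."""
    closure = set(data)
    changed = True
    while changed:
        changed = False
        for (s, p, o) in list(closure):
            if p == "type" and o in subclass_of:
                inferred = (s, "type", subclass_of[o])
                if inferred not in closure:
                    closure.add(inferred)
                    changed = True
    return closure


data: Set[Triple] = {("acme", "type", "Reseller"), ("globex", "type", "DirectBuyer")}
# One materialized store per schema.
stores = {name: materialize(data, rules) for name, rules in SCHEMAS.items()}


def insert(triple: Triple) -> int:
    """Add one fact and recompute every schema's store; return how many were recomputed."""
    data.add(triple)
    for name, rules in SCHEMAS.items():
        stores[name] = materialize(data, rules)
    return len(SCHEMAS)


print(len(data), sum(len(s) for s in stores.values()))  # 2 13
print(insert(("initech", "type", "Reseller")))           # 3
```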

Scalability requires data reusability

As the enterprise data landscape becomes increasingly hybrid, varied, and changing, capabilities to manage this complexity are necessary. Where other platforms force rigid access patterns and inflexible data models, Stardog offers the flexibility required for modern enterprise data management.

By unifying data in Stardog, you create a flexible, reusable data layer for answering complex queries across data silos. Interested in learning more about Stardog’s approach? Contact us to get started today.
