The secret to data analytics success today is a democratized, self-service data platform, and that’s only possible when you leverage a semantic layer to shift from columns to concepts.
The Russian novelist Leo Tolstoy opened Anna Karenina by remarking on the asymmetry between the uniformity of success and the diversity of failure. What he actually wrote was, of course, far more elegant: “Happy families are all alike; every unhappy family is unhappy in its own way.”
I’ll rephrase Tolstoy. Successful data-driven enterprises are all alike; every unsuccessful data-driven enterprise is unsuccessful in its own way. Every enterprise wants to be data-driven. Yet very few actually are. A recent McKinsey study explicitly tied data and analytics success to self-service and data democratization. Wanna win the data analytics game? Aim for democratized, self-service systems.
So we can rephrase Tolstoy again. Successful data-driven enterprises are all alike in providing democratized, self-service access to enterprise data; every unsuccessful enterprise is unsuccessful precisely because it’s doing something (anything, really) else! As the McKinsey study put it, leading organizations are “twice as likely to make data accessible across the organization” as laggard organizations. Which just means that all of us (empowered by self-service, democratized data) are smarter than some of us (i.e., when data is accessible only to IT, data engineers, and the like).
Big orgs need to derive insight from their big data, which is itself hybrid, diverse, and ever-changing, and they need to do this rapidly. The primary obstacle is that data isn’t accessible to everyone. It’s not accessible to everyone because of architectural limitations with conventional enterprise data management.
But how do we make data accessible? The key is to build a reusable, resilient data foundation that can power not only well-understood, scoped projects and specific use cases, but also address unanticipated questions, and which can do both of these things by harnessing the power of everyone, i.e., by making insight generation a matter of democratized self-service.
It’s time to leverage the semantic layer to shift the approach from columns to concepts.
Moving from Columns to Concepts
The relational data model has long been the dominant data model for storing enterprise data, and it greatly constrained what data and analytics systems could represent and how the industry was built. Let’s refer to this conventional approach as the “columns” approach.
“Columns” here is shorthand for the fact that the relational model, while more abstract than previous data models, is still a pretty leaky abstraction. That is, the relational model is overwhelmingly concerned with the structural, or physical, representation of data: tables and columns and rows, and foreign keys between tables.
Let’s be clear: there’s nothing at all wrong with this. It’s important, foundational, even, to the modern world.
But there are increasing strains on the columns approach as the only approach to enterprise data management. What are these signs of strain?
One: A Weak Model of Reference
Foreign keys are the only explicit means of modeling relationships in the relational model (which, yes, is quite ironic!), and they’re a weak means, lacking in expressive power. A foreign key records that two rows are linked, but it says little about what that link means.
Two: Unavoidable Reliance on String Encodings
As a result of a weak reference model, string encodings are often used but these are ad hoc, not supported well or explicitly by the relational model, and offer poor interoperability, internally or externally. They also make data dependent on the code that parses string encodings into real data structures, which creates brittleness, tight coupling, and poor reusability. Then, when analysts and others pay attention to slices of data, aggregates, measures, and metrics, these may be unhelpfully decontextualized or abstracted away from their original business context, in which they carry much more meaning.
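To make the string-encoding problem concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the order row, the "sku:qty;sku:qty" format, and the names parse_items and Contains are assumptions, not anything taken from a real system. It shows a relationship packed into a string column, the parsing code that every consumer then depends on, and the same fact stated as an explicit relationship.

```python
from dataclasses import dataclass

# Hypothetical row: line items are packed into one string column
# because the schema has no richer way to express the relationship.
order_row = {"order_id": 1001, "items": "SKU-1:2;SKU-7:1"}


def parse_items(encoded: str) -> list[tuple[str, int]]:
    """Decode 'sku:qty;sku:qty'. Every consumer must duplicate this logic."""
    pairs = []
    for chunk in encoded.split(";"):
        sku, qty = chunk.split(":")
        pairs.append((sku, int(qty)))
    return pairs


@dataclass(frozen=True)
class Contains:
    """The same fact stated as an explicit, named relationship."""
    order_id: int
    sku: str
    quantity: int


def to_relationships(row: dict) -> list[Contains]:
    # The data is only usable after passing through parse_items.
    return [
        Contains(row["order_id"], sku, qty)
        for sku, qty in parse_items(row["items"])
    ]


relationships = to_relationships(order_row)
```

If a producer ever switches the delimiter, say to "SKU-1,2", parse_items raises a ValueError and every downstream consumer breaks with it. That is the brittleness and tight coupling described above.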
Three: An Insurmountable Problem of Cognitive Dissonance
In short, we store data relationally with columns, rows, tables, and keys, but this isn’t how most people think about the world. There are about 1 billion knowledge workers globally, and the number who can understand complex business processes, relationships, and objects in relational form is a rounding error. This is the fundamental impediment to data democratization: almost no one can understand the dominant storage representation of data.
Four: An Inability to Accommodate the Separation of Storage and Compute
The separation of storage and compute is the key architectural innovation of the cloud era. Relational analytics and integration platforms are inherently unable to truly separate storage from compute because the relational model is unavoidably physical, that is, tied to how data is stored.
Why does this matter? Because true data democratization requires applying business logic dynamically in the compute layer, and that can’t happen if the logic is embedded in the storage layer.
And that ultimately means there can be only one view of what data means, namely, the view that’s encoded physically in the stored data itself. Everyone else is out of luck, so sorry! See the previous section to understand why you can’t build a democratized data platform that enables real self-service if the underlying data is tied to one (among many possible) points of view about what data means.
Countering the Strain
The upshot of all this strain is that a relational system is a poor basis upon which to build a truly democratized, self-service platform that enables easy access to source data to help make business decisions.
To counter these problems, organizations must shift to a connected data approach that is based on business meaning rather than storage location. With this approach, it’s possible to focus not on columns but on concepts, that is, on encoded knowledge of what data means to the business. That knowledge is much closer to the picture of the enterprise, its business objects, and their relationships that people in the business have in their heads.
Users don’t need to look backward in a sliced, metrics sort of approach. They can eschew the rear-view for a now-view – a real snapshot in time of what the business is doing and what the leading indicators, metrics, and data in context are telling us about the business.
Let’s discuss how to accomplish this shift in approach.
What is a Semantic Layer?
A semantic layer represents, or rather re-presents to a new audience, a connected network of real-world entities, such as objects, events, situations, and concepts, independently of how the underlying data is stored.
One way to think about a semantic layer is as a query-answering service that uses a semantic graph data model to represent business meaning independently of where the data is stored, from data lakes to data warehouses to other data sources. In this sense, the semantic layer is a single data model of the business that moves us from columns to concepts.
The semantic layer operates between the storage and consumption layers of the modern enterprise’s data analytics stack. It provides the glue that connects all data, and enriches that data, based on its business meaning and irrespective of its storage location. Whereas the storage layer is dominated by relational tables that only IT experts can understand, a semantic layer re-presents the business meaning of an enterprise in a form that citizen data scientists and business analysts can understand, interact with, and manipulate.
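As a rough illustration of a query-answering service that speaks in business terms, consider the following Python sketch. It is a toy under stated assumptions, not how Stardog or any other product works: the cust_mst table, the SEMANTIC_MODEL mapping, and the find function are all hypothetical names invented for this example, and a plain dictionary stands in for a real semantic graph model.

```python
import sqlite3

# Physical storage: terse column names only a schema expert would recognize.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cust_mst (c_id INTEGER, c_nm TEXT, rgn_cd TEXT)")
db.executemany(
    "INSERT INTO cust_mst VALUES (?, ?, ?)",
    [(1, "Acme", "EU"), (2, "Globex", "US")],
)

# Hypothetical semantic mapping: business concept -> table and columns.
SEMANTIC_MODEL = {
    "Customer": {
        "table": "cust_mst",
        "properties": {"id": "c_id", "name": "c_nm", "region": "rgn_cd"},
    }
}


def find(concept: str, **filters) -> list[dict]:
    """Answer a question posed in business terms by translating it to SQL."""
    model = SEMANTIC_MODEL[concept]
    props = model["properties"]
    # Identifiers come from the trusted model; filter values are bound as parameters.
    select = ", ".join(f"{col} AS {name}" for name, col in props.items())
    where = " AND ".join(f"{props[name]} = ?" for name in filters) or "1 = 1"
    sql = (
        f"SELECT {select} FROM {model['table']} "
        f"WHERE {where} ORDER BY {props['id']}"
    )
    cursor = db.execute(sql, tuple(filters.values()))
    names = [d[0] for d in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


european_customers = find("Customer", region="EU")
```

The caller asks for a Customer by region and never sees c_nm or rgn_cd. If the physical schema changes, only the mapping has to change, which is the sense in which meaning is kept independent of storage.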
A Novel Semantic Layer Approach Brings Extraordinary Value
The idea of a semantic layer is not new. It has been around for more than 30 years, often promoted by BI vendors to help companies build purpose-built dashboards. But this initial approach to the semantic layer was often too rigid and complex.
A better approach to a semantic layer is to power it with an Enterprise Knowledge Graph Platform like Stardog, a powerful tool to shift from “columns to concepts.”
Knowledge graphs are built to represent real-world entities and their complex relationships to one another. A knowledge graph-powered semantic layer can represent multiple points of view simultaneously. The knowledge graph enables the modeling of complex relationships, even if the data is big, wide, siloed, and ever-changing.
The semantic layer not only describes people, places, things, and how they relate; it also opens up the possibility of self-service and data democratization. It’s a fantastic thing in enterprises when more people can contribute to insight generation and when more people can self-serve to answer the questions they have about the business by interacting with the data directly in a format that is meaningful to them.
Once everyone is involved, enterprises can start doing things like cross-domain analysis, seeing across the business and across lines of business. In fact, enterprises can start to move towards a 360-degree view that includes not just product information, but all the domain and business objects that matter to the business in a holistic view.
And then insights are accelerated, because enterprises are oriented toward a predictive, forward-looking view, rather than always looking backward. Decision-makers can use data from any point in the value chain and then experience its benefits.
Forrester determined an ROI of 320% and total benefits of over $9.86 million over three years for the Stardog Enterprise Knowledge Graph Platform. Read this study to learn how several customers turned their data into knowledge, completed their data analytics projects faster, saved on infrastructure costs, and unlocked new business opportunities with Stardog.
The Vital Characteristics of a Transformative Semantic Layer
The semantic layer enables organizations to deploy analytics efficiently. The knowledge graph approach provides access to remote data and a better way to enable self-service via a unified business layer. But how do you know your semantic layer is transformative and embraces the new approach? Here are four vital characteristics to look for in a solution.
One: Meaning abstracted from storage
The semantic layer abstracts meaning from storage. It lets the enterprise focus on what data means to the business, rather than how it’s stored in various systems and data silos.
Two: Data reuse via schema reuse
The semantic layer promotes the radical reuse of data by reusing schemas. Instead of building a new schema for every project, the organization starts to move toward a single, reusable model of the business. The data management benefits include centralized management of invariants and policy, rather than a completely decentralized, many-mini-schemas approach that reinforces data silos.
Three: Virtualization and federation
A transformative semantic layer separates data storage from compute for data integration through its virtualization and data federation capabilities. Users can query the data where it lies and connect it on the fly, dynamically, rather than always having to wait for ETL jobs to complete. Users can quickly consider what the data means and apply analytics to it.
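The idea of querying data where it lies can be sketched in a few lines of Python. This is a deliberately small illustration, not a description of any product’s federation engine: the in-memory SQLite database standing in for a warehouse, the CSV text standing in for a data lake file, and the function names are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Source 1 (hypothetical): a relational warehouse holding orders.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
warehouse.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 75.5), (1, 30.0)],
)

# Source 2 (hypothetical): a CSV file in a data lake holding customers.
LAKE_CSV = "customer_id,name\n1,Acme\n2,Globex\n"


def customers_from_lake() -> dict[int, str]:
    """Read customer names straight from the lake file at query time."""
    reader = csv.DictReader(io.StringIO(LAKE_CSV))
    return {int(row["customer_id"]): row["name"] for row in reader}


def revenue_by_customer() -> dict[str, float]:
    """Join the two sources on the fly; no ETL into a combined store."""
    names = customers_from_lake()
    rows = warehouse.execute(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in rows}


revenue = revenue_by_customer()
```

Each source stays where it is and is read only when the question is asked. The aggregation is pushed down to the warehouse, and the join happens in the compute step rather than in a pre-built, copied table.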
Four: Logical and statistical inference
A transformative semantic layer also includes robust logical and statistical inferencing capabilities. If one is aggregating and collecting business meaning and context at the semantic layer, it’s important to apply inference and insight generation mechanisms at that layer, not merely at the storage or physical representation layer.
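For the logical half of that claim, a minimal sketch of inference at the semantic layer might look like the following. It is a toy forward-chaining loop over subject-predicate-object triples, written in the spirit of subclass reasoning. The facts, the predicate names "type" and "subClassOf", and the infer function are illustrative assumptions, and statistical inference is not covered here.

```python
Triple = tuple[str, str, str]

# Hypothetical business facts stated as (subject, predicate, object).
facts: set[Triple] = {
    ("Gold Customer", "subClassOf", "Customer"),
    ("Customer", "subClassOf", "Party"),
    ("acme", "type", "Gold Customer"),
}


def infer(triples: set[Triple]) -> set[Triple]:
    """Apply two rules repeatedly until no new facts appear (a fixed point).

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type A       and A subClassOf B  =>  x type B
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Only chain through a subClassOf fact whose subject matches.
                if p2 != "subClassOf" or o != s2:
                    continue
                if p in ("subClassOf", "type"):
                    new.add((s, p, o2))
        if new <= known:
            return known
        known |= new


closure = infer(facts)
```

Nothing in storage says that acme is a Party. That fact exists only because the rules were applied to the business model, which is the point of doing inference at the semantic layer rather than at the physical one.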
Semantic Layer Trends and Adoption
A semantic layer enabled by a knowledge graph can open up and accelerate business value. Moving from columns to concepts, without actually moving data, is a solution that’s easy to begin and sets up your organization with a powerful, yet flexible, data infrastructure.
Analysts and top tech providers agree.
- “By 2023, organizations that share ontology, semantics, governance, and stewardship processes to enable interenterprise data sharing will outperform those that don’t.” — Leverage Semantics to Drive Business Value From Data - Published 23 November 2021 - ID G00759088 - By Guido De Simoni, Robert Thanaraj, Gartner. Gartner® is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
- “By 2024, companies that use graphs and semantic approaches for natural language technology projects will have 75% less artificial intelligence technical debt than those that do not.” — How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications - Published 27 September 2022 - ID G00768041 - By Afraz Jaffri
- “The flexible, composable and open nature of knowledge-graph-based data delivery eases the challenge of ensuring the semantic consistency of data across the enterprise. This allows business users, software engineers and data scientists to find, understand and use the data they need.” — How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications - Published 27 September 2022 - ID G00768041 - By Afraz Jaffri
- “Data-driven enterprises are increasingly looking to build more context around their data and deliver a flexible semantic layer on top of their Databricks Lakehouse. Stardog’s Enterprise Knowledge Graph offers a rich semantic layer that complements and enriches a customer’s lakehouse and we are excited to partner with them to bring these capabilities to Databricks Partner Connect.” — Roger Murff, VP of Technology Partners at Databricks.
Try it Yourself
Are you interested in building a semantic layer? We encourage you to try Stardog, for free, to see how easy it is. Visit our “Get Started” page to begin via Cloud or download your copy. If you are a Databricks user, you can simply start there.