Knowledge Graphs and Machine Learning
According to the 2020 AI in Organizations Survey*, 23% of organizations deployed graph techniques in their artificial intelligence (AI) projects. Large technology companies, public-sector organizations, financial solutions providers, and healthcare led the way, using graph technologies to enhance data search, information retrieval, and recommendations. But now, all industries and organizations can combine knowledge graphs with machine learning by using platforms that are easy to adopt and scale, making machine learning more commonplace and more successful.
Knowledge graphs connect and contextualize disparate data, organized and represented in graph databases. Built to capture the ever-changing nature of knowledge, knowledge graphs easily accept new data, datasets, definitions, and requirements.
How do they do this? Knowledge graphs were born from the semantic web, Tim Berners-Lee’s attempt to capture meaningful relationships between pieces of information on web pages (using RDF metadata models to go beyond the simple link). The semantic web vision has not materialized, but the underlying technologies have proven very successful at connecting silos of enterprise data in ways that give the data context. Semantics is about encoding meaning along with data.
So, a knowledge graph is a semantic data layer. And in that layer, knowledge workers (often assisted by inferencing) describe how all the data that the organization accesses is classified and related. A knowledge graph describes the meaning of all these business objects by networking them and by adding taxonomies and ontological knowledge that provides context. This data layer provides a secure access point that is standards-based and machine-processable.
Graph databases are built for storage. Graph structure alone, without the inferencing, virtualization, and agile development available in enterprise knowledge graph platforms, would require immense work to scale to the enterprise level. With a knowledge graph, data scientists can work with knowledge engineers, business users, and information technology teams to turn data into actionable insights. As large-scale access to data continues to grow, teams across the enterprise will need to work together. Graph machine learning is a powerful tool to help.
Machine learning (ML) is when machines learn from data and self-improve. In 1952, Arthur Samuel created a program that helped an IBM computer get better at checkers the more it played, so ML algorithms have been around for over 70 years. Today, ML is commonplace in recommendations, predictions, and information lookup.
ML is a form of artificial intelligence, which is a wide area of computer science focused on building smart machines that perform tasks typically requiring human intelligence. AI strategy generally focuses on ensuring data is accessible, reusable, interpretable, and high quality, which is often a challenge with existing data infrastructure.
But things have changed since the 1950s. Approaching ML and AI is not straightforward. There are two branches of AI occurring within enterprises today. Most organizations historically incorporated statistical learning through data science projects. But rule-based AI is growing, and this approach includes everything from making intelligent inferences about schemas to expedite data integration to assembling techniques for text analytics or Natural Language Processing (NLP). Our CEO noted to AnalyticsWeek:
Instead of these different branches of AI competing with each other in vendor solutions, the industry has reached a point of inflection in which there are more offerings “doing new school AI, i.e., statistical learning, machine learning, what we call machine learning and also, at the same time, and we’ve worked on this, so it all works together seamlessly, they’re also doing that symbolic or rules-based AI,” Stardog CEO Kendall Clark commented…. “It looks more and more like the future of AI will be some combination of both logical and statistical.”
A knowledge graph is not inherently a part of ML, but it can help you a lot. The best data science projects come from combining more than one source of data, and that can be a nightmare for data scientists. When it comes to combining data sources and datasets, ontologies and context help. This means platforms like Stardog assist tremendously.
There are many knowledge representation tools to choose from. And with so many use cases and dependencies, data points and data sources, success depends on what you are trying to accomplish. For example, are you running deep learning to classify data and turn it into a knowledge graph? Building a recommender system? Or exploring neural networks? Are you creating a chatbot, or a “Wikipedia” or search engine for your knowledge base?
From an implementation perspective, the many possible paths can feel like a barrier to entry for organizations that just want to get started. It’s a good thing that knowledge graphs help a range of AI/ML approaches.
Knowledge graphs make it easier to feed better and richer data into ML algorithms. They do this by helping you leverage industry-standard models and ontologies, model your domain knowledge, and connect disparate data sources across the enterprise. You can maximize the use and reuse of your internal content by laying the foundation for AI and semantic applications. Ultimately, connect and show meaningful relationships between your data, regardless of storage area, size, type, and format.
Secondary benefits include:
The inherent traits of knowledge graphs position them as a top tool of modern AI and ML strategy. Let’s examine a few ways in which they help.
How do data scientists and machine learning engineers spend their time? A significant portion of it goes to data wrangling (also known as data munging). Data wrangling is the manual process of gathering and cleansing data before it can be used. It includes tasks like finding the right data source or pulling extracts to build the matrices that feed their algorithms. In real-world situations, this type of work can often eat up 70-80% of the time it takes to produce the desired model or expected results.
With a knowledge graph, data scientists can train models directly on unified data—with harmonized terminology and synthesized data sources—instead of on incomplete, out-of-date, or inaccurate data. So even at the most basic level, pulling the data from a platform like Stardog is a huge time saver.
Now let’s consider the rules-based approach. Stardog has best-in-class support for something called inferencing. Stardog’s Inference Engine allows you to resolve conflicting data definitions without changing or copying the underlying data. Capture your business and domain rules in the data model; the Inference Engine intelligently applies these rules at query time. This easily solves a common issue: what one database calls a “Major Account” another calls an “EnterpriseCustomer.” To fix it, write a rule that states both are subclasses of “Top Accounts,” then query the full range of details on your connected data. The Inference Engine displays all logic for each result, making explainable AI a reality.
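To make the subclass rule concrete, here is a minimal sketch in plain Python. It is not Stardog's API or its Inference Engine; the tiny in-memory triple set, the prefixes (`crm:`, `erp:`, `biz:`), and the function names are all hypothetical, chosen only to illustrate how a subclass rule can be applied at query time without touching the source data.

```python
# Hypothetical mini triple store; none of these names come from Stardog.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    # Rule layer: both source-specific classes roll up to one shared class.
    ("crm:MajorAccount", SUBCLASS_OF, "biz:TopAccount"),
    ("erp:EnterpriseCustomer", SUBCLASS_OF, "biz:TopAccount"),
    # Data layer: each source keeps its own terminology, unchanged.
    ("acme", TYPE, "crm:MajorAccount"),
    ("globex", TYPE, "erp:EnterpriseCustomer"),
    ("initech", TYPE, "crm:Prospect"),
}


def superclasses(cls, triples):
    """Return cls plus every class reachable through subClassOf edges."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples):
    """Answer 'who is a cls?' by applying the subclass rule at query time."""
    return sorted(
        s for s, p, o in triples
        if p == TYPE and cls in superclasses(o, triples)
    )


def explain(instance, cls, triples):
    """Return the asserted type triples from which membership in cls follows."""
    return sorted(
        (s, p, o) for s, p, o in triples
        if s == instance and p == TYPE and cls in superclasses(o, triples)
    )


print(instances_of("biz:TopAccount", triples))
print(explain("globex", "biz:TopAccount", triples))
```

Asking for `biz:TopAccount` returns both `acme` and `globex` even though neither was ever stored with that type, and `explain` points back to the asserted fact behind each inferred answer. A production engine would do this with far more expressive rules and indexes; the sketch only shows the idea.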
So again, if your objective is to pull out a well-known dataset to train a model on, or to use as your validation dataset, then a platform like Stardog will make those activities efficient and highly productive.
Knowledge graphs that contain virtualization (not just graph databases acting as a knowledge graph) work well to maintain data accuracy and the security of existing tools. This includes incumbent toolchains and frameworks that are already deployed. A knowledge graph that is a true data layer does not require you to change anything you’re doing today.
Our platform reinforces the productivity boost because its tools match those user bases very well and include things like Python support and R libraries. Additionally, the output of your models can be put back into the knowledge graph.
Because a knowledge graph is also a semantic layer, it enables reuse and interoperability. This creates an enterprise-wide asset that does not require reinvention for each individual application or use case. You can solve infrastructure challenges without having to redo or lose your existing systems and without having to build everything from scratch each time.
You need a high-performance tool that gives you fast access to the maximum amount of data regardless of where it’s stored. Stardog’s platform provides a great way to access data—via virtualization, which leaves data where it is. This helps insulate you from changes in your source data. If data changes, you can still quickly retrain and redeploy the models. When you consider how to improve model quality, training against raw data is now a cost-effective and scalable solution.
Our platform also ships with built-in predictive analytics and similarity search, supporting quick model development and iteration for data analysis.
It’s helpful to have the ability to predict nodes and edges in a knowledge graph. You can use Stardog to extract patterns from your data and make intelligent predictions based on those patterns. Use ML to predict the value of a relationship or combine with Pathfinder to solve tricky operational problems like determining the best alternative supply routes when a depot goes down.
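As a rough illustration of edge prediction, the sketch below scores pairs of unconnected nodes by the Jaccard overlap of their neighbors, a classic link-prediction heuristic. This is not how Stardog or Pathfinder works internally; the partner network, the node names, and the `predict_links` function are assumptions made up for the example.

```python
from itertools import combinations

# Hypothetical undirected "partners_with" edges; not real data.
edges = [
    ("acme", "globex"), ("acme", "initech"), ("globex", "initech"),
    ("globex", "umbrella"), ("initech", "umbrella"), ("umbrella", "hooli"),
]


def neighbors(edges):
    """Build an adjacency map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def predict_links(edges, top_k=3):
    """Rank currently unconnected pairs by neighbor overlap, best first."""
    adj = neighbors(edges)
    scored = [
        (jaccard(adj, u, v), u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u]
    ]
    scored = [t for t in scored if t[0] > 0]  # drop pairs with no overlap
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored[:top_k]


for score, u, v in predict_links(edges):
    print(f"{u} -- {v}: {score:.2f}")
```

Here `acme` and `umbrella` rank first because they share two of three neighbors. A learned model can replace the heuristic, but the overall shape stays the same: score candidate edges, then rank them.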
Similarity search is also an important functionality when coupled with the connected data of a knowledge graph, as you can use it to detect and recommend patterns. For example, use similarity search to recommend relevant articles based on user inputs, fill in gaps in data lineage, or find new chemical compounds similar to a known compound.
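The article-recommendation case can be sketched with cosine similarity over feature vectors. This is a minimal, generic illustration rather than Stardog's similarity search; the article names, the three-dimensional topic weights, and the `most_similar` helper are hypothetical.

```python
import math

# Hypothetical topic-weight vectors per article; not real data.
articles = {
    "intro-to-rdf": [0.9, 0.1, 0.0],
    "sparql-basics": [0.8, 0.2, 0.1],
    "gpu-tuning": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, items, top_k=2):
    """Return the names of the top_k items closest to the query vector."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
    return ranked[:top_k]


# A user interested almost entirely in the first topic.
print(most_similar([1.0, 0.0, 0.0], articles))
```

In a knowledge graph setting, the vectors would typically be derived from each node's properties and relationships, which is where the connected data adds value over isolated records.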
Stardog provides an embedded ML capability that interfaces with Vowpal Wabbit (a solution for reinforcement learning, supervised learning, and other ML paradigms, developed by Microsoft Research). Many of the parameters of that library are elevated into the Stardog query. So, you can train and run a model in a simple query.
And let’s touch on productivity again. In a knowledge graph, the data comes out as graph data, and so it’s already set up for a nice reinforcement algorithm to feed that data back into the graph. Put it in a place where you can do additional queries. You can check to see if it’s valid or not, and that all becomes a more considerable productivity boost for the entire workstream.
In summary, if you want to predict something, classify something, or see if things are similar, Stardog can help you get started quickly.
ML complements logical reasoning, which together provide a suite of reasoning capabilities that brings forth the total value of your connected data.
Traditional AI research has trended towards inferencing to capture decision-making and knowledge representation. From that heritage come things like the inference engine in Stardog, which is best-in-class. When you add inferencing to your tool suite as part of an overall strategy, whether you use our particular ML libraries or work in concert with additional third-party libraries, you gain a capability that few others have: the ability to capture facts and infer new ones.
Inference expresses all the implied and predicated relationships and connections between your data sources, creating a richer, more accurate view of your data. Better data means better learning. And better means providing context, not just volume. What’s needed is AI that can learn more quickly and produce answers to questions.
Additionally, constraints ensure accurate, valid data. Use constraints to keep bad data out of the knowledge graph or to simply flag inconsistencies in the data.
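The two uses of constraints, flagging inconsistencies and blocking bad updates, can be sketched in a few lines of plain Python. This is not Stardog's constraint mechanism; the constraint format, the `Customer` class, the `accountId` property, and both function names are assumptions for illustration only.

```python
TYPE = "rdf:type"

# Hypothetical constraint: every Customer needs exactly one accountId.
constraints = [
    {"cls": "Customer", "prop": "accountId", "min": 1, "max": 1},
]


def validate(triples, constraints):
    """Flag mode: return readable violations (an empty list means valid)."""
    violations = []
    for c in constraints:
        subjects = sorted(s for s, p, o in triples if p == TYPE and o == c["cls"])
        for subj in subjects:
            count = sum(1 for s, p, o in triples if s == subj and p == c["prop"])
            if not c["min"] <= count <= c["max"]:
                violations.append(
                    f"{subj}: expected {c['min']}..{c['max']} {c['prop']}, found {count}"
                )
    return violations


def guarded_add(triples, new_triples, constraints):
    """Guard mode: reject an update if the resulting graph would be invalid."""
    candidate = triples | set(new_triples)
    problems = validate(candidate, constraints)
    if problems:
        raise ValueError("; ".join(problems))
    return candidate


graph = {("acme", TYPE, "Customer"), ("acme", "accountId", "A-1")}
print(validate(graph, constraints))  # valid graph, so no violations
print(validate(graph | {("globex", TYPE, "Customer")}, constraints))
```

Flag mode suits reporting on data you do not control, such as virtualized sources; guard mode suits writes into the graph itself, where an invalid update can simply be refused.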
Machine and deep learning systems are increasingly used to make decisions. But there are limitations. Even if proved to be highly accurate decision-makers, these systems cannot explain their decisions in a way that people understand. There’s no explainability. For many users, this lack of “why” makes the decisions untrustworthy.
Given that knowledge graphs provide context and domain information in a machine-readable format, you can integrate them with explainable ML approaches to provide more trustworthy explanations.
To learn more about Stardog and machine learning, check out our whitepaper, “Machine Learning: Shifts in Business.”
*As cited in Gartner’s “How to Build Knowledge Graphs That Enable AI-Driven Enterprise Applications,” May 27, 2020.
Update on Knowledge Graphs and LLM
Stardog Voicebox is coming soon! Voicebox is a knowledge engineer powered by Large Language Models (LLM), other Generative AI, and autonomous agents to provide 3 core services.