This is the first part of the Getting Started series, which puts Knowledge Graph concepts in-action and introduces the SPARQL query language. It was designed for technologists of all backgrounds and assumes no knowledge of Stardog, though it does presume some basic familiarity with query languages and relational databases.
Part 0 explains foundational concepts necessary to start interacting with Stardog in Getting Started: Part 1 and onward.
0.1: What is a knowledge graph?
Knowledge is messy – any given concept can mean different things to different people, carry layers of associations, and be connected to a multitude of other concepts. Given the complexities of knowledge, capturing it in a machine-readable format can be nearly impossible without the right tool. Knowledge graphs are purpose-built to achieve this goal; Stardog is based on the RDF open standard which was created to represent large-scale information systems.
We define knowledge graph as a representation of data that is enriched with real-world context, is based on the graph data structure and has a flexible schema that allows for multiple definitions of the same data. Read on for definitions of these key concepts, or if you are familiar with knowledge graphs, you can go straight to Getting Started: Part 1.
The graph data structure
To define graph, let’s start by comparing it to a more familiar concept — relational systems, like tables. Relational databases are optimized for efficient storage and retrieval of transactional data. Relational data has fixed data definitions determined by the column headers. Those definitions in the column headers help make up the database’s schema. A schema is the set of rules that govern a database, effectively stating what data can or can’t enter the system.
In contrast, the graph data structure is designed to highlight relationships between concepts. In graph, data is expressed as a triple – two “nodes” connected by an “edge.” For example, the sentence “The Beatles sing the song Yesterday” would be expressed as a triple with two nodes, “The Beatles” and “Yesterday”, connected by an edge that with type “sings”. Or we could express the sentence “Yesterday was released in 1965” with nodes “Yesterday” and “1965” connected by a “releaseYear” edge.
Graph easily accepts new information about nodes, simply creating new edges to relate additional data to the existing data. In comparison, in a relational system, adding a type of data that is not already accounted for in the schema requires creating a new schema and a new combined dataset. The graph schema’s ability to accept new information makes it ideal for projects with agile release cycles or new incoming data streams.
Graph vs Knowledge Graph
Graph has been popularized through graph databases, which support applications with changing or highly interconnected data. Where knowledge graphs differ is that they also support many layers of associations or conflicting definitions of the same data. Simply put, a graph database is still only designed to support one point of view, whereas the knowledge graph’s schema supports multiple points of view.
As we stated above, a knowledge graph is a representation of data that is enriched with real-world context, is based on the graph data structure and has a flexible schema that allows for multiple definitions of the same data. As a result, knowledge graphs can easily support projects where multiple departments are collaborating or where requirements are changing.
A knowledge graph is only as powerful as the data it can access, but accessing data in real-world IT environments can be challenging. Stardog’s platform provides tools to address these challenges:
- Virtualization connects data remotely, whether data lives in the cloud or on-prem
- Connectors to SQL and NoSQL databases map data from common data sources
- BITES, our NLP engine, extracts entities and concepts from unstructured data like research papers, resumes, and regulatory documents
The value in unifying data is the trusted insights interpreted from the data. Stardog offers a suite of tools to help reveal new insights and to ensure that results are actionable:
- Built-in machine learning improves model quality by allows data scientists to train models against complete, up-to-date data
- Pathfinder operates path queries to trace data lineage and find distant, indirect links across data sources
- Data Quality Constraints help ensure data is correct by finding, flagging, and/or prevernet conflicting data
- The Inference Engine displays all logic for each result, providing details required to operationalize insight
- Ready to dive in? To continue on with the Getting Started series, head on over to Getting Started: Part 1.
- Want to learn more about our data model first? Read up on the RDF open standard that Stardog is based on here.