Download Stardog now to start a free 30-day evaluation.

NASA's Knowledge Graph

by Al Baker, 15 August 2017 · 9 minute read

A story about how NASA built a Spacefaring Knowledge Graph with Stardog.

Connectedness in Spacefaring

NASA’s Mission to Mars requires dealing with many large, unconnected, that is, siloed data sets. Despite the silos, everything really is connected to everything else. Everything matters in spacefaring.

Stardog’s Knowledge Graph platform helps NASA manage the relationships that exist not just in the data sources but also among and between them.

The main requirement is to see how and why any two bits of data relate to each other and to discover cross-system relationships that aren’t obvious or explicit in the source systems. This is the main requirement because complex systems like spacefaring craft are highly interconnected networks of potentially devastating cascading failures.

In spacefaring, everything really is connected to everything else.

Example: SLS Booster Ignition

Space Launch System (SLS) will be NASA’s most powerful rocket. It’s a very complex component of a very complex system for human exploration of the solar system. Let’s consider a specific example: SLS booster ignition to the mission and fault models and confirming autonomous operation test results.

Feel the interdisciplinary IT challenge burn, people…

Actual rocket scientists build Matlab models of the firing sequence for how the rocket lifts off, clears the tower, etc. Then those models have to be combined with the tests for the actual flight computer’s software, which will execute the commands to fire the rockets, throttle up, etc.

We model the physics of the beast and then we test the software that flies the beast and then we want the models and the tests—along with a lot of other stuff—to be managed in relationship and in context. Everything (physics models of NASA’s largest rocket ever) is connected to everything else (tests of the software that flies the great beast!).

SLS Core Storage 101 (Click for larger version)

SLS Core Storage 101 (Click for larger version)

SLS flight models know things like mass of the rocket, mass of the payloads, etc. That model data is needed by the flight computer to know how much thrust is needed to lift off. So you want to track the completion of both pieces and make sure the flight software team is using the right model, versions match up, and the tests and reviews on both sides get to everyone who has an interest on both teams.

NASA requires provably correct traceability across many unconnected data sets to decide whether its software will launch and fly the rocket like the math says it needs to fly.

So it’s only that problem times by, like, a factor of 100. This is a highly connected, very dense graph of dependencies, relationships, and connections.

The Spacefaring Knowledge Graph

That’s systems engineering. Let’s generalize this example. What we get is the systems engineering requirement of traceability. We need to be able to trace the relationships between all the things!

Whether we are tracing a lineage of documents, requirements, test results, or operations execution, we want to know what the relationship is and something about that relationship. In other words, there’s a model in the graph that helps us solidify the graph as a source of truth for the engineering community.

By using a Knowledge Graph to exploit these cross-system relationships, systems engineers can assemble a coherent story of the total system design powered by real-time status from authoritative systems.

Queries of the Knowledge Graph save countless hours of manual report manipulation, such as finding the relationship of math models, to requirements, to functions of the launch vehicle. Empowering engineers with actionable insight not only saves time and effort, but it can save lives in safety critical systems as downstream dependencies are made visible for review and analysis.

Our Approach To Modeling

So how did we build this thing with the smart folks at NASA as partners and customers? The key takeaway here is that a Knowledge Graph platform is a Knowledge Toolkit plus a Graph Database, and all of those components are critical at NASA.

Doing this with a plain graph database isn’t going to work unless you want to do all the heavy lifting of AI, knowledge representation, machine learning, and automated reasoning yourself, from scratch. I’ll wait while you decide…didn’t think so.

Schema Alignment

So in practice implementing a Knowledge Graph over, above, and between data silos means building a data model that “covers” the (relevant parts of the schemas of the) silos that you care about.

Each silo has a schema. To connect that data, we built a new schema by aligning each silo’s schema—or, often, just some relevant parts of the schema—to a shared logical model. Schema alignment is important because we don’t want to deal with the specific gotchas of any source silo feeding the graph, and we want to generate logically sound and correct answers. Spacefaring is not a place for sloppiness or avoidable uncertainty.

Stardog’s Knowledge Toolkit is full of bits that make schema alignment practical and fast. Instead of building warehouses and specialized data marts, or performing complex transformations on ETL jobs, we gave Stardog some declarative descriptions about how each silo relates to the whole. This method of schema alignment is independent of how (and when) a silo’s data gets into the graph because we can model independently of the mechanics of loading data in the graph.

Schema alignment follows a repeatable recipe for representing silos in the Knowledge Graph by building the alignment incrementally, largely via Stardog Rules (i.e., think of these as basically Datalog, and you will be correct enough).

  1. Represent the silo’s definition of its data in the graph, so you can easily compare and verify data pedigree. This may include only looking at some parts or facets of the silo, which may include stuff that isn’t relevant in the larger context. This often includes providing a unique namespace in the Knowledge Graph for ease of management down the road.
  2. For each object represented in the graph, we provide a parent class in the logical model, which can include disjointness—nothing can be an instance of both X and Y—and other rules on how key data objects relate.
  3. Use Stardog Rules to express the relationships between things, and avoid using physical data layouts of the silos in the Knowledge Graph (e.g. left over many-to-many tables).
  4. Use Stardog to check for logical consistency and debug the logical model as needed.
  5. Finally, layer on apps, queries, and analytics that use the new model, which decouples data from domain logic elegantly; and then use Stardog’s integrity constraint validation to declaratively sanity check everything.

These guidelines provide a way for us to manage the Knowledge Graph and silos independently. We can then do DRY schema maintenance for all the silos—keep the source systems intact, and deal with differences in one place.

Rules and Graphs: A Mutual Love Affair

Plain graphs support graph traversals in which code and graph structure are tightly coupled. When graph structure changes, traversal code tends to break. That’s messy and expensive.

The better way is modeling domain knowledge declaratively. This loosely couples domain logic and graph such that when graph structure changes things tend not to break. That’s elegant and efficient.

That’s why adding a Knowledge Toolkit to a Graph Database means winning all day, every day.

Examples of Winning

Let’s take a look at a few examples on how to apply the above guidelines. We’ll focus on using Stardog Rules to build the logical model across silos. I like using declarative graph rules as it lets us iteratively build up a graph and experiment with relationships that span across silos.

First, a simple rule: if a node is a Requirement and has a linkedModel edge, the other node must be a Model.

IF {
       ?req a :Requirement ;
     		  :linkedModel ?model .
} 
THEN { ?model a :Model }

This relationship could be expressed more concisely by saying that linkedModel edge type has a domain of :Requirement and a range of :Model.

We can easily define the inverse separately by just saying that it’s inverseOf:

:linkedModelOf a owl:ObjectProperty ;
        owl:inverseOf :linkedModel .

Now we can traverse that edge in either direction and we know the type of node on each end. Let’s deal with some intermediate nodes. We often see this pattern repeated when many-to-many tables from an RDBMS are represented in the knowledge graph.

IF {
       ?req a :Requirement .
       ?link :sourceURI ?req .
       ?link :targetURI ?target .
       ?target a :Requirement .
}
THEN { ?req :hasRqmt ?target  }

In this example, we have a silo that has two requirements, with an intermediate link relationship for creating the linkage between requirements. Again, we want a high-level direct edge, which is only virtually in the graph, instead of having to physically traverse the intermediate structure. So we introduce a new edge logical:hasRqmt and define its conditions. In other words, this edge exists when such and such conditions are present.

The logical predicates can be combined using property paths, e.g. to find indirect requirements:

IF {
       ?r :hasRqmt/:hasRqmt ?r2 .
}
THEN { ?r :indirectRqmt ?r2  }

We can also move literals around, that is, promote some values to some other node for convenience or clarity. For example,

IF {
       ?req a :Requirement .
       ?req :hasTest ?test .
       ?test a :Test .
       ?test :hasResult ?result . 
       ?result :Status ?status .
       ?test :hasCompliance ?result .
}
THEN { ?req :hasComplianceStatus ?status . }

Let’s put this all together with a SPARQL query where we know the model, and want to find the downstream (i.e. indirect) requirements and their compliance status from testing:

select ?req2 ?status where {
        :model1 :linkedModelOf ?req .
	?req :indirectRqmt ?req2 .
	?req2 :hasComplianceStatus ?status .
}

Over time, some of these rules may be combined, refined into even more finely grained relationship, or even refactored into integrity constraints if we need to enforce some relationships as invariant.

Because Stardog performs reasoning at query time, we can also make changes to the logical model without necessarily having to impact the data, regenerate an ETL script, or deal with a multi-system upgrade.

Summary

In this post, we looked at some guidelines for unifying data silos at NASA into a Stardog Knowledge Graph. We looked at using Stardog Rules to make that quick and easy. We’ve only scratched the surface. As you build your knowledge graphs, you can take advantage of sameAs reasoning, virtual graphs, and ICV along with all the other Stardog features

With all these tools available, understanding data has never been easier. We’ve successfully applied the concepts in this blog to NASA to reduce manual work and unlock new queries and analytics, saving time and cost on the Mission to Mars.


comments powered by Disqus