Stardog & MongoDB

Jess Balint

Jun 19, 2018, 4 minute read

JSON silos are a liability, too, so as a first step we’re adding support for virtual graphs over MongoDB.

Integrating JSON Data

Data source heterogeneity continues to increase and now includes non-relational data models. Stardog presents a unified view over all sources of data, irrespective of their native data model, easing the pain of unifying and querying the data. The rise of MongoDB and the subsequent push for JSON feature parity in relational databases demonstrate the prevalence of storing data as JSON. We recognize the demand, and we’re building support for virtual graphs over MongoDB.

If you’ve worked with virtual graphs in Stardog, then you know how easy it is to map a graph view of a relational database using Stardog Mapping Syntax. We’ve taken SMS a step further for creating RDF graph mappings of JSON document collections such as those managed by MongoDB.

Mappings

To extend SMS to JSON, we merely need to convey the structure of the underlying data and how it maps to RDF. There’s no jargon or new vocabulary full of abstruse concepts. First, let’s look at how to specify the structure of the source data. Since it’s JSON, we use a JSON template that mirrors the structure of the document:

  "movies":{
    "_id": "?movieId",
    "title": "?title",
    "plot": "?plot",
    "cast": [
       { "id": "?actorId", "name": "?actorName" }
    ],
    "genres": [ "?genre" ]
  }

We use the template to bind variables based on the structure of the JSON document. The root key (“movies” in this example) names the MongoDB collection we’re querying. For arrays, such as the actor objects in the cast array, think of the template as producing one set of bindings for each element in the array.
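To make the "one set of bindings per array element" idea concrete, here is a minimal Python sketch of how such a template could be matched against a single document. It is only an illustration of the semantics described above, not Stardog's implementation: the `bind` function, the sample document, and the choice to take a cross product across independent arrays (here `cast` and `genres`) are all assumptions made for the example.

```python
from itertools import product

# The document template from above, without the "movies" root key
# (the root key only names the collection).
TEMPLATE = {
    "_id": "?movieId",
    "title": "?title",
    "plot": "?plot",
    "cast": [{"id": "?actorId", "name": "?actorName"}],
    "genres": ["?genre"],
}

def bind(template, doc):
    """Return a list of binding dicts from matching `template` against `doc`."""
    if isinstance(template, str):
        # A "?name" string binds a variable to the value found in the document.
        return [{template[1:]: doc}] if template.startswith("?") else [{}]
    if isinstance(template, list):
        # One set of bindings per array element.
        rows = []
        for element in doc:
            rows.extend(bind(template[0], element))
        return rows
    if isinstance(template, dict):
        # Assumption: bindings from different keys are combined as a cross product.
        per_key = [bind(sub, doc[key]) for key, sub in template.items() if key in doc]
        rows = []
        for combo in product(*per_key):
            merged = {}
            for part in combo:
                merged.update(part)
            rows.append(merged)
        return rows
    return [{}]

# Hypothetical sample document; the IDs and names are made up.
doc = {
    "_id": "tt0000001",
    "title": "Example Movie",
    "plot": "A made-up plot.",
    "cast": [
        {"id": "nm0000001", "name": "Actor One"},
        {"id": "nm0000002", "name": "Actor Two"},
    ],
    "genres": ["Drama"],
}

rows = bind(TEMPLATE, doc)
for row in rows:
    print(row["movieId"], row["actorId"], row["genre"])
```

With two cast members and one genre, the sketch yields two rows that share the movie-level variables and differ only in the actor variables.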

In order to map the data to RDF, we simply state a set of triple patterns representing the RDF structure using the variables bound in the JSON template:

  ?movie a :Movie ;
    rdfs:label ?title ;
    :title ?title ;
    :description ?plot ;
    :genre ?genre .

  ?actor a :Actor ;
    :starredIn ?movie ;
    :actorName ?actorName .

At this point, we load the mappings and the virtual graph is ready for querying or materialization. Here’s the entire mappings file for reference:

PREFIX tt: <http://www.imdb.com/title/>
PREFIX nm: <http://www.imdb.com/name/>
PREFIX : <http://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?movie a :Movie ;
    rdfs:label ?title ;
    :title ?title ;
    :description ?plot ;
    :genre ?genre .

  ?actor a :Actor ;
    :starredIn ?movie ;
    :actorName ?actorName .
}
FROM JSON {
  "movies":{
    "_id": "?movieId",
    "title": "?title",
    "plot": "?plot",
    "cast": [
       { "id": "?actorId", "name": "?actorName" }
    ],
    "genres": [ "?genre" ]
  }
}
WHERE {
  BIND (template("http://www.imdb.com/title/{movieId}") AS ?movie)
  BIND (template("http://www.imdb.com/name/{actorId}") AS ?actor)
}
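As a rough illustration of what this mapping produces, the following Python sketch applies the two `template(...)` bindings and the triple patterns to rows of variable bindings. It is a simplified model under stated assumptions, not how Stardog evaluates mappings: Python's `str.format` stands in for the `template` function (with no IRI escaping), triples are plain tuples, and the sample rows are invented.

```python
def to_triples(row):
    """Build the triples of the CONSTRUCT block for one row of bindings."""
    # Stand-in for the template(...) calls in the WHERE clause.
    movie = "http://www.imdb.com/title/{movieId}".format(**row)
    actor = "http://www.imdb.com/name/{actorId}".format(**row)
    return {
        (movie, "rdf:type", ":Movie"),
        (movie, "rdfs:label", row["title"]),
        (movie, ":title", row["title"]),
        (movie, ":description", row["plot"]),
        (movie, ":genre", row["genre"]),
        (actor, "rdf:type", ":Actor"),
        (actor, ":starredIn", movie),
        (actor, ":actorName", row["actorName"]),
    }

# Hypothetical binding rows: one movie with two cast members.
movie_fields = {
    "movieId": "tt0000001",
    "title": "Example Movie",
    "plot": "A made-up plot.",
    "genre": "Drama",
}
binding_rows = [
    {**movie_fields, "actorId": "nm0000001", "actorName": "Actor One"},
    {**movie_fields, "actorId": "nm0000002", "actorName": "Actor Two"},
]

# RDF graphs are sets of triples, so repeated movie triples collapse.
triples = set()
for row in binding_rows:
    triples |= to_triples(row)

for triple in sorted(triples):
    print(triple)
```

Because the result is a set, the movie-level triples appear once even though they are generated from every row.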

Queries

Virtual graphs in Stardog can be used just like physical graphs and are compatible with features such as machine learning, reasoning, named graph security, and path queries. Once we’ve loaded our movies virtual graph, we can query it in the normal way:

select ?movie {
  graph <virtual://movies> {
    ?actor a :Actor ;
      :actorName "George Clooney" ;
      :starredIn ?movie
  }
  ?movie :boxOfficeSales ?boxOfficeSales
  FILTER(?boxOfficeSales > 10*1000*1000)
}

Here we’re querying the movies data source for movies starring George Clooney. We join that with some data stored in Stardog to restrict those movies to ones with more than 10 million in box office sales.

Higher Level Views of Data

Mapping a single MongoDB data source to RDF is useful on its own: we can express a much wider range of queries than in MongoDB directly. However, the real power comes from building higher-level views of the data. Using Stardog’s reasoning capabilities, it’s possible to define abstract relationships between properties and classes.

For instance, we might want to define a new relationship between all actors who starred in the same movie. With a relationship as simple as this, we can express queries such as “Six Degrees of Kevin Bacon”. See the Stardog docs about path queries for more.
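To show the idea behind such a derived relationship, here is a small plain-Python sketch: it derives a co-star relation from (actor, movie) pairs and then counts the hops between two actors with a breadth-first search. In Stardog this would be expressed with reasoning and path queries rather than application code; the function names and the sample data below are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations

def costar_graph(starred_in):
    """Map each actor to the set of actors they shared a movie with."""
    cast = defaultdict(set)
    for actor, movie in starred_in:
        cast[movie].add(actor)
    graph = defaultdict(set)
    for actors in cast.values():
        for a, b in combinations(sorted(actors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degrees(graph, start, goal):
    """Return the fewest co-star hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        actor, dist = queue.popleft()
        for other in graph.get(actor, ()):
            if other == goal:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None

# Made-up (actor, movie) pairs standing in for :starredIn triples.
starred_in = [
    ("A", "m1"), ("B", "m1"),
    ("B", "m2"), ("C", "m2"),
    ("D", "m3"),
]

graph = costar_graph(starred_in)
print(degrees(graph, "A", "C"))
print(degrees(graph, "A", "D"))
```

Here A and C never share a movie but are connected through B, while D has no co-stars and is unreachable.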

Using Stardog’s machine learning capabilities, we can use data stored in MongoDB directly as input to model training or combine it with another data source. In the example query, we combined the movies data stored in MongoDB with box office sales data stored in Stardog. We can use this query as input to train a model which would predict box office sales given the set of actors in a potential new movie.

Ultimately, combining data in this way gives us an unparalleled view by connecting isolated sources. In Stardog, any number of data sources can be combined in a single SPARQL query: any combination of virtual graphs over relational databases, MongoDB, other Stardog instances, and so on.

Coming Soon

Ready to map your MongoDB databases into the knowledge graph? We’re putting the finishing touches on the new feature and are busy with QA. If you’re interested in beta access, please let us know.

Read more on why data silos are a huge liability here.