Learning to Predict
We’re adding machine learning to Stardog to make your Knowledge Graph even smarter.
The thing about an enterprise knowledge graph is that it should know stuff. And knowing stuff is a lot like learning stuff. People who know and learn stuff are good. Machines that know and learn stuff aren’t better, but they’re tireless and never take holiday.
So we’re adding machine learning to Stardog. That is, we’re adding statistical inference to the logical inference Stardog already performs.
Our initial focus is predictive analytics: the ability to predict nodes and edges in a knowledge graph. So you’ll be able to use Stardog to extract patterns from your data and make intelligent predictions based on those patterns.
In this blog post we give an overview of how to use the upcoming predictive capabilities of Stardog. We’ll describe how to build a model and how to use it for prediction.
Before Stardog can perform predictions, we need to define what we are actually trying to predict. This task is called “model training”. You provide data and a target, and Stardog learns a model that can be used to predict the value of the target given some other, probably unseen, data. Let’s look at an example in which we train a model to predict the genre of a film given its director, studio, and year.
prefix spa: <tag:stardog:api:analytics:>
INSERT {
    graph spa:model {
        :myModel a spa:ClassificationModel ;
                 spa:arguments (?director ?year ?studio) ;
                 spa:predict ?genre .
    }
}
WHERE {
    ?movie :directedBy ?director ;
           :year ?year ;
           :studio ?studio ;
           :genre ?genre .
}
Training a model can be naturally expressed in SPARQL using an INSERT. The WHERE clause selects the data we are interested in, and a special graph, spa:model, is used to specify the parameters of the training. In this example, we are training a new model, :myModel, which will be used for classification. We currently support binary classification, multi-class classification, and regression. Our model will be trained to predict the value of ?genre based on the values of ?director, ?year, and ?studio.
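To make the idea of training concrete, here is a toy sketch in plain Python. It is not how Stardog or its underlying library learns a model. It swaps in a naive voting scheme where each argument value votes for the target values it was seen with during training. The function names `train` and `predict` and the placeholder rows are made up for illustration.

```python
from collections import Counter, defaultdict


def train(rows, arguments, target):
    """Count how often each (argument, value) pair co-occurs with each target value."""
    votes = defaultdict(Counter)
    for row in rows:
        for name in arguments:
            votes[(name, row[name])][row[target]] += 1
    return votes


def predict(model, arguments, row):
    """Add up the votes of every argument value in the row and return the winner.

    Returns None when none of the argument values were seen during training.
    """
    tally = Counter()
    for name in arguments:
        tally.update(model.get((name, row[name]), {}))
    return tally.most_common(1)[0][0] if tally else None


# Placeholder training data (hypothetical, not taken from a real dataset).
movies = [
    {"director": "DirectorA", "year": 1972, "studio": "StudioX", "genre": "Drama"},
    {"director": "DirectorA", "year": 1974, "studio": "StudioX", "genre": "Drama"},
    {"director": "DirectorB", "year": 1974, "studio": "StudioY", "genre": "Comedy"},
]
ARGS = ("director", "year", "studio")

model = train(movies, ARGS, "genre")

# An unseen movie: the director is known, the year is new, the studio is known.
unseen = {"director": "DirectorA", "year": 1980, "studio": "StudioY"}
print(predict(model, ARGS, unseen))
```

The shape is the same as in the SPARQL above: a list of arguments, a target, and a set of rows selected from the data. A real learner generalizes far better than this counting scheme, but the inputs and outputs line up in the same way.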
Now that we have trained a model, we are ready to use it for prediction as part of query answering.
prefix spa: <tag:stardog:api:analytics:>
SELECT * WHERE {
    graph spa:model {
        :myModel spa:arguments (?director ?year ?studio) ;
                 spa:predict ?predictedGenre .
    }
    :TheGodfather :directedBy ?director ;
                  :year ?year ;
                  :studio ?studio ;
                  :genre ?originalGenre .
}
We select a movie’s properties and use them as arguments to the model Stardog previously learned. The magic comes with the ?predictedGenre variable; its value is not going to come from the data itself, but will instead be predicted by the model, based on the values of the arguments. But query answering proceeds as if the predicted value were present in the graph.
The result of the query will look like this:
| director | year | studio | originalGenre | predictedGenre |
| ------------------- | ---- | ------------------ | ------------- | -------------- |
| :FrancisFordCoppola | 1972 | :ParamountPictures | Drama | Drama |
If the predicted values consistently match the original ones, you’re on the right path to good predictions. Your model has high accuracy, and you can now focus on making its results more general.
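One simple way to put a number on "matching the original values" is accuracy: the fraction of rows where the predicted value equals the original one. The sketch below is plain Python over query results that have already been fetched. The `accuracy` helper and the sample rows are illustrative assumptions, not part of Stardog.

```python
def accuracy(rows, original="originalGenre", predicted="predictedGenre"):
    """Return the fraction of rows whose predicted value equals the original value."""
    if not rows:
        raise ValueError("need at least one row to compute accuracy")
    matches = sum(1 for row in rows if row[original] == row[predicted])
    return matches / len(rows)


# Hypothetical result rows, shaped like the table above but reduced to two columns.
results = [
    {"originalGenre": "Drama", "predictedGenre": "Drama"},
    {"originalGenre": "Comedy", "predictedGenre": "Drama"},
    {"originalGenre": "Drama", "predictedGenre": "Drama"},
    {"originalGenre": "Horror", "predictedGenre": "Horror"},
]

print(accuracy(results))  # 3 of the 4 rows match
```

Accuracy measured on the same movies the model was trained on tends to be optimistic. Holding some movies out of training and scoring only those gives a better picture of how well the model generalizes.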
A SELECT returns tabular data, which limits the expressivity of model training with highly relational data. We are extending SPARQL solutions to include arrays, which can very nicely represent multi-valued arguments, such as all actors in a movie:
prefix spa: <tag:stardog:api:analytics:>
INSERT {
    graph spa:model {
        :myNewModel a spa:ClassificationModel ;
                    spa:arguments (?director ?actors) ;
                    spa:predict ?genre .
    }
}
WHERE {
    # Grouping by ?movie keeps each array limited to the actors of a single movie.
    select ?director ?genre (array(?actor) as ?actors) {
        ?movie :directedBy ?director ;
               :actors ?actor ;
               :genre ?genre .
    } group by ?movie ?director ?genre
}
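For readers less familiar with aggregation, the following plain-Python sketch shows the reshaping that an array aggregate performs: many one-actor rows collapse into one row per group, with the actors gathered into a list. The `collect_arrays` helper and the placeholder rows are hypothetical. In this sketch the grouping key includes the movie so that each list holds the cast of a single film.

```python
from collections import defaultdict


def collect_arrays(rows, keys, multi, alias):
    """Group rows on `keys` and gather the `multi` column into a list named `alias`."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[tuple(row[k] for k in keys)].append(row[multi])

    result = []
    for key, values in grouped.items():
        out = dict(zip(keys, key))  # restore the key columns
        out[alias] = values         # attach the collected values as an array
        result.append(out)
    return result


# Placeholder rows, one per (movie, actor) pair, as a flat tabular result would look.
rows = [
    {"movie": "MovieA", "director": "DirectorA", "genre": "Drama", "actor": "Actor1"},
    {"movie": "MovieA", "director": "DirectorA", "genre": "Drama", "actor": "Actor2"},
    {"movie": "MovieB", "director": "DirectorA", "genre": "Drama", "actor": "Actor3"},
]

for solution in collect_arrays(rows, ("movie", "director", "genre"), "actor", "actors"):
    print(solution)
```

Each resulting row carries a single multi-valued argument, which is the form a model needs when one movie has many actors.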
We expect the first release of predictive analytics to be available this summer. Right now we are working with Vowpal Wabbit, an extremely efficient and scalable machine learning library, so expect more features to be available at release time. We’re also looking at link mining, inductive logic programming, similarity learning, and natural language processing.
Knowledge graphs know stuff; your enterprise knowledge graph should know stuff about your enterprise. And it should learn more stuff from the stuff that it already knows. That’s where we’re headed. Stay tuned!