Boosting Machine Learning

by Pedro Oliveira, 6 February 2018 · 2 minute read

Stardog Enterprise Knowledge Graph expands its machine learning capabilities by integrating XGBoost, the high-performance gradient boosting library.

Now your Knowledge Graph can be seamlessly enhanced by state-of-the-art gradient boosting models.

Gradient Boosting

If you have been paying attention to Kaggle competitions, you're probably aware of XGBoost, a machine learning library that delivers state-of-the-art results on almost any dataset. It's consistently part of the toolset in winning Kaggle solutions.

We've been closely following XGBoost's development since we first released our predictive analytics capability. Since then, XGBoost has reached a level of maturity at which we decided to incorporate it as one of our underlying machine learning libraries, alongside Vowpal Wabbit.

XGBoost in Stardog

All the features already available in the predictive analytics module, such as classification, regression, and confidence levels, are also supported by XGBoost.

All you need to do is set the correct library at model creation time:

prefix spa: <tag:stardog:api:analytics:>

INSERT {
   graph spa:model {
       :myModel  a spa:ClassificationModel ;
                 spa:library spa:XGBoost ;
                 spa:arguments (?director ?year ?studio) ;
                 spa:predict ?genre .
   }
}
WHERE {
   ?movie :directedBy ?director ;
          :year ?year ;
          :studio ?studio ;
          :genre ?genre .
}

This will create an XGBoost classification model with the given data and default parameters. All the parameters can be configured through the spa:parameters property:

INSERT {
   graph spa:model {
       :myModel  a spa:ClassificationModel ;
                 spa:library spa:XGBoost ;
                 spa:parameters [
                   spa:max_depth 10 ;
                   spa:tree_method 'approx' ;
                   spa:updater 'distcol,prune'
                 ] ;
                 spa:arguments (?director ?year ?studio) ;
                 spa:predict ?genre .
   }
}
...
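
Once a model exists, predictions are obtained with a query of the same shape against the model graph. The following is a sketch based on the conventions of the predictive analytics module; the spa:confidence property and variable names here are illustrative assumptions, not verbatim from this post:

prefix spa: <tag:stardog:api:analytics:>

SELECT ?movie ?predictedGenre ?confidence WHERE {
   graph spa:model {
       :myModel  spa:arguments (?director ?year ?studio) ;
                 spa:confidence ?confidence ;
                 spa:predict ?predictedGenre .
   }

   ?movie :directedBy ?director ;
          :year ?year ;
          :studio ?studio .
}

Each solution binds ?predictedGenre to the class predicted by the model and ?confidence to the model's confidence in that prediction.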

XGBoost vs. Vowpal Wabbit

For now, Vowpal Wabbit remains the default library Stardog uses when learning a model. However, out of the box, XGBoost usually performs better than Vowpal Wabbit, especially in domains with a smaller number of distinct classes or domains composed mostly of numeric features.

The biggest difference between them is the underlying methods used to deal with data when learning a model. XGBoost holds most of the data in memory and is usually slower and more resource intensive, while Vowpal Wabbit is backed by an extremely fast online learning algorithm, which just needs to hold one result at a time in memory. Knowledge graphs are typically very large and sparse, which makes Vowpal Wabbit our preferred solution for most use cases.

We recommend giving both a try to see what works best in your domain.
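
Switching between them only requires changing the spa:library value at model creation time. For example, a Vowpal Wabbit version of the earlier model might look like the following (a sketch assuming spa:VowpalWabbit as the library identifier; since Vowpal Wabbit is the default, the spa:library triple can also simply be omitted):

INSERT {
   graph spa:model {
       :myModel  a spa:ClassificationModel ;
                 spa:library spa:VowpalWabbit ;
                 spa:arguments (?director ?year ?studio) ;
                 spa:predict ?genre .
   }
}
WHERE {
   ?movie :directedBy ?director ;
          :year ?year ;
          :studio ?studio ;
          :genre ?genre .
}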
