Working with Datadog, we can improve and troubleshoot all aspects of Stardog performance.
Monitoring provides insight into current and historical performance. When operating any service, it is crucial to gather internal metrics on its resource consumption, and a knowledge graph is no exception. A real-time view of memory usage, CPU load, and network and disk activity allows the operator to reason about performance issues and hardware requirements. It also gives the Stardog development team a handy tool for debugging race conditions and finding performance bottlenecks.
We’ve recently contributed a plugin to the Datadog agent allowing data collection about a running Stardog instance. The resulting metrics are stored in Datadog and can be published to a monitoring dashboard.
Datadog is a powerful service for viewing metrics and events. It has many collectors for various services like AWS, Docker, and Cassandra. Now Stardog is one of them.
Here’s a typical dashboard for a small Stardog cluster:
In the dashboard above we can visualize a set of time series graphs showing the operation of a Stardog cluster. The graphs are synchronized, which allows us to correlate information and reason about possible events. For example, if we see a spike in CPU usage followed by a sharp reduction in memory usage, we might assume that a garbage collection event took place. Once we start looking at interactions between the Stardog cluster and ZooKeeper nodes, events get quite complicated. This type of dashboard packs in a lot of information and makes changes in metrics easy to spot.
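The kind of correlation described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something Datadog or Stardog provides: the function name, the thresholds (`cpu_spike`, `mem_drop`), and the sample values are all hypothetical, and the two series are assumed to be sampled at the same timestamps.

```python
from typing import List, Sequence


def likely_gc_events(cpu: Sequence[float], mem: Sequence[float],
                     cpu_spike: float = 80.0,
                     mem_drop: float = 0.2) -> List[int]:
    """Return sample indices where a CPU spike is followed by a memory drop.

    cpu is a CPU load percentage per sample; mem is memory usage per sample.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    events = []
    # Compare each sample with the one that follows it.
    for i in range(min(len(cpu), len(mem)) - 1):
        spiked = cpu[i] >= cpu_spike
        # Relative drop in memory between this sample and the next one.
        dropped = mem[i] > 0 and (mem[i] - mem[i + 1]) / mem[i] >= mem_drop
        if spiked and dropped:
            events.append(i)
    return events


# Made-up samples: CPU peaks at index 1 and memory falls right afterwards.
cpu_samples = [20.0, 95.0, 30.0, 25.0]
mem_samples = [4.0, 4.2, 2.1, 2.2]
gc_candidates = likely_gc_events(cpu_samples, mem_samples)
```

A match here is only a hint that a garbage collection may have happened; confirming it would still require the JVM's own GC logs or metrics.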
Datadog is a SaaS offering, so we log into a web console. From that console we can build dashboards. Each dashboard can be configured with a set of time series graphs, gauges, event timelines, alerts, and many other things. However, these graphs are only as useful as the data that they display. In order to gather data, we must put collectors in place. Some collectors, like the one for Amazon Web Services, are configured by giving Datadog authorization to poll the service. In contrast, Stardog requires an agent to be deployed on the host in order to gather data.
The default configuration of an agent will give you memory, CPU, IO, and other standard system metrics. While this information is generally very useful, we can also get Stardog-specific information using the extensions package.
The Datadog agent has a community-managed, open source extensions repository. Contributions are submitted as pull requests by vendors interested in integrating their products with Datadog. The Datadog team reviews these contributions and merges them into its releases if it considers them valuable.
The best way to inject Stardog metrics into Datadog is by installing the Datadog agent on the same system as the Stardog server (or in the case of a cluster deployment on each node of the cluster). The agent will then poll each Stardog server for metric information via its REST API. By default the polling interval is 15 seconds. After the agent collects metrics, it forwards them up to Datadog SaaS collectors where they are made available to dashboards.
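The polling cycle described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: `fetch` is a hypothetical stand-in for the HTTP call to Stardog's REST API, the shape of the status document (`{"name": {"value": ...}}`) and the `stardog.` metric prefix are assumptions, and `emit` stands in for whatever forwards a gauge to Datadog.

```python
import time
from typing import Callable, Dict, List, Tuple

Metric = Tuple[str, float, List[str]]


def to_gauges(status: Dict[str, object], tags: List[str],
              prefix: str = "stardog.") -> List[Metric]:
    """Turn a status document into (name, value, tags) gauge tuples."""
    gauges = []
    for key, body in sorted(status.items()):
        value = body.get("value") if isinstance(body, dict) else None
        # Skip entries that do not carry a plain numeric value.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        gauges.append((prefix + key, float(value), list(tags)))
    return gauges


def poll(fetch: Callable[[], Dict[str, object]],
         emit: Callable[[Metric], None],
         tags: List[str],
         interval: float = 15.0,
         rounds: int = 1,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Fetch the status document `rounds` times, emitting gauges each time."""
    for i in range(rounds):
        for gauge in to_gauges(fetch(), tags):
            emit(gauge)
        # Wait for the polling interval between rounds, not after the last.
        if i + 1 < rounds:
            sleep(interval)


# Hypothetical status document; real metric names will differ.
sample_status = {
    "dbms.mem.heap.used": {"value": 1024},
    "dbms.version": {"value": "5.2"},
}
collected: List[Metric] = []
poll(lambda: sample_status, collected.append, ["stardog-node-1"],
     rounds=2, sleep=lambda seconds: None)
```

Passing `fetch`, `emit`, and `sleep` in as arguments keeps the loop testable without a running Stardog server or a Datadog account.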
The Datadog agent can be installed in a variety of ways, including system package managers like apt. It can also be installed with a custom script that Datadog provides when your account is created.
Installation instructions can be found here.
Once the base package is installed, the extensions package needs to be installed alongside it. In the near future Datadog intends to provide system packages for installing the extra integrations, but at present the best way to do this is to pull the latest version from GitHub and manually copy the files needed for each specific configuration. The following commands show how this is done for Stardog on a Linux system:
    git clone https://github.com/stardog-union/integrations-extras.git
    cp integrations-extras/stardog/check.py /etc/dd-agent/checks.d/stardog.py
Once the collector script is in place, it must be configured. This is done via the YAML file /etc/dd-agent/conf.d/stardog.yaml. A sample configuration:
    init_config:

    instances:
      - stardog_url: http://localhost:5820
        username: admin
        password: admin
        tags:
          - backpressure
          - stardog-node-1
          - internal-testing
          - stardog
The first section under
instances in the above example tells the Datadog agent
where Stardog is and what credentials are needed to access it.
admin access is
The tags section can be anything. It is just a list of strings that are sent along with the metrics as metadata. We will use them later to filter results in a dashboard.
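Before starting the agent, it can help to sanity-check an instance entry. The sketch below works on a plain dictionary mirroring the sample configuration, so it needs no YAML library. Treating `stardog_url`, `username`, and `password` as required is an assumption drawn from the sample, not a statement about what the plugin itself validates, and `instance_problems` is a hypothetical helper name.

```python
from typing import Dict, List

# Assumed to be required, based on the sample configuration only.
REQUIRED_KEYS = ("stardog_url", "username", "password")


def instance_problems(instance: Dict[str, object]) -> List[str]:
    """Return human-readable problems found in one `instances` entry."""
    problems = [f"missing key: {key}"
                for key in REQUIRED_KEYS if not instance.get(key)]
    url = str(instance.get("stardog_url", ""))
    if url and not url.startswith(("http://", "https://")):
        problems.append("stardog_url should start with http:// or https://")
    # Tags are optional, but when present they should be plain strings.
    tags = instance.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags should be a list of strings")
    return problems


# The sample instance from stardog.yaml, written as a Python dictionary.
sample_instance = {
    "stardog_url": "http://localhost:5820",
    "username": "admin",
    "password": "admin",
    "tags": ["backpressure", "stardog-node-1", "internal-testing", "stardog"],
}
sample_problems = instance_problems(sample_instance)
broken_problems = instance_problems(
    {"stardog_url": "localhost:5820", "username": "admin"})
```

An empty list means the entry has the fields shown in the sample; it does not prove that the credentials are accepted by the server.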
The final step here is to start the Datadog agent:
The log for the Datadog agent can be found at
Once it is running with the Stardog plugin, lines like the following should appear:
    2018-02-09 17:49:06 UTC | INFO | dd.collector | daemon(daemon.py:234) | Starting
    2018-02-09 17:49:06 UTC | INFO | dd.collector | config(config.py:1243) | initialized checks.d checks: ['stardog', 'disk', 'network', 'ntp']
    2018-02-09 17:49:06 UTC | INFO | dd.collector | config(config.py:1244) | initialization failed checks.d checks:
    2018-02-09 17:49:10 UTC | INFO | dd.collector | checks.collector(collector.py:404) | Running check stardog
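To confirm from a script that the Stardog check was loaded, the log output can be scanned for the "initialized checks.d checks" line. The sketch below assumes the log keeps the format shown in the excerpt above, which may differ between agent versions; `initialized_checks` is a hypothetical helper, not part of the Datadog agent.

```python
import ast
import re
from typing import List

# Matches the list printed after "initialized checks.d checks:".
_INIT_RE = re.compile(r"\| initialized checks\.d checks: (\[.*\])\s*$")


def initialized_checks(log_text: str) -> List[str]:
    """Collect the check names the agent reports as initialized."""
    checks: List[str] = []
    for line in log_text.splitlines():
        match = _INIT_RE.search(line)
        if match:
            # The list is printed as a Python literal, so parse it safely.
            checks.extend(ast.literal_eval(match.group(1)))
    return checks


# Two lines copied from the log excerpt in this post.
sample_log = (
    "2018-02-09 17:49:06 UTC | INFO | dd.collector | config(config.py:1243)"
    " | initialized checks.d checks: ['stardog', 'disk', 'network', 'ntp']\n"
    "2018-02-09 17:49:06 UTC | INFO | dd.collector | config(config.py:1244)"
    " | initialization failed checks.d checks: \n"
)
loaded = initialized_checks(sample_log)
stardog_loaded = "stardog" in loaded
```

If `stardog` is missing from the result, the "initialization failed" line in the same log is the next place to look.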
Now that an agent is running and publishing metrics to Datadog, we can log in and create a dashboard to visualize our data. Log in to your Datadog account.
On the left hand side find the
New Dashboard as shown below:
In the pop-up give your new dashboard a name and select
Next drag a
timeseries graph from the tool bar onto the dashboard.
At this point you can select the metrics that you would like to visualize
by filling in the value of the
The text-box labeled
from allows the graph to filter out data by specific
tags. This is where the tags from the
stardog.yaml file discussed above
come into play. In this way we can limit the graph to looking at a specific
Stardog is a complex system and performance characteristics vary widely by workload. Interactions between components can bring about unforeseen situations which can be difficult to debug. Monitoring provides the foundational level of data required to reason about those interactions. This helps to increase availability and fix crucial issues at runtime. The metric visualization that Datadog provides via the Stardog plugin makes this task more attainable.