Stardog 7.0 Beta 2

Scott Fines

Jun 11, 2019, 4 minute read

We’re happy to announce the release of Stardog 7.0 Beta 2, complete with lots of changes designed specifically to help make the world’s best Knowledge Graph even more awesome.

First and foremost, this beta has our new storage engine, code named Mastiff. Mastiff has substantially faster write performance, especially when running in a cluster. It was designed specifically for our horizontally scalable cluster as it can handle many simultaneous clients all writing at once without any of them blocking one another. The beta 2 release further improves this performance and improves stability of the overall system.

Cluster Write Performance

The main reason to switch from B+-trees to LSM was to improve write performance. Of course, performance is a bit vague. Let’s be specific. We wanted to make writing data faster, but we also wanted to improve concurrency, allowing more transactions to operate at the same time and thus improving overall throughput.

Mastiff was designed from the start to improve concurrency. That meant, in the end, rebuilding Stardog’s transaction subsystem to use Snapshot Isolation, which allows us to avoid locking entirely. Which means concurrent writes don’t queue up waiting for locks. The storage system itself uses Log Structured Merge Trees to improve write performance wherever possible.

The consequences of these decisions are shown in the graph below:

As the graph shows write throughput is much faster in Stardog 7 than it is in Stardog 6.

Performance and Stability

It doesn’t do much good to have fast, concurrent writes if the system isn’t stable. So we have devoted a lot of time to make Stardog 7 stable and reliable. In addition to system stability we’ve also put a lot of effort into ensuring that performance is stable, that is, consistent and predictable. It can be difficult to work with a system which has highly variable throughput. Performance in Beta 2 is much more consistent, and it’s much more in line with the general performance consistency that a move from B+-Trees to LSM promises in principle:

The blue line is Stardog 7 beta 1, and the orange line is Stardog 7 beta 2. As the graph shows the patterns in Stardog 7 beta 2 are far more predictable and stable with higher average throughput.

Memory Management

While RocksDB is a powerful, low-level key-value storage engine, it can be a resource hog. In our stress tests we found that the Java’s Out-Of-Memory (OOM) handler was being tripped quite a bit due to Linux optimistic provisioning of memory and RocksDB’s greedy memory use. To solve this problem, we added new memory management controls to guarantee our customers the stability that they expect from Stardog.

Because RocksDB is a C++ library, all of its memory use is outside of the JVM’s control and thus we could not rely on our existing memory management to control it. We needed a new approach.

First, we analyzed usage patterns of the lowest levels of memory management in libc. We found that in multi-threaded cases we needed less concurrency than we initially assumed. The memory arena count (glibc.malloc.arena_max) was set much higher than we needed, resulting in fragmentation. Tuning this parameter down gave us more efficient memory usage.

Second, we tightened the memory budget of the overall system to avoid OOM errors and to allow more control for running cluster nodes with hard memory limits. A new control surface was added to allow operators to set a total amount of memory that the system should use. From that budget we allocate out portions to various Stardog subsystems.

Parameter Configuration

Next, to ship the most stable system possible, we carefully examined every possible RocksDB configuration option. Our goal was to create a stable performance profile. We configured RocksDB’s LSM tree to use 5 levels with an aggressive compaction configuration to smooth out the jitter in throughput:

ColumnFamilyOptions::num_levels = 5, 
ColumnFamilyOptions::level0_file_num_compaction_trigger = 2

When we set the number of levels to a low value, like 2, we saw similar average performance, but the behavior was much less predictable. So we measured the standard deviation of the throughput for each value. We chose the smallest number of levels that gave us the most stable (lowest standard deviation) throughput.

Transaction Log Subsystem

When we switched to RocksDB for storing our column families it opened up much higher levels of write concurrency for us. This extra throughput created congestion in our transaction log subsystem, especially during large transactions. To support the extra throughput, we no longer block all writes for the entire time. Instead, we split large transactions into a configurable chunk size and write each chunk. This avoids long periods of contention for access to the log. After this change we observed improvements to writing as high as 50%.

Try it out!

We really hope you try out Stardog 7 beta 2, and let us know what you think! Learn more about Stardog 7 beta 2 at community.stardog.com. You can download the beta release here and a trial license for it here.