We caught up with Evren Sirin, Stardog’s CTO, to discuss the tricky task of benchmarking Enterprise Knowledge Graphs—what potential clients ask for, what factors affect benchmarking, how Stardog has improved over the past few years, and more.
Q: Every enterprise is different, and every knowledge graph is different. What factors do you consider when it comes to benchmarking?
As anyone who has worked on benchmarking will admit, benchmarks can be misleading and do not always reflect what a user would see with their own data and usage patterns because, as you point out, every enterprise differs in how it uses knowledge graphs.
We create benchmarks that represent the use cases we see in practice rather than focusing on synthetic, contrived benchmarks. We are not aiming for a single benchmark that magically tells us performance is good or bad. Instead, we have a more holistic approach and analyze performance over a range of different benchmarks.
Q: What do potential clients ask about and for? What do you wish they asked for?
Sometimes we get questions about results on certain publicly available benchmarks, such as the Berlin SPARQL Benchmark (BSBM) or the Lehigh University Benchmark (LUBM). These benchmarks make it easy for someone to compare products from different vendors, but they do not typically indicate how the product will perform on the client’s data and queries.
It is important for users to analyze and understand their own usage patterns and, ideally, to create a benchmark based on them rather than relying solely on predefined benchmarks.
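As a minimal sketch of that advice, a do-it-yourself benchmark can be as simple as timing representative queries against a SPARQL HTTP endpoint and tracking a few summary statistics. The endpoint URL and queries below are placeholders, not anything Stardog-specific:

```python
# Minimal sketch of a custom SPARQL benchmark.
# The endpoint URL and the queries are hypothetical placeholders --
# substitute your own representative workload.
import statistics
import time
import urllib.parse
import urllib.request

def time_query(endpoint: str, query: str) -> float:
    """Send one SPARQL query over HTTP POST and return wall-clock seconds."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        endpoint, data=data,
        headers={"Accept": "application/sparql-results+json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def summarize(latencies: list[float]) -> dict:
    """Aggregate per-query latencies into the numbers worth tracking over time."""
    ordered = sorted(latencies)
    return {
        "mean": statistics.fmean(ordered),
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
        "max": ordered[-1],
    }

# Usage against a live endpoint (not executed here):
#   latencies = [time_query("http://localhost:5820/mydb/query", q)
#                for q in my_queries]
#   print(summarize(latencies))
```

Running the same query set on a schedule and comparing the summaries release over release is what turns a one-off measurement into a benchmark.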
Q: What are some recent benchmarks we’ve done?
Our most recent benchmarking effort has focused on Wikidata, the central storage for the structured data of Wikipedia and other Wikimedia projects. It is editable by anyone and provides daily RDF data dumps. Wikidata is continuously growing and currently stands at 16.7 billion triples. Our benchmarking shows Wikidata can be loaded into Stardog in under 10 hours on a commodity server, a loading throughput of roughly 500K triples per second.
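The throughput figure follows from simple arithmetic; a quick sanity check on the numbers above:

```python
# Sanity-check the loading-throughput arithmetic from the text.
triples = 16.7e9   # Wikidata size in triples
hours = 10         # upper bound on the load time from the benchmark
throughput = triples / (hours * 3600)
print(f"{throughput:,.0f} triples/second at exactly 10 hours")
# A load that finishes in under 10 hours therefore exceeds this rate,
# consistent with the roughly 500K triples/second quoted.
```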
We have also benchmarked query answering times on the Wikidata Graph Pattern Benchmark (WGPB). The benchmark comprises 50 instances of 17 different query patterns, for 850 queries in total. Stardog’s average execution time for the WGPB queries against 16.7 billion triples was 100 milliseconds. Stardog answered 844 queries (99% of the total) in under 1 second, so only 6 queries took more than one second. Full details of the benchmark results can be found in our Stardog Labs blog post.
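The WGPB figures check out arithmetically; this is just a restatement of the numbers above, not new data:

```python
# Cross-check the WGPB query counts quoted in the text.
patterns, instances = 17, 50
total = patterns * instances
under_one_second = 844
print(total)                                  # total queries in the benchmark
print(total - under_one_second)               # queries slower than one second
print(round(100 * under_one_second / total))  # percentage under one second
```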
Q: Where has Stardog improved the most over the last few years?
There have been significant performance and scalability improvements over the last few years. We have shown that Stardog can provide sub-second query answering times over a hybrid multi-cloud Knowledge Graph with 1 trillion edges. We have improved data ingestion speeds by 500% and can bulk load data into Stardog at a speed of a million triples per second when using faster disks and CPUs. There have been improvements to query answering performance across various use cases, including but not limited to path queries, full text queries, and reasoning queries.
Q: How long do you expect benchmarks to remain accurate?
We are constantly working on performance improvements in Stardog, so any release may bring performance improvements relevant to some benchmark. One thing we pay a lot of attention to is avoiding performance regressions. We have built an automated benchmarking tool called Starbench that runs hundreds of benchmarks every night to make sure no code committed during the day degrades performance. We cannot guarantee that every benchmark will get faster in every release, but we make sure none of them gets any slower.
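Starbench itself is internal, but the regression-gate idea can be sketched generically. The baseline file contents and the 10% tolerance below are illustrative assumptions, not Starbench’s actual design:

```python
# Sketch of a nightly performance-regression gate, in the spirit of
# what the interview describes. The tolerance and the benchmark names
# are illustrative assumptions, not Starbench's actual design.

TOLERANCE = 1.10  # flag a benchmark if it gets more than 10% slower

def find_regressions(baseline: dict[str, float],
                     current: dict[str, float]) -> list[str]:
    """Return names of benchmarks whose current time exceeds baseline * TOLERANCE."""
    return [name for name, secs in current.items()
            if name in baseline and secs > baseline[name] * TOLERANCE]

if __name__ == "__main__":
    # Hypothetical timings in seconds for two benchmarks.
    baseline = {"wgpb_pattern_1": 0.10, "bulk_load": 3600.0}
    current = {"wgpb_pattern_1": 0.25, "bulk_load": 3500.0}
    print(find_regressions(baseline, current))  # only the slower benchmark is flagged
```

A nightly job would fail the build whenever the returned list is non-empty, which is how “none of them gets any slower” becomes enforceable rather than aspirational.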
Q: What are we going to benchmark next?
Our current benchmarking efforts focus on Stardog’s BI/SQL server performance. We have made some significant improvements to SQL query answering performance in Stardog 7.9, but just as with SPARQL, there are many different usage patterns for SQL, so we are expanding our benchmarking coverage in this area. We will publish these results soon.
Q: Do benchmark results reflect ROI?
The scalability and the query answering performance of an Enterprise Knowledge Graph directly impact ROI: scalability allows more data across the enterprise to be unified, and query answering performance means questions against the Knowledge Graph are answered in a timely manner. In that regard, benchmarks are indicators of ROI as long as they reflect real-world use cases, as I mentioned before.