Blog | Stardog Labs

Performance

Robust Query Planning for Federated Queries

Lars Heling

Jan 23, 2024

In our previous blog post, we discussed some of the challenges when it comes to querying heterogeneous federations. In this blog post, we want to focus specifically on the challenge of estimating the number of (intermediate) results in federated settings and present our new robust query planning feature to address this challenge. In federations with Virtual Graphs, SPARQL endpoints, or other remote sources, it is difficult to estimate the number of results (i.

Performance

Querying Heterogeneous Federations with Stardog 9 Up to 30x Faster

Lars Heling

Apr 25, 2023

In many companies, data is spread across a variety of sources in different departments, and many business cases require accessing and analyzing this data in an integrated manner. Federated queries in SPARQL allow users to query multiple data sources as a unified Knowledge Graph. In Stardog, federated query processing is not limited to SPARQL endpoints as data sources; it also provides a means to access data from many other services, including Virtual Graphs (VGs) backed by a range of data sources such as SQL or NoSQL databases, or other Stardog servers.

Admin

Unified Process Monitoring

Simon Graetzer

Oct 18, 2022

Have you ever had a slow query, export, or other operation and wondered, “Is this ever going to finish?!” Many operations, such as SPARQL queries and database backups, have performance characteristics that can be hard to predict. Canceling rogue queries has always been possible, but there was not yet a way to do so for other potentially expensive operations. If this has ever affected you, then the new unified process monitoring feature we added in Stardog 8.

Chaos Testing

Chaos Testing Stardog Cluster for Fun and Profit

Paul Marshall

Jul 13, 2022

At Stardog we work hard to build software that’s not only performant but also extremely robust. Stardog Cluster is a key, highly available component for ensuring that Stardog remains up and running for our customers. However, every distributed system is susceptible to faults and bugs that can wreak havoc on it. Most software testing checks the “happy path” through the code to verify it behaves as designed. Going a step further, tests can check some obvious failure cases, for example, what happens if one service in a deployment is terminated.

Data Science

Knowledge Graphs for Data Science - Part 1

Catherine Dalzell

Mar 17, 2022

A database, equipped with an optimized query language, is a powerful tool in the data science toolkit. We’ve all written our share of SQL queries to create data summaries, perform exploratory analyses, check for null values and other basic tasks. When data weigh in at over 1GB, the best approach is often pairing a good database, for basic data munging, with an analytic platform like R or Python for more nuanced calculations.

Data Science

Knowledge Graphs for Data Science - Part 2

Catherine Dalzell

Mar 17, 2022

Data Mappings: In a previous post, I showed how to explore some data using a graph paradigm rather than the usual tabular arrangements used by relational databases and Python’s Pandas DataFrame. Running that analysis required importing CSV data files into Stardog. We need to tell Stardog what sort of graph structure we want for the data. Detailed documentation for this can be found here and here. I’m going to run through the mappings I used to import the E.

Benchmarking

Performance Improvements in Stardog 7.9.0

Evren Sirin

Mar 9, 2022

We recently released Stardog 7.9.0 with many exciting features and improvements. Performance is an area we pay close attention to and improve with each release, and 7.9 is no exception. You’ll see faster performance for SQL queries with left joins and SPARQL queries that contain joins involving optionals, faster loading of large RDF or CSV files, and a new sampling feature that lets you get answers in just a fraction of a second.

Benchmarking

Wikidata in Stardog

Evren Sirin, Pavel Klinov

Feb 8, 2022

Wikidata is a free and open knowledge base that can be read and edited by anyone. It serves as the central storage for the structured data of Wikipedia and other Wikimedia projects like Wiktionary, Wikisource, and more. It is also one of the largest publicly available RDF datasets, and exports of the complete dataset are provided daily. As of this writing in January 2022, the Wikidata RDF export contains 16.7 billion triples for close to 100 million entities, and the size of the dataset continuously grows.

Use Case

Using Stardog and Knowledge Graphs for ESG Business Impact Analysis

Andrea Westerinen

Jan 12, 2022

Pressing environmental, social and governance (ESG) challenges require data-intensive insights. The ESG market is large and rapidly growing, with sustainable investments growing to more than a third of all global assets, and sweeping environmental regulation being passed worldwide. This confluence of events presents opportunities and creates demand for reliable, insightful data about companies’ performance and their impact on regional environmental and climate conditions. Amplifying the problem, ESG metrics are usually just a collection of raw numbers that require study and interpretation with each use.

Buildings, Systems and Data

Eleanna Panagoulia, Zachary Lancaster

Nov 22, 2021

Designing a building or a city block is a process that involves many different players representing different professions working in concert to produce a design that is ultimately realized. This coalition of professions, widely referred to as AECO (Architecture, Engineering, Construction and Operation) or AEC industry, is a diverse industry with each element of the process representing a different means of approaching, understanding, and addressing the problem of building design.

Graph analytics with Spark

Stanislav Klenin

Nov 17, 2021

The Stardog Spark connector, coming out of beta with the 7.8.0 release of Stardog, exposes data stored in Stardog as a Spark DataFrame and provides a means to run standard graph algorithms from the GraphFrames library. The beta version was sufficient for POC examples, but when confronted with real-world datasets, its performance turned out not to be quite up to our standards for a GA release. It quickly became obvious that the connector was not fully utilizing the distributed nature of Spark.

Joins and NULLs in SPARQL

Oct 19, 2021

Joins in SPARQL can be confusing to newcomers. Some people celebrate the fact that they don’t need to write explicit join conditions (as in SQL), but if you actually look at the SPARQL spec, you will see the term “join” used some 67 times (as of Oct 2021). Furthermore, if you look at the join definition, you will recognize the familiar relational operator that’s not so different from SQL.

FROM vs FROM NAMED in SPARQL

Jun 28, 2021

FROM vs FROM NAMED: what’s the difference, and when should I use one or the other? This question is a constant source of confusion for SPARQL users. It’s one of the main reasons why a query can surprisingly return zero results, and even the most experienced of us have been tricked by it at least once. This short post goes into a bit of detail on the difference and discusses how both can be used to address different use cases.

EKS Volume Snapshots

John Bresnahan

Mar 18, 2021

As discussed in a previous post, Stardog Cloud relies on VolumeSnapshots in Kubernetes (k8s) for backups of user data. In this post we will go into more technical detail on how to work with VolumeSnapshots in the Elastic Kubernetes Service (EKS). Kubernetes Components: Here we present the k8s components that are used when working with VolumeSnapshots. We do not go into exhaustive detail; rather, we give a brief overview to make the concepts in this post easier to understand.

Benchmarking

Loading a million triples per second on commodity hardware

Evren Sirin

Mar 3, 2021

At Stardog we are continuously pushing the boundaries of performance and scalability. Last month’s 7.5.0 release brought a 500% improvement to transactional write performance. This month’s 7.6.0 release improves the speed of writing data at database creation time by almost 100%, yielding a loading speed of a million triples per second on a commodity server. In this post we’ll talk about the details of loading performance. Let’s do the numbers: The fastest way to load large amounts of data into Stardog is to do so at database creation time.

Benchmarking

Write Performance Improves up to 500%

Evren Sirin

Feb 16, 2021

Stardog 7.5.0 improves write performance by up to 500% in some cases. In this post I describe the details of this improvement and share detailed benchmarking results for update performance. Large Updates: A common usage pattern for Stardog involves connecting to external data sources through virtual graphs that are queried on demand without storing any data in Stardog. However, in some cases you might enable virtual graph caching to pull data into Stardog for indexing, and in other cases it is preferable or even necessary to materialize the data in Stardog completely.

Stardog OAuth 2.0 Configuration

John Bresnahan

Feb 5, 2021

Stardog can be configured to use third-party OAuth 2.0 identity providers for authentication. In this post we will explain how this works and how to configure your Stardog server to use it. Architecture: Proving Identity (Authentication). When interacting with an OpenID Connect service or an OAuth 2.0 identity provider like Google, the concept of a JSON Web Token (JWT) is at the center of the system. Glossing over some details that will be discussed later, a user known to Google can contact Google and effectively ask for a JWT to prove their identity to a third party.

Benchmarking

Starbench and Dogfooding

Evren Sirin

Jan 27, 2021

Benchmarking is an essential part of developing any performance-sensitive software, and Stardog is no exception. For a system as complex as Stardog, any single change in any part of the codebase might have unforeseen implications. Given that our customers have very different use cases, their data and query characteristics vary significantly, making it harder for us to make sure their workloads will not slow down when we add a new feature or fix a bug.

Volume Snapshots In Stardog

John Bresnahan

Jan 15, 2021

When running a datastore like Stardog, it is important to take regular backups. However, it is also important to consider the side effects that taking a backup can cause. Administrators have to balance the frequency of backups against the disruption to resources caused by creating each backup. If the backup process is CPU-, memory-, or IO-intensive, care must be taken to make sure that it does not interrupt a period of heavy user activity.

Introducing Plan Endpoint

Stanislav Klenin

Nov 12, 2020

Languages for querying databases tend to look more human-readable than a typical programming language. SPARQL, like SQL, employs a declarative approach, allowing users to describe what data needs to be retrieved without burdening them with the minutiae of how to do it. Besides being easier on the eyes, this leaves a DBMS free to choose how it executes queries. As is typical for database management systems, Stardog has its own internal representation for SPARQL queries: the query plan.

Stardog Data Flow Automation with NiFi

Paul Jackson

Sep 10, 2020

We are happy to announce a new feature that enhances your ability to load data into Stardog’s graph database with the release of NiFi support in v7.4.

Stored Query Service

Pavel Klinov

Aug 14, 2020

Stardog is a very extensible platform, and the SERVICE keyword in SPARQL is one of its main extension points. It was originally introduced for SPARQL Federation, i.e. querying remote SPARQL endpoints on the Web, but at Stardog we recognised a long time ago that SERVICE could be used far beyond that. For us and our customers, it is a general mechanism for incorporating all sorts of computation within SPARQL queries; for example, we have been using it for full-text search and machine learning.

5 Tips for Recruiting Startup Engineers

Evren Sirin

Aug 12, 2020

At Stardog I’m very lucky to work every day with the best engineers anyone could hope for. Startups are hard but having the right teammates makes all the difference when you are building an innovative product in a competitive market. Putting together a team that will tip the scale to your side is not easy. We have made some mistakes along the way with respect to recruiting. But as in any other aspect of running and growing a business, you need to learn from your mistakes and get better at it.

Sandbox

Analyzing COVID-19 Data with SPARQL

Evren Sirin

Jul 13, 2020

For those of us living in the US, increasing COVID-19 case numbers across the country are unfortunately top of mind. We do not lack access to data or infographics about the pandemic, but having access to raw data and writing queries yourself can still give you different insights, or at least make you better at writing queries. For this reason, we decided to turn the open-source COVID-19 dataset published by The New York Times into RDF and include it in our publicly accessible Stardog Sandbox environment.

Admin

Introducing Stardog Labs

Kendall Clark

Jul 6, 2020

I’m delighted to welcome you to Stardog Labs, a new hub of insight, news, and buzz about knowledge graph technology. The site will advance knowledge graph R&D by featuring technical blogs, showcasing job opportunities focused on knowledge graph development, and curating research papers and open source projects.

CIM

Using CIM in Stardog

Evren Sirin

May 5, 2020

The Cloud Information Model (CIM) addresses the brittle data integrations that are common amongst enterprises. With the Stardog CIM archetype, you can start building your knowledge graph with a rich data model that is being standardized by industry leaders.

Reasoning

Stream Reasoning With Stardog

Guest author Bram Steenwinckel

Mar 18, 2020

Guest author Bram Steenwinckel describes how to perform semantic reasoning over streaming data in your knowledge graph.

The latest research

Learning Analytics Software for Medical Students regarding Pregnancy Complications

Formal ontologies and data shapes within the Software Engineering development lifecycle (TSE)

Evaluating Generalized Path Queries by Integrating Algebraic Path Problem Solving with Graph Pattern Matching
