Introducing Plan Endpoint

Stanislav Klenin Nov 12, 2020

When it comes to languages for querying databases, they tend to look more human-readable than a typical programming language. SPARQL, as well as SQL, employs declarative approach, allowing to describe what data needs to be retrieved without burdening the user with minutiae of how to do it.

Besides being easier on the eyes, this leaves a DBMS free to choose the way it executes queries. And as is typical for database management systems, Stardog has its own internal representation for SPARQL queries: the query plan.

During the initial phase of evaluation Stardog converts a query into this hierarchical plan structure and then, unless it is trivially simple, proceeds to optimize it, applying various transformations, based upon knowledge it has about the data. Even though it tries its best, it can’t make optimal decisions all of the time so occasionally it might be necessary to rewrite a query a bit so that the optimizer decides to transform the plan differently. You may want to compare different join algorithms or insist on some specific one - query hints can help but they don’t offer full control over the execution.

But what if you could modify the plan directly, without going through SPARQL representation at all?

In version 7.4.4 this is now possible!

Let’s experiment with this simple join query from Introduction to SPARQL:

SELECT ?singer ?song
WHERE {
    ?singer :sings ?song .
    ?singer :memberOf ?band .
}

This query finds singers who are members of music bands, and songs they sing - in other words, it performs a join operation. The explain command (or Show Plan button in Studio) can show us what kind of join it is:

$ stardog query explain --verbose GettingStarted_Music query.sparql
Explaining Query:

SELECT ?singer ?song
WHERE {
    ?singer :sings ?song .
    ?singer :memberOf ?band .
}

The Query Plan:

QueryPlan
Projection(?singer, ?song) [#93]
`─ HashJoin(?singer) [#93]
   +─ Scan[POSC](?singer, :memberOf, ?band) [#208]
   `─ Scan[POSC](?singer, :sings, ?song) [#3.8K]

Note: we are using --``verbose flag here to print a plan in a format suitable for plan execution. It prefixes the plan wth QueryPlan header and changes a few other things - which are not important for a simple example like this. The plan can now be executed with query command:

$ cat query.plan
QueryPlan
Projection(?singer, ?song) [#93]
`─ HashJoin(?singer) [#93]
   +─ Scan[POSC](?singer, :memberOf, ?band) [#208]
   `─ Scan[POSC](?singer, :sings, ?song) [#3.8K]

$ stardog query GettingStarted_Music query.plan

Perhaps you don’t like the HashJoin. It is a pipeline breaker after all. We can try MergeJoin instead - what can possibly go wrong?

$ cat query.plan
QueryPlan
Projection(?singer, ?song) [#93]
`─ MergeJoin(?singer) [#93]
   +─ Scan[POSC](?singer, :memberOf, ?band) [#208]
   `─ Scan[POSC](?singer, :sings, ?song) [#3.8K]

$ stardog query GettingStarted_Music query.plan
Error parsing at line 4: Invalid merge join on var ?singer

Of course, merge join requires its arguments to be sorted by the join key. In our plan the join key is ?singer, which is in subject position in both triple patterns. However, as indicated by POSC sort order, our arguments are sorted by predicate first (because it is constant in both patterns) and object next. These are the decisions query optimizer has to make each time it runs a query. But when running a plan directly, as opposed to a query, the optimizer does not interfere with the the plan provided by the user. For now it does not even have a chance to run the same amount of sanity checks to guarantee correctness of the plan - something we can definitely improve in subsequent releases.

Let’s try changing index orders. P still need to be first and then we can sort by subject:

$ cat query.plan
QueryPlan
Projection(?singer, ?song) [#93]
`─ MergeJoin(?singer) [#93]
   +─ Scan[PSOC](?singer, :memberOf, ?band) [#208]
   `─ Scan[PSOC](?singer, :sings, ?song) [#3.8K]

$ stardog query GettingStarted_Music query.plan

This time, the command should return the same results as the original query.

It is worth reiterating that SPARQL is declarative language for good reasons. Writing query plans directly is error-prone and relies too much upon internal details, as was evident from the experiment above. We don’t envision it as a primary means of communication with Stardog.

It is, however, a good supplementary tool for troubleshooting problematic queries. Which will become even more convenient once the Studio adds support for it. And its use cases are not limited to troubleshooting. There are plans (pun intended) to utilize this capability to implement plan pinning - attaching existing plan to selected queries so that query engine won’t try to optimize them if underlying data changes.

But for now feel free to experiment with query plans and let us know if anything does not behave as it should.

Introducing Plan Endpoint

Keep Reading:

Stardog Data Flow Automation with NiFi

Stored Query Service

Try Stardog Free