
Stardog Data Flow Automation with NiFi

Paul Jackson Sep 10, 2020

Having the flexibility to either access your enterprise data in place as a Virtual Graph or load some portion of it into Stardog’s graph database (or a combination of the two) is a key differentiator for Stardog. With the release of Stardog NiFi support in v7.4, we are happy to announce a new feature that enhances the latter option. With NiFi integration, the processes for transforming and mapping all your data sources to the Knowledge Graph live in one ecosystem: one that is scalable, reliable, and reduces the number of different technologies that must be managed.

The Problems of Managing Complex Transformations

As our customers’ use of their Knowledge Graphs grows, so does the set of data sources and the rules associated with them. These sources are refreshed periodically from internal and external systems. Supporting those refreshes can involve data validation, standardization transformations, merging with other data sources, and matching to canonical (or master) data elements. Each of these processes requires specialized tools for data access and transformation.

As the data sources and rules multiply, so do the processes, tools, and technologies that support them, and these become difficult to manage and track over time. Processes become brittle, past configurations get lost, and failures in a complex process become difficult to recover from. What’s needed is a framework that provides unified access to all the needed technologies and lets administrators focus on the transformation rules and their orchestration rather than on low-level technical challenges.

Managing Data Transformations with NiFi

Apache NiFi is an ETL product “designed to automate the flow of data between software systems.” In addition to connectors (NiFi calls them “processors”) for a myriad of systems and a large set of built-in transformation tools, its architecture provides scalability, reliability, and recovery. It includes a flowchart-style user interface for modeling and running complex data transfers and transformations, as well as APIs for driving these processes from existing management systems.

Stardog now ships with a connector for NiFi, allowing our customers to streamline their data transformation processes through a single technology rather than maintaining a separate SQL pipeline or other technology alongside the Stardog toolkit. Now both kinds of processing can run within the NiFi framework.

The initial release includes three processors and one service. The first processor loads data, either from a CSV, JSON, or RDF file or from any supported database. It can ingest data pulled from any data source that NiFi supports, not just the data sources Stardog can connect to directly. The second processor queries Stardog. Queries can be configured directly in the processor, or they can be queries written in Studio and saved to and managed in Stardog. Query parameters are supported and can take their values from upstream processors, which allows for powerful interaction between processors. Finally, the third processor updates a Stardog database; it has the same support for saved queries and query parameters as the query processor. The single service sets the connection and credentials for the Stardog server in one place.
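To make the query-parameter idea more concrete, here is a minimal, self-contained Python sketch of a value flowing from an upstream step into a saved query. It does not use the Stardog or NiFi APIs: `FlowFile`, `SAVED_QUERIES`, `bind_parameters`, and `query_step` are hypothetical stand-ins, and binding values through a trailing SPARQL 1.1 `VALUES` clause is an illustrative choice, not necessarily how the Stardog processors bind parameters.

```python
# Hypothetical sketch: binding values from an upstream step into a saved
# SPARQL query. None of these names belong to the Stardog or NiFi APIs.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowFile:
    """Stand-in for a NiFi FlowFile: content plus string attributes."""
    content: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical registry of queries saved (for example, from Studio) by name.
SAVED_QUERIES: Dict[str, str] = {
    "people-by-city": (
        "SELECT ?person WHERE { ?person <http://example.org/city> ?city }"
    ),
}


def sparql_literal(value: str) -> str:
    """Render a string as an escaped SPARQL string literal."""
    escaped = (
        value.replace("\\", "\\\\")  # escape backslashes first
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def bind_parameters(query: str, params: Dict[str, str]) -> str:
    """Bind parameters by appending a SPARQL 1.1 VALUES clause to the query."""
    if not params:
        return query
    names = sorted(params)  # deterministic variable order
    variables = " ".join(f"?{name}" for name in names)
    row = " ".join(sparql_literal(params[name]) for name in names)
    return f"{query}\nVALUES ({variables}) {{ ({row}) }}"


def query_step(flowfile: FlowFile, query_name: str, param_names: List[str]) -> str:
    """Look up a saved query and bind the named attributes set upstream."""
    query = SAVED_QUERIES[query_name]
    params = {name: flowfile.attributes[name] for name in param_names}
    return bind_parameters(query, params)


if __name__ == "__main__":
    # An upstream step (e.g. one that parsed a CSV row) sets an attribute.
    upstream = FlowFile(attributes={"city": "Boston"})
    print(query_step(upstream, "people-by-city", ["city"]))
```

Appending a `VALUES` block leaves the saved query text untouched and avoids splicing raw strings into the query body. The escaping in `sparql_literal` covers only quotes, backslashes, and line breaks, so a real binding would also need to handle IRIs and typed literals.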

[Figure: Stardog NiFi Processors]

[Figure: StardogPut Processor]

What’s Next?

One of the next steps will be to integrate the NiFi cluster with the Stardog cluster so users do not have to manage yet another piece of infrastructure. Over time, we’ll also add support for other ETL tools, such as MuleSoft and SnapLogic, as our customers’ needs dictate.

Let us know if you try it out! We’ve already had one user ask for LIMIT and OFFSET as parameters that can be configured in the query processor. We’re interested to know how you would like to use it and what you would like to see added.
