Stardog Data Flow Automation with NiFi

Paul Jackson Sep 10, 2020

Having the flexibility to either access your enterprise data in place as a Virtual Graph or to load some portion of that data into Stardog’s graph database (or a combination of the two) is a key differentiator for Stardog. We are happy to announce a new feature that enhances the latter option with the release of Stardog NiFi support in v7.4. With NiFi integration, the processes for transforming and mapping all your data sources to the Knowledge Graph live in one ecosystem - an ecosystem that is scalable and reliable, and that reduces the number of different technologies that must be managed.

The Problems of Managing Complex Transformations

As our customers’ use of their Knowledge Graphs grows, the set of data sources and the rules associated with them grow as well. These data sources are updated periodically from internal and external sources. The processes to support that can include data validation, standardization transformations, merging with other data sources, and matching to canonical (or master) data elements. All of these processes require special tools for data access and transformation.

As the data sources and their rules multiply, so do the processes, tools and technologies needed to support them. These processes and technologies can become difficult to manage and track over time. Processes become brittle, past configurations get lost, and failures in a complex process become difficult to recover from. What’s needed is a framework that provides unified access to all the needed technologies and allows administrators to focus more on the transformation rules and their orchestration and less on the low-level technical challenges.

Managing Data Transformations with NiFi

Apache NiFi is an ETL product “designed to automate the flow of data between software systems.” In addition to connectors (NiFi calls them “processors”) for a myriad of systems and a large set of built-in transformation tools, its architecture brings scalability, reliability and recovery. It includes a flowchart-style user interface for modeling and running complex data transfers and transformations as well as APIs for driving these processes from existing management systems.
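To make the "APIs for driving these processes" point concrete, here is a minimal sketch of how an external management system might start or stop a flow. It assumes the NiFi REST API's process-group scheduling endpoint (`PUT /flow/process-groups/{id}` with a `state` of `RUNNING` or `STOPPED`); check the REST API documentation for your NiFi version before relying on it. The base URL and process group id are hypothetical placeholders, and authentication is left out. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_schedule_request(base_url, process_group_id, state):
    """Build (but do not send) a request that starts or stops a process group.

    Assumption: the target NiFi exposes PUT /flow/process-groups/{id}
    and accepts a JSON body with the group id and the desired state.
    """
    if state not in ("RUNNING", "STOPPED"):
        raise ValueError("state must be 'RUNNING' or 'STOPPED'")

    url = f"{base_url.rstrip('/')}/flow/process-groups/{process_group_id}"
    body = json.dumps({"id": process_group_id, "state": state}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values: replace with your own NiFi address and group id.
request = build_schedule_request(
    "http://localhost:8080/nifi-api", "my-process-group-id", "RUNNING"
)
```

Passing the result to `urllib.request.urlopen(request)` would send it. A secured NiFi instance would also need a token or client certificate, which this sketch does not cover.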

Stardog now ships with a connector for NiFi, allowing our customers to streamline the data transformation process through a single technology, rather than having to maintain a separate SQL pipeline or other technology alongside the Stardog toolkit. Now both of these processes can run nicely within the NiFi framework.

The initial release includes three processors and one service. One of the processors is for loading data, either from a CSV, JSON or RDF file or from any supported database. This processor can be used to ingest data that’s pulled from any data source that NiFi supports, not just those data sources that Stardog can connect to directly. Another processor is for querying Stardog. The queries can be configured in the processor directly, or they can be queries that were written in Studio and then saved to and managed in Stardog. Query parameters are supported and can take their values from upstream processors, allowing for powerful interaction between the processors. Finally, there is a processor for updating a Stardog database. It has the same support for saved queries and query parameters as the query processor. The one included service is used for setting the connection and credentials to the Stardog server in a single place.
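To illustrate the idea of query parameters taking values from upstream processors, here is a small, self-contained sketch. It is not the Stardog processor's implementation: the function names are hypothetical, and plain text substitution stands in for whatever binding mechanism the processor actually uses. The `attributes` dictionary plays the role of values handed down by an upstream step.

```python
import re


def to_sparql_literal(value):
    """Serialize a Python value as a SPARQL boolean, number, or string literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes, double quotes and newlines so the value stays
    # inside the string literal.
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def bind_parameters(query, attributes):
    """Replace each ?name that has an entry in `attributes` with a literal.

    Variables without a matching attribute are left untouched. This is a
    naive textual substitution: it does not parse SPARQL, so it would also
    rewrite a matching ?name that appears inside a string or an IRI.
    """
    def replace(match):
        name = match.group(1)
        if name in attributes:
            return to_sparql_literal(attributes[name])
        return match.group(0)

    return re.sub(r"\?([A-Za-z_][A-Za-z0-9_]*)", replace, query)


# Hypothetical example: "companyLabel" arrives from an upstream step.
template = "SELECT ?name WHERE { ?person :name ?name ; :employer ?companyLabel }"
bound = bind_parameters(template, {"companyLabel": "Acme"})
```

Only `?companyLabel` is replaced here. `?name` and `?person` remain ordinary query variables because no upstream value was supplied for them.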

Figure: Stardog NiFi Processors

Figure: StardogPut Processor

What’s Next?

One of the next steps will be to integrate the NiFi cluster with the Stardog cluster so users do not have to manage yet another infrastructure. Over time, we’ll also add support for other ETL tools such as MuleSoft and SnapLogic as our customers’ needs direct.

Let us know if you try it out! We’ve already had one user ask for LIMIT and OFFSET as parameters that can be configured in the query processor. We’re interested to know how you would like to use it and what you would like to see added.
