Five Tips: Data Modeling for Data Fabric Success

Natalie Clark

Apr 6, 2021, 6 minute read

A common question about deploying a data fabric is how to develop an enterprise-wide data model. Many think this is a prerequisite to the initiative, and the undertaking strikes them as potentially expensive and time-consuming. They wonder where they’ll get the resources and whether they’ll need to hire dedicated ontologists.

In fact, nothing could be further from the truth. First, you only need to define as many concepts as your first use case requires. Second, as your business needs change, your models can adapt to meet new requirements thanks to the flexibility of the graph. Here are five tips to set you up for success:

1) Adopt an MVP mindset for modeling

Start by identifying a critical business problem to spearhead the broader data fabric initiative. Approach your data fabric with an MVP mindset and do strictly the minimal work to accomplish the first significant tranche of business value.

For example, at one global pharma company, in just 6 months with one back-end developer and two front-end engineers, a total of 13 enterprise data sources were modeled and unified. The speed and efficiency here are especially crucial since the mapping and modeling of these 13 data sources are reusable across many different internal applications. Modeling “once for all”, instead of repeating data modeling for each new application, is key to data fabric’s ROI. The resulting new app at this global pharma is used by over 1,000 internal users! So it’s entirely possible to drive value with your first use case without boiling the ocean.

2) Pick your first use case strategically

Struggling to determine the best business use case to start with? A great place to start is with customer data, since improving this data will lead directly to new revenue opportunities. It’s also historically fragmented data, so there’s likely big upside in connecting this varied data. Successful revenue generation will build momentum and awareness for the utility of data fabric as a differentiated, strategic approach to building connected data enterprises.

Further, it makes sense to look for these symptoms of disconnected data:

cross-functional efforts that cut across lines of business, e.g., supply chain
business units that are stressing IT systems with iterative question-and-answer cycles
use cases that frequently deal with unanticipated questions or have frequently shifting requirements
projects like digital twins which require management of hundreds of millions of relationships and rules

Alternatively, look within your own IT org to begin delivering value. One leading financial services firm began by unifying its IT asset management systems to improve real-time incident management. Over time, the data fabric grew to connect 4,000 assets across 400 systems related to 60,000 employees. This initial work is now leveraged as part of a broader operational risk use case, critical to the firm, that proactively and systematically analyzes the impact of potential risks.

3) Use a data catalog to accelerate your data modeling

Data catalogs serve a key role in data fabrics as they provide an inventory of an organization’s information assets. Leveraging data catalogs jumpstarts the modeling process, so you can avoid starting from scratch. Stardog lets you pre-populate and pre-build all the Virtual Graphs using the information in your data catalog. For example, you can point the catalog at a database, take a table from that database and tag it as “customer” with the columns “first name” and “last name,” and then have Stardog read that information in your data catalog and create a mapping.
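To make the catalog-to-mapping idea concrete, here is a minimal sketch in plain Python. It is not Stardog's mapping syntax or API: the `CatalogEntry` class, the `build_mapping` function, the `crm.customers` table name, and the `http://example.com/` namespace are all hypothetical, chosen only to show how a tagged table and its columns could be turned into a mapping automatically.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record: a table, its business tag, and its columns."""
    table: str
    tag: str
    columns: List[str]


def to_property_name(column: str) -> str:
    """Turn a column label such as 'first name' into 'firstName'."""
    words = column.strip().split()
    return words[0].lower() + "".join(w.capitalize() for w in words[1:])


def build_mapping(entry: CatalogEntry, namespace: str = "http://example.com/") -> Dict:
    """Derive a simple table-to-graph mapping from a catalog entry.

    The tag becomes the target class and each column becomes a property.
    The namespace is an assumed placeholder, not a real vocabulary.
    """
    return {
        "source_table": entry.table,
        "target_class": namespace + entry.tag.capitalize(),
        "properties": {
            col: namespace + to_property_name(col) for col in entry.columns
        },
    }


# The example from the text: a table tagged "customer" with two columns.
entry = CatalogEntry(
    table="crm.customers",  # hypothetical table name
    tag="customer",
    columns=["first name", "last name"],
)
mapping = build_mapping(entry)
print(mapping)
```

The point of the sketch is that the catalog already holds the information a mapping needs, so the first draft of the model can be generated rather than written by hand.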

Data governance frameworks also give you a leg up because you’ll have a better fundamental understanding of where data is in your organization. This means you need to spend less time on modeling exercises where you define objects and their relationships; these definitions are the primary goal of data governance and can be used in your data model MVP!

4) The benefits of reusable data models

The data reusability that Stardog supports will save countless hours and dollars over time. How exactly does Stardog make data reusable? Stardog supports modeling the enterprise with an expressive, extensible data model based on semantic graphs. Capture your business and domain rules in the data model rather than in the data itself, and Stardog’s Inference Engine intelligently applies these rules at query time. Using a unique data virtualization capability, Stardog performs this on existing enterprise data where it lives, avoiding expensive or pointless data copying and data movement. This further saves money on storing redundant data.
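The idea of keeping rules in the model and applying them at query time can be illustrated with a small, self-contained Python sketch. This is a toy, not Stardog's Inference Engine: the triples, the class names, and the `instances_of` function are hypothetical, and the only rule type shown is a simple subclass relationship.

```python
# Data: plain (subject, predicate, object) facts, stored exactly as recorded.
DATA = {
    ("alice", "type", "PremiumCustomer"),
    ("bob", "type", "Customer"),
    ("acme", "type", "Supplier"),
}

# Model: rules kept separately from the data.
# Here each entry says "the key class is a subclass of the value class".
SUBCLASS_OF = {"PremiumCustomer": "Customer"}


def superclasses(cls):
    """Yield cls and every ancestor reachable through SUBCLASS_OF."""
    seen = set()
    while cls is not None and cls not in seen:
        seen.add(cls)
        yield cls
        cls = SUBCLASS_OF.get(cls)


def instances_of(target):
    """Answer a query by applying the model's rules at read time.

    No derived facts are written back into DATA.
    """
    return sorted(
        s for (s, p, o) in DATA
        if p == "type" and target in superclasses(o)
    )


customers = instances_of("Customer")
print(customers)
```

In this sketch, alice is returned as a customer even though no fact says so directly, and the stored data never changes. Adding a rule such as `SUBCLASS_OF["TrialCustomer"] = "Customer"` would change future query results without touching or copying any data, which is the sense in which a rule-based model is reusable.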

What are the benefits of data reusability in practice? Ultimately it saves time when editing the use case, and that benefit is compounded on successive use cases. This is because when business requirements or definitions change, you can simply write a new modular rule to amend the model. Learn more about how Stardog uniquely supports data reusability in this blog.

Compare this to the legacy process for supporting a new use case. The current laborious process for integrating enterprise data typically involves extraction, translation, modeling, and mapping between various applications. The custom code required for modeling and mapping each new application quickly becomes unwieldy at large scale, slowing the pace of innovation and insight. In contrast to this application-centric practice, Stardog creates a reusable network of data to power your business. Data is linked based on its meaning, and multiple definitions are easily supported to power various applications. Not only does this cut down on the time and cost of ETL, coding, and the inevitable fixing of errors, but this data-centric practice also enriches and accelerates existing investments.

5) Get a head start on data model development

There are many open source data models that Stardog can read, helping customers to accelerate their data model development. An open source data model may account for about 80% of modeling required for your project, with the remaining 20% customized based on your proprietary data or unique internal operations. Our team can help advise on publicly available data models that can suit your use case.

Stardog has also committed to the development of additional public data models through the Cloud Information Model (CIM). CIM is an open specification being developed as part of the Joint Development Foundation under the Linux Foundation. Stardog joined the CIM consortium in February 2020, alongside partners Amazon Web Services and Salesforce, among others. CIM aims to provide ready-to-use data models for predefined domains that are not tied to any application or vendor.

Whether you are leveraging an existing model or using your own, you can start modeling using the Models Hub in Stardog Studio. The Models Hub allows you to build a schema quickly using an intuitive interface. You can import an existing open source data model and modify it, or create a new model right in Studio.

Ready to get underway on your data fabric? Get started for free today.