Stardog attended the Enterprise Data World conference organized by Dataversity last week. Here are some of the highlights.
Data Variety Is the 800-Pound Gorilla
Michael Stonebraker opened the conference with a keynote that covered many aspects of data management. He made bold statements on topics ranging from distributed computing (Hadoop is dead except for HDFS) to machine learning (deep learning is not going to take over the world). The main premise of the talk was that, of the infamous “three vees” of Big Data (Volume, Velocity, and Variety), the first two challenges have been mostly addressed, but the 800-pound gorilla in the room is the Data Variety problem. It was nice to see this highlighted, since we’ve been making the same point for years now.
Further, establishing a “common vocabulary” within an enterprise was highlighted as one of the ways to tackle the data variety problem. Failing to do so meant you might end up in a courtroom (!) to defend yourself:
“You are liable for not establishing a ‘common vocabulary’ in your organization! @Danette_McG bringing it at this morning's courtroom drama #EDW19 #DataGovernance #DataManagement #DataQuality pic.twitter.com/iJ6bylWnc7” (EnterpriseDataWorld, @EnterpriseData, March 20, 2019)
Data is the Most Important Asset for an Enterprise
You’ve probably heard a variation of this phrase, as it has been overused to the point of cliché. But as with many clichés, there is some truth to it. Social networking companies have been aware of this since their inception, but it is now clear that enterprises across the spectrum are internalizing this mantra. One estimate shared at the conference was that companies today waste 20% of their revenue due to bad data. Companies from sectors as different as finance and biotechnology talked about strategic initiatives to make better use of their data. Connecting data sources within an organization is an important part of these initiatives, as unconnected data is one of the most significant obstacles to understanding and analyzing enterprise data.
Knowledge Graphs on the Rise
Knowledge Graphs were quite popular at the EDW conference. Stardog, along with other Knowledge Graph vendors, got a special shout-out from the organizer Tony Shaw in his opening address, where he singled out Knowledge Graphs as the technology that everyone at the conference should learn about. Given the recurring themes of data variety and data as a strategic asset, many people were eager to learn more about knowledge graphs. The Stardog team (me, Virginia, Martin, Karen, and Grant) was busy answering questions about knowledge graphs and listening to the different challenges people face at their organizations.
Feed the Machine Learning Monster with Good Data
The increasing importance of machine learning for analytics came up many times, including in Stonebraker’s keynote. The common sentiment was that a successful machine learning project needs good-quality data, and that preparing the data is the hardest and most time-consuming part of the project. The caution against deep learning stems from this: deep learning requires even more data than traditional machine learning techniques, and in many cases that much data is simply not available. It was also suggested that teaching machine learning skills to a person who knows the business and the data is easier than teaching a data scientist about the business and the data. Our partners from PoolParty gave a presentation on machine learning that also emphasized the importance of semantics and explainability.
Don’t Copy, Virtualize!
Data virtualization was another hot topic at the conference. Many horror stories were shared about data warehouses (slow, rigid, expensive, etc.) and about how creating copies of data (and copies of copies) becomes a data management nightmare. Data virtualization is an attractive alternative: data stays in place in the data silos and is retrieved only when needed. After all, data silos arise because organizational units need to control their data to perform their daily operations, so silos will never completely go away. Connecting data silos using virtualization gives us the best of both worlds: organizational units keep control of their data, and the organization still gets a unified view across the silos. Rich data models and metadata become crucial for a successful virtualization solution, and the common vocabulary mentioned above plays an important role.
Many non-technical challenges of data management and governance were also discussed at the conference. The vision of a Knowledge Graph that connects data silos of many kinds, using virtualization and materialization as needed, might sound far-fetched. Many people described the initiatives to modernize the data landscape in their organizations as a “journey”. But it is possible to take small steps on that journey that yield immediate results, as our customers do every day. Stardog is here to help you along on that journey. Let us know if you want to learn more!