The Stardog Voicebox Vision
What’s the big vision for Stardog Voicebox?
Anyone can ask any question of any data and get an accurate, timely, hallucination-free answer immediately.
Everyone who sees Voicebox has the same response: “How can we get this with our data?” There’s a growing awareness that exploratory analytics and data democratization go hand-in-glove and are key not only to digital transformation but also to turning large enterprises from data laggards into data leaders.
In this post I describe the Voicebox vision in full. We will follow up soon with two companion pieces explaining how Voicebox creates value in two of our key verticals: financial services and defense/intelligence.
Let’s think about the Voicebox vision schematically, step by step. It’s like a puzzle made of these seven parts, each of which answers a simple question—
A multi-hop question to find exposure to California munis in Stardog Voicebox Wealth Adviser.
Obviously there’s a lot more to say in unpacking this schematic, so let’s get into it.
The biggest solvable obstacle to data-driven decision making in the enterprise is that knowledge workers can’t get access to data that’s relevant to them. This is exactly why the universal response to Voicebox is to ask for Voicebox with their data. This isn’t primarily a security or governance problem. Data access is mostly a problem of data silos and data integration. Siloed data is the default condition in the enterprise, which makes data inaccessible to knowledge workers by default.
Year over year in McKinsey’s work on digital transformation, what distinguishes data leaders from data laggards is the percentage of knowledge workers who are enabled to self-serve analytically with respect to data.
🚨 Of course Voicebox isn’t the only game in town. But typically the other ways of interrogating data to achieve aha! moments require help or investment from IT or data science; or they require someone to write some really advanced queries, say; or they require you to already know exactly what you’re looking for. None of these restrictions is consistent with self-service enablement of the sort that McKinsey calls data leadership.
Source: Rewired and running ahead: Digital and AI leaders are leaving the rest behind
Data leaders have knowledge workers who can ask questions about the biz and then answer those questions without relying on IT, data science, etc. Data laggards also have knowledge workers who ask questions, but those questions only get answered weeks or months later by IT, data science, etc., if they ever get answered at all.
🗣 The Stardog Voicebox vision is to enable anyone to use data to make decisions, create insights, and drive business forward. And by “anyone” we really mean “everyone”, i.e., any worker who needs data to win at work. And that’s everyone in the AI era.
McKinsey is right. The challenge is to make data easy to consume, and we do that in Stardog Voicebox primarily by making natural language the universal interface to enterprise data. But we also do it by pushing Voicebox capabilities into the digital places where real work happens like Slack and Microsoft Teams.
Stardog Voicebox answering questions in Slack
Anyone can ask any question? Yeah, that’s the vision in full. At least, any question covered by these fundamental types of analytics questions: factual, including needle-in-haystack and data paths; multi-hop reasoning; metadata control plane (“what data sources contain information about euro-denominated trades for ESG futures on Borsa Italiana?”); tabular analytics (“what is the relationship between revenue and expenses in the last three FY?”); geospatial (“how many Chinese-registered trawlers are operating right now within 10 nautical miles of Roatan?”); background knowledge; predictive; anomaly detection; and root cause analysis.
Most of these are clear but the last three need a bit more explanation—
Sometimes the answer you need is a report!
🗣 What is OpenAI really good for? Well for Voicebox it’s really best for answering a particular sort of user question—general or background knowledge questions—in context of interactions with Voicebox that answer very specific, enterprise-relevant questions that OpenAI will never understand. For example, I want to ask Voicebox how many customers we have in the capital of Nepal, but I can’t remember the name “Kathmandu”, so I ask OpenAI’s ChatGPT “what’s the capital of Nepal?” And then ask Voicebox “how many customers do we have in Kathmandu?”
Voicebox does tabular reasoning, tabular analytics, and even creates basic visualizations for data that it has connected across otherwise disconnected enterprise data silos.
Most orgs shouldn’t and won’t train a foundational model. This is generally well understood. But most orgs really shouldn’t fine-tune a foundational model either; at least, they don’t have to. Fine-tuning is a kind of model adapter pattern, but there are other ways to provide that capability. Voicebox’s ability to reach into any sort of data and answer a wide range of questions means that it will act as a dynamic fine-tuning layer for enterprise data.
Hint: this is the real enterprise semantic layer and you won’t get this anywhere else since no relational data-based semantic layer can deliver it.
Other hint: all the “semantic layer” offerings are based at best on the relational data model—including AtScale, dbt, Cube, etc. Some of these are merely a bag of loose metrics.
Good products? Yes. But are they gonna remove the need to fine-tune on data for GenAI? No chance.
Voicebox operationalizes this semantic layer right out of the box—a unifying fabric that captures broad domain knowledge about your operations and business processes like customers, suppliers, materials, and complex relationships and hierarchies between them. Previously that required expensive application development or integration with some other tool or product.
No, really. Other data assistants and chatbots say this, but they really mean “all your documents”. We mean all the data including every database record, document, and even metadata, too. That’s the power of backing an AI data assistant with an enterprise knowledge graph.
Nearly every GenAI app is focused on smart document handling only. All of these RAG apps are search engines in the sense that they do (better) what search engines do now.
Stardog Voicebox isn’t a search engine. It isn’t documents-only and we aren’t looking for doc chunks to cite to answer user questions like “how much vacation leave do I get after my maternity leave if I have 10 years of service?”
🗣 Stardog Voicebox is a question-answering machine. We extract facts from enterprise documents and connect them to facts within enterprise databases to create a complete picture of the enterprise data landscape. That’s the scope of enterprise data that Voicebox answers come from. All the data matters! We call this SafetyRAG since it’s also key to how we eliminate LLM hallucinations.
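The idea of connecting facts extracted from documents to facts held in databases can be illustrated with a toy example. This is a minimal sketch, not Stardog's implementation: the triples, the predicate names (`placed_by`, `headquartered_in`), and the helper functions are all hypothetical and exist only for illustration.

```python
# Facts as (subject, predicate, object) triples. All data here is made up.
DOC_FACTS = [
    # Hypothetical fact extracted from a document such as a contract.
    ("Acme", "headquartered_in", "Kathmandu"),
]
DB_FACTS = [
    # Hypothetical facts read from rows of an orders table.
    ("order-17", "placed_by", "Acme"),
    ("order-18", "placed_by", "Globex"),
]

# Connecting the two kinds of facts is just putting them in one graph.
GRAPH = DOC_FACTS + DB_FACTS


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in GRAPH if s == subject and p == predicate]


def order_city(order_id):
    """Two hops: order -> customer (database fact) -> city (document fact)."""
    return [
        city
        for customer in objects(order_id, "placed_by")
        for city in objects(customer, "headquartered_in")
    ]


print(order_city("order-17"))
print(order_city("order-18"))
```

In this sketch, the question about `order-17` can only be answered because a database fact and a document fact sit in the same graph. For `order-18` no city is known, so the lookup returns an empty list rather than a guess.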
The vision in full requires full-spectrum connected data, and that’s entirely the point of a native enterprise knowledge graph like Stardog. In this view of the world, AI document assistants like Glean, Writer, and Hebbia (all great tools that we admire, frankly) are perfectly complementary with Stardog Voicebox, but they aren’t substitutes for it since none of them understands or connects to enterprise databases.
Our vision treats documents like unstructured knowledge-containers and treats database records like structured knowledge-containers and then connects and elevates all that knowledge for Voicebox to answer questions with. We are less interested in the perfect document summary or which paragraph of the employee manual indicates the dental copay. Those are important but they aren’t connected to or relevant for knowledge workers achieving aha! moments.
It won’t do anyone much good for Voicebox to just answer questions. What creates value, what leads to aha! moments, is when Voicebox provides accurate, timely, and hallucination-free answers to users’ most important questions.
The value of accuracy is obvious, but timeliness is no less important. One reason we’ve built Voicebox is that our customers need to move faster than IT can support. Velocity kills the competition! We don’t move or copy data, so answers are always fresh and current; that is, timely.
Hallucination-free is a new and differentiated requirement for a real AI data assistant like Voicebox. It’s another reason Voicebox can’t be replaced by a “chat with your documents” RAG app that hallucinates answers.
See Safety RAG: Improving AI Safety by Extending AI’s Data Reach for more on our hallucination-free alternative to all these LLM-and-RAG apps proliferating these days.
Often it’s not enough to get the right answer; you also need to be able to verify that the answer is right. Maybe a regulatory authority needs to know the basis of a material decision. The decision may be correct, but you still have to “show your work” to assure them of the integrity of the business process that led to it.
Voicebox is a fast, accurate AI data assistant, but it also has built-in traceability, lineage, and explainability capabilities. Why? Because the regulator won’t just take your word for it: you have to show proof of compliance or actually engage with questions in a model review.
The virtuous cycle in regulated industries with Voicebox looks like this:
Ask a question → Get the answer → Browse the answer’s data lineage → Ask another question → Get another answer → virtuous cycle continues until aha! is achieved
That’s why every Voicebox answer comes with embedded data lineage in Stardog Explorer.
Voicebox can provide automatic lineage and traceability for any answer because it answers using enterprise data sources, not black box LLMs. And because Voicebox connects to enterprise data catalogs and governance platforms, it understands the enterprise data landscape and that feeds into its lineage and traceability capabilities.
Immediate answers are better than “some time later”-answers. But how is Voicebox answering questions immediately? We do that in two related but distinct ways.
First, Stardog Voicebox is immediate compared to traditional MDM or other data-movement integration approaches. Implementation is fast: our customers get to value sooner because Stardog Voicebox is an all-inclusive cloud offering that takes only a few weeks to implement.
🗣 For our heavily regulated customers who aren’t in the cloud, Stardog Karaoke is our platform offering inclusive of all hardware and software—CPUs, GPUs, platform, and Voicebox agents, APIs, etc.—in an on-premise appliance. Check it out!
Second, at question time, when a knowledge worker is poised for an aha! moment, Voicebox talks to a family of LLMs and SLMs via Voicebox agents to (1) determine human intent and (2) convert natural language to one or more queries. Those queries are then executed in Stardog Core against trusted enterprise data sources using our unique data federation capabilities that eliminate costly, slow data movement.
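To make the shape of that question-time flow concrete, here is a minimal sketch in Python. It is not Voicebox code and does not use any Stardog API: the keyword rules stand in for the LLM and SLM steps, the in-memory `SOURCES` dictionary stands in for federated enterprise sources, and every name in it (`detect_intent`, `to_queries`, `execute`, `answer`) is hypothetical.

```python
from dataclasses import dataclass

# Toy in-memory "silos". In a real deployment these would be remote sources
# that are queried in place rather than copied. All data is made up.
SOURCES = {
    "crm": [
        {"customer": "Acme", "city": "Kathmandu"},
        {"customer": "Globex", "city": "Lisbon"},
    ],
    "billing": [
        {"customer": "Initech", "city": "Kathmandu"},
    ],
}


@dataclass
class Query:
    source: str
    field: str
    value: str


@dataclass
class Answer:
    value: object
    sources: list  # which sources were queried, kept so the answer is traceable


def detect_intent(question: str) -> str:
    """Step 1 (hypothetical stand-in for a language model): classify intent."""
    return "count" if "how many" in question.lower() else "list"


def to_queries(question: str) -> list:
    """Step 2 (hypothetical stand-in): turn the question into per-source queries."""
    queries = []
    for name, rows in SOURCES.items():
        for city in sorted({row["city"] for row in rows}):
            if city.lower() in question.lower():
                queries.append(Query(name, "city", city))
    return queries


def execute(query: Query) -> list:
    """Step 3: run one query against its source, reading the data where it lives."""
    return [row for row in SOURCES[query.source] if row[query.field] == query.value]


def answer(question: str) -> Answer:
    intent = detect_intent(question)
    queries = to_queries(question)
    rows = [row for q in queries for row in execute(q)]
    value = len(rows) if intent == "count" else [row["customer"] for row in rows]
    return Answer(value, sorted({q.source for q in queries}))


print(answer("How many customers do we have in Kathmandu?"))
```

The point of the sketch is the separation of concerns: language understanding produces queries, and the answer itself is computed from the data sources, with the list of sources carried along with the result.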
So Voicebox multi-turn conversation performance is a function of two subsystems, both of which are fully engineered by us: GenAI and Knowledge Graph interacting together to give answers to questions immediately.
Data leader orgs empower knowledge workers to self-serve analytically, and that means not sending every new question to the data science team to answer six weeks later.
No. While question answering to achieve aha! moments is the primary value of Stardog Voicebox, it’s not the only value that Voicebox provides. We extend Voicebox’s LLM-powered natural language interface to data modeling, mapping, business rules, and data quality, too.
All of the arguments for Voicebox for end-users—immediacy, speed to insight, self-service, democratization—apply to implementing Stardog, too. That includes using natural language as the interface to every Stardog job-to-be-done including data modeling and data mapping.
Business rules and data quality constraints are both implementation and usage jobs-to-be-done; sometimes the right way to get a multi-hop reasoning question answered is to tell Voicebox about some business rule that is specific to your use case or enterprise. Voicebox includes the ability to add business rules and data quality constraints to Stardog on the fly.
Notice what I haven’t talked about in this blog post? ETL, query syntax, skills acquisition, special training, GPUs, LLM quantization, parameter size, etc. We live, love, eat and breathe all that tech stuff so our users and customers can focus only on what matters to them: having aha! moments and moving their business forward at breakneck speed.
Stardog Voicebox is hallucination-free, shovel-ready GenAI that can transform data leaders into data overachievers and data laggards into data leaders. LET’S GO!
Nearly everything in the Stardog Voicebox vision is already shipping in production in Stardog Cloud, in Stardog Karaoke, and to on-prem deployments with financial services, life sciences, manufacturing, and defense customers globally. But some parts of the vision are on the near-term roadmap.