A Vision for Voicebox

Mar 5, 2024, 8 minute read

Introducing Stardog Voicebox

Voicebox is the world’s first Conversational Data Platform. Anyone can ask any question of enterprise data and get an accurate, timely, and trusted answer immediately. Stardog Voicebox is a fully cloud-enabled solution and 100% hallucination-free.

The first, but by no means only, payoff of our work since January 2023, Voicebox answers a key business question: How do I empower my knowledge workers to get new, deep insights to drive business forward without relying on IT?

How is Voicebox different?

This space is crowded but Voicebox is different in at least four big ways:

  1. Voicebox is for knowledge workers to use today, including on the most complex problems and questions in regulated orgs like pharma companies, banks, and manufacturers.
  2. Voicebox is 100% hallucination-free, something the vendors of RAG-based solutions won’t say.
  3. Voicebox is focused on database-resident data. The RAG-based alternatives are for documents.
  4. Voicebox is the only hybrid cloud conversational data platform, covering database-resident data both in the cloud and on-prem, via Stardog Karaoke, the Stardog Voicebox Appliance.

I haven’t talked publicly about #4, so this is really an announcement inside an announcement.

Stardog Karaoke: Bringing Voicebox to the Edge in Hybrid Cloud

While Voicebox is available in Stardog Cloud today, it’s available on the edge, too. Karaoke is a fully-managed, off-cloud hardware-software appliance for regulated businesses that pursue a hybrid cloud strategy. It’s a “stack in a rack” system powered by Supermicro and NVIDIA and running the entire Stardog Voicebox platform and app.

I’m proud of the work we’ve done in partnership with NVIDIA and Supermicro to make Karaoke a reality.

Why did we build Voicebox?

The simplest reason is that our customers need an easier way to consume the knowledge graphs they’ve built with Stardog. I knew two minutes after my first session with ChatGPT that we were going to build Voicebox. But that says both too much and too little at the same time.

Empathy and Customer Obsession

I believe in the power of radical empathy as a means of product design and strategy. I believe in the power of absolute customer obsession. We built Voicebox, in fact, because the people who need it the most—pharma plant managers, supply chain analysts, banking compliance officers—can’t build it for themselves.

I have been obsessed for years with knowledge workers that I call “non-programming experts”; that there’s no good term for them is part of the problem, of course. In the enterprise software game we often skip right past this cohort by sorting people into “technical” and “not technical”, when all we really mean is “knows how to program” and “doesn’t know how to program”. That’s pretty dumb.

We built Voicebox so our customers are empowered to answer for themselves.

Aha! Moments Run the World. Disconnected Data Inhibits Them

We built Voicebox because “aha!” moments come when you answer those really tough questions (the ones that grab you deep and won’t let go till you answer them) that, hard or not, are often shallow. Shallow doesn’t mean easy; it means the answer to the question is easy once you see the right picture of the right part of the world.

Aha! moments power the world, and Stardog was built to create aha! moments by connecting disconnected data. To get value from Stardog before Voicebox, you needed to learn our query language or wait until IT built a web app or…something else regrettably difficult or time-consuming.

We built Voicebox so our customers can go fast.

Knowledge Graphs Power Aha! Moments but are Hard to Consume

Knowledge graphs are powered by data, but they don’t show you pictures of data; a knowledge graph shows you pictures of the world. In fact, a real-time, native knowledge graph powered by Stardog shows you contextualized pictures of the world powered by connected data that spans data silos, formats, systems, and sources. Aha! moments arrive when that one missing piece of the big mosaic puzzle of reality locks into place and, despite the fact the puzzle is incomplete, now you can see the big picture and know what to do next!

Damn, that’s the good stuff, friend. I live for those moments. I live professionally to help my customers have those moments more often. Stardog was born in one of those moments as I was driving my Triumph Speed Triple from the University of Maryland to DC one cold Friday night in December. The aha! moment hit me so hard that I had to pull over and catch my breath.

We built Voicebox so our customers can see the right part of the world in the context of business meaning.

The Skills Gap is Bullshit

I had my own private aha! moment leading to Stardog while working at an AI lab with my co-founders, Evren Sirin and Mike Grove. People wanted to know why three AI guys were starting a data company.

See, Big Tech likes to talk a big bullshit game about the so-called skills gap. Because a supply chain or biotech expert with two master’s degrees, who speaks two languages, and works 50+ hour weeks is really missing Python skills. As if the ability to write SQL is the crowning achievement of professional life.

Society doesn’t have a skills gap problem. Hey, I’m a second-career guy and taught myself to program in my late twenties. All respect to people who do it that way. I’m literally one of those people. Demanding everyone become a programmer or data engineer to access data meaningfully isn’t scalable and it’s not necessary and it’s not empathetic.

We built Voicebox so our customers can operationalize their real-world expertise to delight, amaze, streamline, and propel their orgs and themselves.

Making a World Where Domain Expertise is Enough

Okay, last reason, I promise: I want to live in a world where domain expertise is enough because reality is hard and the challenges we face as a civilization are hard. No one ain’t got time for doing stuff a computer can do for them!

Less lofty but no less real: knowledge workers in regulated industries are in a tough spot since they, no less than anyone else, need to win days by having aha! moments. But, unlike most of the rest of us, they work in environments with lots of externally-imposed obstacles to aha! I don’t live that life but I have empathy for them. And, as a consumer and entrepreneur and a person with a body, I need knowledge workers at manufacturers and banks and pharma companies to get it right more often than not. My contribution to making it so is to create an experience where they can access data (no matter the silo, format, storage location, etc.) in meaningfully contextualized ways with their existing toolkit, that is, their domain expertise and their native language.

I want to live in a world where decisions that affect everyone are based on data and human creativity. I want to live in a world where domain expertise and native language are all the skills needed to achieve aha! on the regular.

We built Voicebox so our customers can get the right answer today and still be home in time to take their kids to soccer practice.

10 Things I’ve Learned about GenAI So Far

My marketing team is gonna kill me for writing so much so let me be very concise here and go speed dating on this part:

  1. An LLM is a machine for always knowing what to say next and that is a valuable thing to have.
  2. LLM hallucinations are a pernicious problem because they’re subtle and persuasive and, hence, nearly undetectable.
  3. Hallucinations are like a faithless friend who smiles and flatters to your face but talks shit about you behind your back. You want it to be one way, but it’s the other way.
  4. Semantic Parsing is better than RAG alone for question answering in regulated industries.
  5. A Knowledge Graph goes with Semantic Parsing and databases—that is, structured and semi-structured data—like a vector database goes with RAG and with documents.
  6. There will be hybrids of these approaches but that’s a lot harder than people think.
  7. Data management based on business meaning rather than storage location is especially important in the GenAI era.
  8. The data that isn’t in the cloud (yet or ever) really matters, too. Hybrid Cloud (some data “in the cloud” and other data “on premises”) is the future and will be for at least 10 years.
  9. Hardware matters again; it always did but we offloaded to the hyperscalers for a minute. GenAI means techies have to know and care again and that’s just cool.
  10. The way to create non-linear value in the enterprise is multimodal LLMs of the right sort; the enterprise game is about documents, tables, and graphs, not about documents, pictures, and movies.
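
To make items 4 and 5 a bit more concrete, here is a toy sketch of the semantic parsing idea: a question is turned into a structured query, the query runs against the graph, and the answer comes only from what the graph contains. This is not how Voicebox is implemented. The triples, the question templates, and the function names are all made up for illustration, and a handful of regular expressions stands in for the parsing step that a language model would handle in a real system.

```python
import re

# Toy knowledge graph as (subject, predicate, object) triples.
# All of this data is invented for the example.
TRIPLES = {
    ("plant_a", "produces", "drug_x"),
    ("plant_b", "produces", "drug_y"),
    ("drug_x", "requires", "ingredient_1"),
    ("drug_x", "requires", "ingredient_2"),
}

# Hypothetical question templates. Each one maps a question shape to a
# predicate and records which side of the triple the named entity binds to.
TEMPLATES = [
    (re.compile(r"what does (\w+) produce\??$"), "produces", "subject"),
    (re.compile(r"which plants? produces? (\w+)\??$"), "produces", "object"),
    (re.compile(r"what does (\w+) require\??$"), "requires", "subject"),
]


def parse(question):
    """Map a question to a structured query, or None if no template fits."""
    text = question.strip().lower()
    for pattern, predicate, side in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (predicate, side, match.group(1))
    return None


def execute(query):
    """Run a structured query against the triples and return sorted matches."""
    predicate, side, entity = query
    if side == "subject":
        return sorted(o for s, p, o in TRIPLES if p == predicate and s == entity)
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == entity)


def answer(question):
    """Answer from the graph only; return None when the question can't be parsed."""
    query = parse(question)
    if query is None:
        return None  # no parse, so decline instead of guessing
    return execute(query)


if __name__ == "__main__":
    print(answer("What does drug_x require?"))
    print(answer("Which plant produces drug_x?"))
    print(answer("Who won the game last night?"))
```

The point of the sketch is the shape of the pipeline: the generated artifact is a query, not the answer itself, so a question the parser can’t handle produces a refusal and a question about something absent from the graph produces an empty result.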

What’s the Near-term Voicebox Roadmap?

Here’s what’s coming next following the 1.0 release.

  1. Making Voicebox conversations more fluid, natural, and contextually-rich by increasing Working Memory size and persistence.
  2. Adding Voicebox to more UX channels and business contexts.
  3. More shared Voicebox conversations, sessions, and context across enterprise teams; great conversations are team conversations.
  4. Extending Voicebox to the implementation work of data integration including data mapping and data modeling.
  5. Making Voicebox more powerful and just as easy by adding agentic, asynchronous deep analytics that require non-trivial processing like entity resolution and graph processing.
  6. Increasing Voicebox’s reach by adding real-time document integration.
  7. Making Voicebox conversations faster. I challenged Evren and Mike the other day by saying “10x faster in 90 days.” The next day, Evren said they’d made it 2.5x faster that day.
  8. Adding customer-specific model training and fine-tuning to Stardog Karaoke.

Stardog Voicebox is Here to Stay

As long as knowledge workers, business analysts, C-suite execs, and other “lines of business” roles need to talk with data, we’ll still be customer obsessed and empathetic. And we’ll still be AI people bending the models to human will and intent on their behalf.

