Stardog Voicebox FAQ: How LLM, Generative AI, and Knowledge Graphs are the Future of Data Management

Jun 16, 2023, 4 minute read
What is Stardog Voicebox? Stardog Voicebox is a knowledge engineer powered by Large Language Models (LLMs), other Generative AI, and autonomous agents. It provides three core services.

  1. Knowledge Graph Question Answering to fully democratize enterprise analytics and data access. Voicebox answers ordinary language questions about the data in a Stardog Knowledge Graph or Knowledge Catalog without the need to write graph queries or connect a BI tool like Tableau.
  2. Knowledge Graph Data Modeling to reduce the cost of implementing your enterprise knowledge graph. Voicebox creates new knowledge graph data models from ordinary language inputs (i.e., prompts), CSV or other delimited files, or any other Stardog Datasource. It can be used iteratively with Stardog Designer in all phases of the data model lifecycle, including bootstrapping, maintenance, and extension.
  3. Stardog Smart Help to reduce the cost of operating Stardog by empowering DevOps and IT. Voicebox answers plain language support questions about using, programming, or administering the Stardog EKG Platform.

These are the core capabilities of Voicebox 1.0. We have more planned for future releases, including automated query tuning, dynamic structured data extraction from enterprise documents via Virtual Graphs, smart entity linking, and autonomous agents.

Does Voicebox use ChatGPT or OpenAI? No. We’ve built our own infrastructure around LLMs, which we’ve fine-tuned with thousands of example queries and data models from a variety of public and Stardog proprietary datasets.

Is Voicebox available outside of Stardog Cloud? Not currently. Voicebox runs in Stardog Cloud, which is hosted in AWS.

Will my data be private? How will my data be used? Yes! Stardog Voicebox offers the same privacy and data security guarantees as Stardog Cloud; we will never share your data with a third party. For the exact details of how we use your data in Stardog Cloud, you can refer to the Stardog Cloud terms of service.

Will my data be used for training? Is there any way it can be exposed? No. The models used in Voicebox are fine-tuned with sources such as public datasets and proprietary Stardog data collections. All Voicebox usage is kept safely with your other logs in Stardog Cloud and isn’t used to fine-tune models.

Is Voicebox a multi-tenant system? We will introduce opt-in fine-tuning with your data in the future, but even then your data will only be used to fine-tune your specific instance of Voicebox and will never be used for any other purpose. Specifically, your data will never be used to fine-tune another customer’s LLM, and vice versa; your data will remain strictly within a physically discrete multi-tenant cloud environment.

What about LLM hallucinations? How can I trust Voicebox? Voicebox does not contain data from your or any other knowledge graph, and we do not use LLM facts to answer questions in Voicebox. Voicebox only uses your Stardog-connected data to answer the questions you pose. The answers Voicebox provides do not come from an LLM; rather, they come from the Stardog knowledge graph that Voicebox queries on your behalf. This removes any opportunity for LLM hallucination in Stardog Voicebox.
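One way to picture this separation is the toy sketch below. It is not Voicebox code: the in-memory `GRAPH`, the `match` helper, and `fake_query_generator` (a stand-in for the LLM) are all hypothetical names invented for illustration. The point it shows is that the model-like component only produces a query pattern, while every value in the answer is read from the stored data.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# A pattern is a triple in which None acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# Toy stand-in for a knowledge graph: (subject, predicate, object) facts.
GRAPH: List[Triple] = [
    ("acme", "headquarteredIn", "Berlin"),
    ("acme", "employs", "alice"),
    ("acme", "employs", "bob"),
]


def match(graph: List[Triple], pattern: Pattern) -> List[Triple]:
    """Return every stored triple that is consistent with the pattern."""
    return [
        triple
        for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


def fake_query_generator(question: str) -> Pattern:
    """Hypothetical stand-in for the LLM: it emits a query, never an answer."""
    if "employ" in question.lower():
        return ("acme", "employs", None)
    return ("acme", "headquarteredIn", None)


def answer(
    question: str,
    generate: Callable[[str], Pattern],
    graph: List[Triple],
) -> List[str]:
    """Answer a question using only values found in the graph."""
    pattern = generate(question)
    return [obj for _, _, obj in match(graph, pattern)]


print(answer("Who does Acme employ?", fake_query_generator, GRAPH))
```

If the graph holds no matching facts, `answer` returns an empty list rather than a generated guess, which is the behavior the paragraph above describes.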

Can Voicebox use the Stardog Inference Engine? What about Virtual Graphs? Yes, to both. Voicebox uses both statistical and logical AI to perform knowledge graph question answering, which is consistent with Stardog’s approach to Hybrid AI. Virtual Graphs and inference are key features of the knowledge graph and Voicebox is able to use both when answering questions.

When will Voicebox be available? We are planning a limited rollout to designated users in Summer 2023.

Can I use Voicebox stand-alone? Not at this time.

How does Voicebox work? We started with an exhaustive evaluation of a variety of foundational models. Once we found the ones that work best for question answering and for data model creation, we fine-tuned those models. In parallel, we’ve built a dynamic prompting approach, including an evaluation framework, so that Voicebox can use prompt chaining and other prompting techniques to provide schema introspection, results summarization, and query generation. These chains are combined in different ways to provide the first three Voicebox services.
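Prompt chaining in general can be sketched as follows. This is a minimal illustration under stated assumptions, not the Voicebox implementation: `chain`, the step functions, `fake_llm`, and `fake_execute` are hypothetical names, and the model call is replaced by a deterministic placeholder so the example runs offline. Each step reads a shared context and adds to it, and the same steps are combined into two different chains.

```python
from typing import Callable, Dict

Context = Dict[str, str]
Step = Callable[[Context], Context]


def chain(*steps: Step) -> Step:
    """Compose steps so each one sees the context built by the previous ones."""
    def run(ctx: Context) -> Context:
        for step in steps:
            ctx = {**ctx, **step(ctx)}
        return ctx
    return run


def fake_llm(prompt: str) -> str:
    """Deterministic placeholder for a model call (an assumption, not a real API)."""
    if prompt.startswith("Write a query"):
        return "SELECT ?name WHERE { ?e a :Employee ; :name ?name }"
    return "Summary: " + prompt.splitlines()[-1]


def fake_execute(query: str) -> str:
    """Placeholder for running the query against a database."""
    return "alice, bob"


def introspect_schema(ctx: Context) -> Context:
    # A real system would read the schema from the data source.
    return {"schema": "Employee(name, worksFor); Company(name)"}


def generate_query(ctx: Context) -> Context:
    prompt = (
        "Write a query for this question.\n"
        f"Schema: {ctx['schema']}\n"
        f"Question: {ctx['question']}"
    )
    return {"query": fake_llm(prompt)}


def execute_query(ctx: Context) -> Context:
    return {"results": fake_execute(ctx["query"])}


def summarize_results(ctx: Context) -> Context:
    prompt = f"Summarize these results for '{ctx['question']}':\n{ctx['results']}"
    return {"summary": fake_llm(prompt)}


# The same steps combined in two different ways.
question_answering = chain(introspect_schema, generate_query, execute_query, summarize_results)
query_only = chain(introspect_schema, generate_query)

result = question_answering({"question": "Who works here?"})
print(result["summary"])
```

Keeping each step as a small function over a shared context is one simple design that makes recombining steps cheap; other chaining designs are equally plausible.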

