Using RAG for Question Answering in Regulated Industries is a Bad Idea
Using RAG as the sole basis of serious GenAI apps that run inside enterprises in regulated industries is a very, very dumb idea. Maybe it’s impolite to use the word “dumb” but I mean it quite seriously. Also, for the record, ideas are dumb; people are un- or misinformed but never dumb. And since I’m defining terms:
There are lots of fun RAG uses in B2C:
But I care about B2B and, more specifically, the impact and use of GenAI and KG on heavily regulated industries and companies with $1B or more in revenue in financial services, life sciences, and manufacturing.
TLDR: The current rage for RAG in LLMs and GenAI is a very dumb idea for regulated enterprises. DO NOT DO THIS.
There are three reasons why RAG is a dumb idea in regulated industries.
In every endeavor, regulated, commercial, or otherwise, we use data to manage or eliminate uncertainty. Our uses of data ought not increase net uncertainty, and that’s the first reason not to RAG in regulated industries.
Regulated RAG means a pernicious choice for users and the org itself:
Either you suffer the problems of subtle hallucinations misleading people in high-stakes contexts, or you do that for a while with the result that you’ve systematically undercut the epistemic value of GenAI by creating nearly undetectable confabulations, thereby increasing net uncertainty about the epistemic value of a strategic investment in GenAI.
The technical term for this situation is an unholy shit show. Hey, I just work here, I don’t coin the technical terms! The real problem is threefold:
On balance it would not be prudent to use GenAI-from-RAG in these environments. And about prudential standards…
What’s the big deal, right? So occasionally the GenAI box of magic beans gets an answer wrong. People get answers wrong often. Yes, but in a regulated industry the Sovereign obligates people to exhibit standards of care. That is, regulated industries are ones in which the Sovereign demands people do their very best to get answers right, or be able to show they’ve taken all reasonable precautions to get the answer right.
That is, there has to be a showing that the regulated biz spent $$ to mitigate downside risk.
The result is that most regulated industries overspend on getting things right. They’re not required to overspend, but they do because of (1) uncertainty, (2) complexity, and (3) externally imposed high standards of care. For example, in a globally significant bank, the regulated portion overspends to prudently follow standards of care, and that effectively bids up the price of everything the bank does. All of which is horribly unproductive.
RAG makes it all worse. If the box of magic beans that’s intended to make all of this cheaper and simultaneously better sometimes just randomly makes up very plausible nonsense, and you know that this happens but not when, then it’s not clear you’ve met standards of care by using the box of magic beans. Nor is it clear where you’ve gotten things wrong because of it, and that tends to lead to bad outcomes when the Sovereign’s agents come to hold you to account for what you’ve done to satisfy their requirements.
It’s easy to be cynical about regulation. But forget politics and the Sovereign for a minute. What if you’re a drug scientist and you just really want to help children who have eye cancer not… to have eye cancer? A more laudable goal I cannot imagine and I am down to help them eradicate childhood eye cancer! Or maybe the question is about whether you should pull that jet engine now or after the next trip and then that unseasonably aggressive jet stream causes the metal fatigue to…you get the idea.
That is, what if you have the courage in our strange world to want to get the answer right because it matters? RAG is bad because regulated industries are the ones where getting it right really matters to human flourishing and all of us non-sociopaths should care about that. A lot.
Okay so what’s the alternative?
So let’s push GenAI out of the enterprise? No way, I really believe in the power of this stuff, but like all boxes of magic beans you’ve got to be smart about how you use them.
We can speculate about why RAG is all the rage (IMO, it’s one part A16Z being wrong early on and cementing RAG and vector databases in the GenAI reference architecture, and two parts “the relational data model is still dominant but very wrong for GenAI”), but happily there is a perfectly sane alternative to RAG in regulated industries.
It’s called Semantic Parsing: rather than trusting anything an LLM says, or showing a user anything an LLM says directly, you center a Knowledge Graph and use an LLM to (1) figure out what the user is asking of the data; (2) turn that algorithmically derived expression of human intent into a structured query against the KG; (3) take the KG’s answer, which is based on (a) known, (b) trusted, (c) lineaged, (d) timely data sources, and decorate or embellish it with other material; and, finally, (4) put this dialogue into a big context window with robust working memory so that the user has context-rich interactions with database-resident enterprise data.
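The four steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: `parse_intent` stands in for the LLM call (stubbed with keyword matching so the sketch actually runs), and the “knowledge graph” is a tiny in-memory triple map rather than a real graph database.

```python
# Hypothetical Semantic Parsing loop. The KG holds trusted, lineaged
# enterprise facts; the LLM's only job is translating intent to a query.
KG = {
    ("AcmeCorp", "riskRating"): "BBB",
    ("AcmeCorp", "regulator"): "SEC",
}

def parse_intent(user_input: str) -> dict:
    """Steps 1-2: an LLM would map free text to a structured query.
    Stubbed with keyword matching to keep the sketch self-contained."""
    if "risk" in user_input.lower():
        return {"subject": "AcmeCorp", "predicate": "riskRating"}
    return {}

def query_kg(query: dict):
    """Step 3: the answer comes from the KG, never from the LLM itself."""
    if not query:
        return None
    return KG.get((query["subject"], query["predicate"]))

def answer(user_input: str, history: list) -> str:
    query = parse_intent(user_input)
    fact = query_kg(query)
    if fact is None:
        reply = "I could not map that question to the data."
    else:
        reply = f"{query['subject']} {query['predicate']} = {fact}"
    history.append((user_input, reply))  # step 4: working memory / context
    return reply
```

The essential property: the reply shown to the user is assembled from KG facts, so the LLM never gets the opportunity to invent an answer; at worst it fails to produce a query at all.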
That’s just better in every way including in the failure mode. You see “standards of care” is really all about managing the failure mode. Let’s compare RAG’s failure mode to SP’s:
As explained above, if the hallucinations were grotesque this failure mode would be annoying, but it wouldn’t be pernicious. “Oh look,” the user would think, “the box of magic beans is being incredibly stupid again, let’s just agree to ignore it for a minute.”
SP fails at step #1: either the LLM cannot turn user input into a structured query, in which case nothing bad happens, or the LLM turns user input into the wrong query relative to the person’s intent, but the answer to that ‘wrong’ query is still correct with respect to the data. That failure mode is exactly the same as what happens every day when some BI tool or dashboard app gives the right answer to the wrong query. In short, someone notices and fixes it.
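One reason this failure mode is manageable can be shown in a short sketch: the structured query is an inspectable artifact that can be validated against the KG’s schema and logged for audit before it ever executes, which is exactly what you cannot do with free-form RAG output. The schema and predicate names below are assumptions for illustration.

```python
# Hypothetical guardrail: reject any generated query that falls outside
# the KG schema, so a bad LLM translation fails loudly and auditably.
ALLOWED_PREDICATES = {"riskRating", "regulator"}  # assumed KG schema

def validate_query(query: dict) -> bool:
    """True only for non-empty queries whose predicate the KG knows."""
    return bool(query) and query.get("predicate") in ALLOWED_PREDICATES
```

A rejected query can be logged and shown to the user as “I couldn’t answer that,” which is annoying but never a confabulation.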
Semantic Parsing dominates RAG in the GenAI solution space inside regulated industries in the game-theoretic sense: it is always a better choice in every game. The smart money (Databricks, Snowflake, Salesforce, Microsoft, etc.) is on SP and is only using RAG for low-stakes contexts (customer support of various types) where an occasional confabulation is kinda expected or banal.
Be smart and place your GenAI regulated industry bets against RAG and for Semantic Parsing.