Matthew Rothenberg

LLMs in the enterprise — what’s chat got to do with it?

Large Language Models (LLMs) are this season’s hot technology topic, but what will it take to establish their lasting success in the enterprise? That was the topic of a panel discussion hosted by Union.ai that brought together leading practitioners of AI and ML to consider LLMs’ prospects in the near and mid-term. 

The panel was moderated by Kubernetes guru Kelsey Hightower and featured 

High on the list of panelists’ concerns: Is chat really the “killer app” that will drive LLMs’ widespread use? 

Thanks to ChatGPT and other LLM-based platforms, chat functionality is probably the first capability most people think of when they consider these models. Nevertheless, the panelists took very different positions about chat’s appeal. 

“By far the biggest application we're seeing is some sort of chat with your data: like ChatGPT, but it knows your data,” Chase said. Most enterprises are primarily applying chat internally to their own structured data, he said. When working with standardized documents, “There’s a pretty standard formula for inserting [data] into context and answering questions about it,” and the stakes are lower if the chatbot returns errors to internal stakeholders than to customers. 

However, Chase said, chat over unstructured data is evolving rapidly. “There are a lot of startups that are trying to solve this. … Another [application] I'm very bullish on is extraction of information from these documents. Maybe this isn't something that's done live — you might not ask it a question, but rather you have some preprocessing step where you extract information and put it in some type of SQL query where people can analyze it. 

“Those are two larger use cases where we see people getting started, and the formulas there are relatively standard. And I think those are two great starting points.”
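
The extraction use case Chase describes can be sketched roughly as follows. This is a minimal illustration, not a method the panel presented: the sample invoices, the `invoices` table and the `extract_fields` helper are all hypothetical, and a regular expression stands in for the LLM call that would do the extraction in practice.

```python
import re
import sqlite3

# Hypothetical sample documents; real ones would come from enterprise files.
DOCUMENTS = [
    "Invoice 1001 from Acme Corp. Total due: $250.00",
    "Invoice 1002 from Globex. Total due: $1,200.50",
    "Invoice 1003 from Acme Corp. Total due: $99.99",
]

PATTERN = re.compile(r"Invoice (\d+) from (.+?)\. Total due: \$([\d,]+\.\d{2})")


def extract_fields(text):
    """Stand-in for an LLM extraction call (assumption).

    Returns (invoice_id, vendor, total), or None if nothing matches.
    """
    match = PATTERN.search(text)
    if match is None:
        return None
    invoice_id, vendor, total = match.groups()
    return int(invoice_id), vendor, float(total.replace(",", ""))


def build_table(documents):
    """Preprocessing step: extract fields once and store them in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices "
        "(invoice_id INTEGER PRIMARY KEY, vendor TEXT, total REAL)"
    )
    rows = [row for row in map(extract_fields, documents) if row is not None]
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_by_vendor(conn):
    """Analysis step: plain SQL over the extracted data, no model involved."""
    cursor = conn.execute(
        "SELECT vendor, ROUND(SUM(total), 2) FROM invoices "
        "GROUP BY vendor ORDER BY vendor"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for vendor, total in totals_by_vendor(build_table(DOCUMENTS)):
        print(vendor, total)
```

The point of the split is that the model (here, the stand-in) runs once at preprocessing time, while analysts query the resulting table with ordinary SQL.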

Is there anybody out there? 

“I actually like chat with a real human!” Hightower replied. When it comes to customer service, he said, current chatbots can’t interpret the context of queries in a satisfying way. He offered the example of trying to change a flight on short notice: “‘I am literally stuck on vacation. No, I don't want to leave four days from now! You're lacking context in the discussion, Chatbot.’ ” 

According to Umare, the crucial gap is in the way humans make decisions. “I would not rely at the moment on LLMs to make decisions. I would rely on LLMs to help me in making decisions. That's a slight difference, but it’s an important difference.” For example, he said, the statement, “I went to the hospital yesterday” can have a number of meanings (from visiting a sick friend to checking into the ER) that an LLM can’t interpret. “When you want decisions, you want precision.”

To be reliable decision-making tools, Umare said, LLMs will have to achieve that same sort of precision. “Whenever there's a left turn or right turn to be made, why did you make that decision? What was the set of input in my simulation? What did I see? And how was the observation done? And how was it accepted? 

“We have to have a trace because this is not a human making a decision. This is a machine making a decision. Every time it makes an imprecise decision, trust reduces.”

“I'm going to go ahead and take a slightly stronger stance,” Lindall said. “I think that chat is possibly the least useful application of LLMs, and the last one that is going to see wide adoption.

“I think ChatGPT was only really revolutionary in two ways,” Lindall continued. “One, the reinforcement learning through human feedback, I think, is really powerful. But it really captured people's imagination by showing them what these Large Language Models could do. It's very interpersonal that you can chat with this thing! 

“But in fact, the underlying models have been in production [in the] enterprise for a long time for a lot of tasks. And like Ketan was talking about, there are real consequences to exposing these chat models to users. And we really don't have the ability to control them that well, especially in an unregulated text output environment.”

Fine-tuning away hallucination

According to Singh, the pitfalls of current models can be addressed by fine-tuning on a well-vetted data set. “General-purpose models are very good at giving more-generic answers, but when within the context of an enterprise, you need to have that proprietary data set on which it is fine-tuned,” such as LinkedIn member profiles. “Otherwise, the answers are probably full of hallucination, and you are also getting a lot of the biases and other things from the generic data set on which the model was trained.”

Singh also considered the importance of fine-tuning to preserve the integrity and voice of the brand chatbots will represent. “These are going to become the voice of enterprises — co-pilots, speaking on behalf of your brand. And the more the distinction vanishes between whether a human or a chatbot is speaking, the higher the guardrails that need to be developed. 

“You will need to create your own unique data sets that you fine-tune; refine; make sure you are getting rid of all the other toxicity, hallucination and other things, before you can actually launch a product. You’ve got to have techniques and platforms and proprietary data sets to fine-tune these models.”

Dadhich emphasized the pivotal role of information retrieval in creating meaningful, relevant interactions via LLMs. “You have a set of documents, and do a retrieval to make sure you have the versioning correct. Let's say you have Version 1 of the document, and then you have Version 2. To have an AI that can do question answering on top of it, you want Version 2 to be used instead of Version 1. That’s where information retrieval-based LLM models come into picture. 

“ChatGPT and some of the other usual LLM models don't support that,” he said. “But you can basically club the contents that you actually retrieved from the query first, and then use that together with the ChatGPT or another LLM. And that gives you an edge: Every new version of the document comes in, you really don't have to worry about fine-tuning the LLM. You can just keep using the same LLM.”
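
The retrieve-then-prompt pattern Dadhich outlines might look something like the sketch below. It is an illustration under stated assumptions, not his implementation: the `DocVersion` record, the word-overlap scoring and the prompt wording are all hypothetical, and the final call to an LLM is left out. A production system would more likely rank passages with embeddings or a search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocVersion:
    """Hypothetical record: one version of one document."""
    doc_id: str
    version: int
    text: str


def latest_versions(docs):
    """Keep only the highest-numbered version of each document."""
    latest = {}
    for doc in docs:
        current = latest.get(doc.doc_id)
        if current is None or doc.version > current.version:
            latest[doc.doc_id] = doc
    return list(latest.values())


def score(query, text):
    """Toy relevance score: count of query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, docs, top_k=1):
    """Rank the latest versions by relevance and drop non-matching ones."""
    ranked = sorted(
        latest_versions(docs),
        key=lambda doc: score(query, doc.text),
        reverse=True,
    )
    return [doc for doc in ranked[:top_k] if score(query, doc.text) > 0]


def build_prompt(query, docs, top_k=1):
    """Combine retrieved passages with the question for an unchanged LLM."""
    passages = retrieve(query, docs, top_k)
    context = "\n".join(
        f"[{doc.doc_id} v{doc.version}] {doc.text}" for doc in passages
    )
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        DocVersion("policy", 1, "Employees get 10 vacation days per year"),
        DocVersion("policy", 2, "Employees get 15 vacation days per year"),
        DocVersion("handbook", 1, "The office opens at 9 am"),
    ]
    print(build_prompt("how many vacation days do employees get", docs))
```

When Version 2 of a document arrives, only the document store changes. The model that receives the prompt stays the same, which is the edge Dadhich describes.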

Meanwhile, on the hardware front … 

When it comes to the horsepower needed to drive LLMs, Hightower again looked to NVIDIA’s Dadhich. “I'm pretty sure that for any enterprise that's watching this right now, you being from NVIDIA, there's only one question: ‘Where are the GPUs at?’ Why do they cost so much? And why is the GPU so important to this whole process?”

“It's basically an amalgamation of a lot of different things that come with the GPU itself,” Dadhich responded. “You have, let's say, a model running; instead of maybe a single processor doing the whole analysis of the data, you now have multiple threads. So instead of a single CPU, you can think of having 10,000 CPUs, all of them working in parallel and doing the same thing at the very same time. 

“So what ends up happening is that you are actually reducing latency,” he continued. “Instead of it taking 10,000 seconds, you're taking just a single second to do the same thing. And given that, you know, all these language models are actually quite compute-heavy, you can't really have an Intel CPU and ask it to do the same thing a GPU can do. 

“On top of the GPU, we also have the full ecosystem: the libraries and the software tools on top of these individual GPUs to make it easy for people to use it. Any person outside of a background with GPUs doesn’t really have to worry about the actual GPU code, because that requires a lot of expertise. So basically, you create all of these abstractions, which makes it easy for an end user to come up with a set of requirements” to use the parallel processing of GPUs. 

Summing it up

At the end of the panel, Hightower summarized the discussion by considering the current gap between chat and interaction between people. “When we think about humans, we are very, very advanced complex machines,” he said. “If you appreciate machine learning, you should really appreciate human learning, and context and experience. That context and experience we walk around with — well, that's our little personal models. And sometimes we choose to share those models with other people. One thing that we do is we use language to communicate: Verbal, physical, every interaction we typically have starts with some form of communication. 

“And now what we want to do is raise computers as advanced as we think they are! We've been interacting with them in mostly primitive ways, typing on the keyboard, having to read all the documentation before you can do anything useful. Now we're starting to understand: The human interfaces we use (vision, speech, all of these things) we want now to see applied to our machines. But we're also now asking these machines to make even more complex decisions than before. And decisions cover responsibility. 

“So I think we heard some really good advice,” he concluded. “You just don't want to go get a generic language model and start giving advice to your customer base. What's the point? That's the Race to Zero. The reason why you're in business is because of the model you actually have of the world. And your data should reflect that. That unique data set is what makes your company a lot more interesting than another company. Large Language Models are just another example of technology. But when you combine it with know-how, that's when you get something special.”
