
Inference Challenges in LLMs After Training

Knowledge Cut-Off

The internal knowledge held by an LLM is cut off at the time of pre-training. In other words, the LLM becomes outdated because its training dataset is finite and was collected only up to a certain point in time.

For example, if we trained a model in early 2022 and asked it "Who is the British Prime Minister?", it would likely answer "Boris Johnson". Johnson left office in late 2022, but the model has no knowledge of this since the event occurred after its training.

Below is an example from the current (September 25, 2023) version of ChatGPT (GPT-3.5). Notice the disclaimer added to the completion:

[Image: ChatGPT completion with a knowledge cut-off disclaimer]
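The same behavior can be reproduced programmatically. Below is a minimal sketch assuming the `openai` Python package (v1-style client) and the `gpt-3.5-turbo` model; the exact wording of the completion will vary:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask a question whose answer changed after the model's training cut-off.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Who is the British Prime Minister?"}],
)

# A model trained before late 2022 will likely name Boris Johnson,
# often alongside a knowledge cut-off disclaimer.
print(response.choices[0].message.content)
```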

Complex Math

LLMs also tend to perform poorly on complex math problems. If we ask a model to behave like a calculator, it may get the answer wrong depending on the difficulty of the problem. For example, if we ask "What is 40366 / 439?", it may generate an answer like 92.549. This is close to the actual answer (91.949) but still incorrect.

This is because LLMs do not actually carry out mathematical operations; they are still just predicting the next-best token based on their training, and the next-best token does not necessarily match the correct answer.

Below is an example from the current (September 25, 2023) version of ChatGPT (GPT-3.5):

[Image: ChatGPT returning an incorrect answer to the division problem]
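Since the model is only predicting tokens, the reliable fix is to perform the arithmetic outside the LLM. A quick check reproduces the correct answer from the example above:

```python
# LLMs predict tokens rather than perform arithmetic, so compute directly.
answer = 40366 / 439
print(f"{answer:.4f}")  # 91.9499 -- the true value; the model produced 92.549
```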

Hallucination

LLMs have a tendency to generate text even when they don't know the answer to a problem, a behavior called hallucination. For example, we can ask an LLM "What is a Martian Dunetree?" and it might respond "A Martian Dunetree is a type of extraterrestrial plant found on Mars." Despite there being no evidence of life on Mars, the model happily responds with confidence. Below is an example from an older version of ChatGPT (GPT-3.5):

[Image: ChatGPT confidently summarizing a nonexistent article]

The article does not exist, and even if it did, GPT-3.5 has no way of accessing the link to know what it says. Despite this, it happily generates a summary with confidence.

Solving Inference Challenges

To solve these issues, we need to connect the LLM to external data sources and applications.

Connecting an LLM to external components and fully integrating everything for deployment within our application requires a bit more work. The application must manage the passing of user input to the LLM, as well as the return of completions from the LLM. This is typically done through some type of orchestration library.

Examples: LangChain, Haystack, LlamaIndex.

[Diagram: an orchestration library mediating between the application and the LLM]

This layer can enable some powerful technologies that augment the performance of the LLM at runtime by providing access to external data sources or connecting to existing APIs of other applications. One implementation of this is LangChain.
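As a rough illustration, here is a minimal LangChain sketch (assuming the `langchain` and `openai` packages; import paths differ between LangChain versions) in which the orchestration layer builds the prompt, passes the user input to the LLM, and returns the completion:

```python
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# The orchestration layer: construct the prompt, call the model,
# and hand the completion back to the application.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
prompt = PromptTemplate.from_template("Answer the user's question: {question}")
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(question="Who is the British Prime Minister?"))
```

In a fuller application, the same chain could be extended with retrieval from an external data source, addressing the knowledge cut-off problem described above.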