LLM Application Architecture

Building LLM-powered applications involves considerations beyond the model itself. An end-to-end solution requires several key components. At a high level, these include:

Figure: typical architecture of an LLM-powered application.

  • Infrastructure Layer: The foundational layer providing the compute, storage, and networking needed to serve LLMs and host application components. Options range from on-premises infrastructure to cloud services with on-demand, pay-as-you-go pricing.
  • LLM Models: The core of the architecture, whether foundation models or models adapted to specific tasks. Deployment should account for whether the application needs real-time or near-real-time interactions (a minimal model-loading sketch follows this list).
  • Information Sources: For retrieval-augmented generation, the application needs access to external sources of information, which can improve the relevance and accuracy of LLM outputs (see the retrieval sketch below).
  • Generated Outputs & Feedback: The application should store user interactions, both to augment the context supplied to the model on later calls and to collect user feedback for fine-tuning and evaluating the model's performance.
  • LLM Tools & Frameworks: Tools like LangChain provide libraries for implementing prompting techniques, while model hubs simplify managing and sharing models across applications.
  • Application Interfaces: User-facing surfaces such as websites and APIs, along with the security components needed for user interaction (see the serving sketch below).
  • Users and Systems: The consumers of the LLM-powered application, human or programmatic, interacting with the entire stack.
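
As a concrete illustration of the model layer, the sketch below pulls a small open model from a model hub and runs one generation step. This is a minimal sketch, assuming the `transformers` package is installed; `gpt2` is a stand-in for whatever foundation or fine-tuned model the application actually serves.

```python
# Minimal sketch: loading a model from a model hub and generating text.
# Assumes `pip install transformers torch`; "gpt2" is a stand-in for
# the application's actual foundation or fine-tuned model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator(
    "Retrieval-augmented generation improves answers by",
    max_new_tokens=40,   # bound generation length to keep latency low
    do_sample=False,     # deterministic output simplifies evaluation
)
print(result[0]["generated_text"])
```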
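
Retrieval-augmented generation can be illustrated without any framework. The toy sketch below indexes a few documents with a bag-of-words "embedding", retrieves the one closest to a user question, and splices it into the prompt; a real deployment would use a proper embedding model and vector store, which is exactly the plumbing frameworks like LangChain abstract away. The document text and names here are illustrative.

```python
# Toy retrieval-augmented generation: bag-of-words "embeddings" and
# cosine similarity stand in for a real embedding model + vector store.
import math
import re
from collections import Counter

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm UTC.",
    "Premium plans include priority support and extended API limits.",
]

def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str) -> str:
    # Return the document most similar to the question.
    q = embed(question)
    return max(documents, key=lambda d: cosine(q, embed(d)))

question = "When can I get a refund?"
context = retrieve(question)

# Splicing retrieved text into the prompt grounds the model's answer
# in an external information source.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context: {context}\n\nQuestion: {question}\nAnswer:"
)
print(prompt)  # this prompt would be sent to the LLM
```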
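
At the interface layer, the application typically exposes the model behind an API and records interactions and feedback for later fine-tuning and evaluation. The sketch below is one possible shape using FastAPI (an assumption; any web framework would do), with `generate` as a hypothetical hook into the model layer and the JSONL files as illustrative stores.

```python
# Minimal sketch of an application interface with feedback capture.
# Assumes `pip install fastapi uvicorn`; `generate` is a hypothetical
# hook into the model layer, and the JSONL files are illustrative stores.
import json
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    prompt: str

class Feedback(BaseModel):
    interaction_id: str
    rating: int  # e.g. 1 = helpful, 0 = not helpful

def generate(prompt: str) -> str:
    # Placeholder: call into the deployed LLM here.
    return f"(model output for: {prompt})"

@app.post("/complete")
def complete(query: Query):
    answer = generate(query.prompt)
    # Store the interaction so it can augment future context and
    # feed fine-tuning / evaluation datasets.
    with open("interactions.jsonl", "a") as f:
        f.write(json.dumps({"prompt": query.prompt, "answer": answer}) + "\n")
    return {"answer": answer}

@app.post("/feedback")
def feedback(fb: Feedback):
    # User feedback accumulates into a dataset for later fine-tuning.
    with open("feedback.jsonl", "a") as f:
        f.write(json.dumps({"interaction_id": fb.interaction_id, "rating": fb.rating}) + "\n")
    return {"status": "recorded"}

# Run with: uvicorn app:app --reload
```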

Other considerations include:

  • Model Optimization: Techniques such as distillation, quantization, and pruning optimize models for inference and reduce hardware resource needs (a quantization sketch follows this list).
  • Prompt Engineering: Structured prompts and connections to external data sources can enhance LLM performance in deployment (see the prompt-template sketch below).
  • Frameworks: Tools like LangChain enable rapid development, deployment, and testing of LLM-powered applications.
  • Reinforcement Learning from Human Feedback (RLHF): Aligns models with human preferences and optimizes them for safer production use.
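
Of the optimization techniques above, post-training dynamic quantization is the simplest to show. The PyTorch sketch below (a minimal illustration, assuming `torch` is installed) converts a toy model's linear layers to int8 weights, which shrinks memory use and can speed up CPU inference; distillation and pruning trade accuracy for cheaper serving in the same spirit.

```python
# Minimal sketch: post-training dynamic quantization in PyTorch.
# Linear-layer weights are stored as int8 and dequantized on the fly,
# cutting memory use for CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(          # stand-in for a real transformer
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller weights
```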
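
Structured prompting can likewise be sketched without a framework: fixed instructions, optional retrieved context, and few-shot examples assembled into one template. The template and examples below are illustrative; libraries like LangChain offer richer template classes for the same idea.

```python
# Minimal structured prompt: fixed instructions, optional retrieved
# context, and few-shot examples assembled into one string.
TEMPLATE = """You are a support assistant. Answer concisely.

{examples}Context: {context}
Question: {question}
Answer:"""

few_shot = [
    ("What are your hours?", "We are open 9am-5pm UTC, Monday to Friday."),
]

def build_prompt(question: str, context: str) -> str:
    examples = "".join(f"Question: {q}\nAnswer: {a}\n\n" for q, a in few_shot)
    return TEMPLATE.format(examples=examples, context=context, question=question)

print(build_prompt("When can I get a refund?", "Returns accepted within 30 days."))
```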

This architecture shows that building LLM apps involves far more than the model itself: infrastructure, tools, interfaces, and continuous improvement through feedback and optimization all play a part.

