Large Language Models (LLMs) have ushered in a new era of software development, offering unparalleled capabilities for natural language processing and understanding. However, harnessing the power of LLMs effectively requires a well-defined architecture, given their unique characteristics. In this article, we delve into the emerging architectures for LLM applications, providing insights into the systems, tools, and design patterns used by AI startups and tech companies. We've drawn inspiration from discussions with industry experts, and we'll also introduce some additional insights and considerations. Also, if you would like to know how to create LLM apps, make sure to check our blog.
Before we dive into the architecture, let's take a look at the components that constitute the LLM app stack: data pipelines, embedding models, vector databases, orchestration frameworks, LLM APIs, and operational tooling such as caching and logging. Together, these components form the backbone of LLM-driven applications and provide the infrastructure needed for their development and deployment.
One of the foundational design patterns for building with LLMs is "in-context learning." This approach involves using LLMs without extensive fine-tuning and instead controlling their behavior through clever prompts and conditioning on contextual data. Let's explore this pattern in more detail.
The in-context learning workflow can be divided into three key stages: data preprocessing and embedding, where contextual data is chunked, embedded, and stored (typically in a vector database); prompt construction and retrieval, where relevant chunks are fetched and combined with the user's query and any instructions; and prompt execution and inference, where the assembled prompt is submitted to the LLM and the response is post-processed.
In-context learning simplifies the process of building LLM applications by reducing the need for extensive fine-tuning. It turns AI development into a data engineering problem that startups and established companies alike can tackle. In-context learning is particularly beneficial for applications with relatively small datasets, as it can adapt to new data in near real time.
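To make the three stages concrete, here is a minimal sketch, assuming the openai Python client (v1+) and an API key in the environment. The sample documents and the keyword-overlap retriever are stand-ins for illustration; a production app would embed the chunks and query a vector database instead.

```python
# Minimal in-context learning sketch: retrieve relevant context,
# build a prompt around it, and send it to the model.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stage 1 stand-in: pre-chunked contextual data (normally embedded and indexed).
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Premium subscribers get 24/7 phone support.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Stage 2 stand-in: rank chunks by naive word overlap instead of vector search.
    def overlap(d: str) -> int:
        return len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query, documents))
    # Stage 3: condition the model on the retrieved context via the prompt.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(answer("How long do I have to return a product?"))
```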
However, it's crucial to address a common question: what happens when the underlying model's context window expands? While this is an active area of research, larger windows bring challenges around inference cost and latency. Even if cost scales only linearly with context length, large prompts get expensive quickly: at an illustrative rate of $0.06 per 1,000 prompt tokens, a single fully packed 32k-token prompt would cost nearly $2.
To explore in-context learning further, refer to resources in the AI community, especially the "Practical guides to building with LLMs" section.
Contextual data is a critical component of LLM applications, encompassing text documents, PDFs, structured data like CSV or SQL tables, and more. While the details vary, developers commonly handle this data with an ingestion pipeline: load it with tools like Databricks or Airflow, split it into chunks, run the chunks through an embedding model, and store the resulting vectors in a vector database such as Pinecone, as sketched below.
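The following is a rough sketch of that ingestion flow: chunk a document, embed the chunks with OpenAI's embedding endpoint, and keep the vectors in memory. The file name is hypothetical, and the in-memory list merely stands in for a vector database such as Pinecone.

```python
# Ingestion pipeline sketch: chunk -> embed -> store.
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Fixed-size character chunks with a small overlap to preserve context.
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

def embed(chunks: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    return [item.embedding for item in resp.data]

document = open("handbook.txt").read()  # hypothetical contextual data source
chunks = chunk(document)
index = list(zip(chunks, embed(chunks)))  # stand-in for a vector database
```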
Effectively prompting LLMs and incorporating contextual data require thoughtful strategies. While simple zero-shot prompts may suffice for initial experiments, more advanced techniques become essential for production-quality results: few-shot prompting with curated examples, grounding answers in retrieved context, and giving the model explicit instructions and output formats.
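As a small example of the few-shot idea, the template below steers the model toward a fixed output format purely through in-prompt examples; the task and reviews are made up for illustration.

```python
# Few-shot prompt template: the embedded examples define the task and
# output format without any fine-tuning.
FEW_SHOT_TEMPLATE = """Classify the sentiment of each review as positive or negative.

Review: "The battery lasts all day, love it."
Sentiment: positive

Review: "Stopped working after a week."
Sentiment: negative

Review: "{review}"
Sentiment:"""

def build_prompt(review: str) -> str:
    return FEW_SHOT_TEMPLATE.format(review=review)

print(build_prompt("Shipping was fast but the screen scratches easily."))
```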
Choosing the right LLM model is crucial for application performance. OpenAI, with models like gpt-4 and gpt-4-32k, is a popular starting point. However, as applications scale, developers explore different options: routing routine traffic to cheaper, faster models such as gpt-3.5-turbo, evaluating other proprietary vendors such as Anthropic, or experimenting with open-source models.
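One common cost optimization is routing requests by difficulty. The sketch below sends demanding tasks to a larger model and everything else to a cheaper one; the model names and the routing rule are illustrative, not a recommendation.

```python
# Cost-aware model routing sketch, assuming the openai client.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str, complex_task: bool = False) -> str:
    # Route demanding tasks to a larger, pricier model; default to a cheap one.
    model = "gpt-4" if complex_task else "gpt-3.5-turbo"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(complete("Summarize this sentence in five words: LLM apps need routing."))
```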
LLM applications also require infrastructure and hosting solutions, ranging from general-purpose cloud and serverless platforms to frameworks purpose-built for deploying LLM apps end to end.
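To give a flavor of the hosting side, here is a minimal sketch of exposing an LLM call behind an HTTP endpoint with FastAPI, one assumption among many possible stacks; the completion is stubbed so the example stays self-contained.

```python
# Minimal hosting sketch: an HTTP endpoint wrapping an LLM call.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    prompt: str

@app.post("/complete")
def complete(query: Query) -> dict:
    # Replace this stub with a real LLM call, plus caching and logging.
    return {"completion": f"echo: {query.prompt}"}

# Run with: uvicorn main:app --reload
```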
While not explicitly part of the reference architecture, AI agents play a pivotal role in many LLM applications. These agents, like AutoGPT, have the potential to bring advanced reasoning, tool usage, and learning capabilities to LLM-powered apps. However, they are still in the experimental phase, with challenges related to reliability and task completion.
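Under the hood, most agents boil down to a loop: ask the model for the next action, execute it with a tool, feed the observation back, and repeat. The bare-bones sketch below illustrates that pattern; `call_llm` is a placeholder for any chat-completion call, and the single tool and JSON action format are assumptions for illustration.

```python
# Bare-bones agent loop: plan -> act -> observe, repeated until done.
import json

def calculator(expression: str) -> str:
    return str(eval(expression))  # demo only; never eval untrusted input

TOOLS = {"calculator": calculator}

def run_agent(goal: str, call_llm, max_steps: int = 5) -> str:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # The model replies with JSON, e.g. {"tool": "calculator", "input": "2+2"}
        # or {"finish": "final answer"} once the goal is met.
        action = json.loads(call_llm(history))
        if "finish" in action:
            return action["finish"]
        observation = TOOLS[action["tool"]](action["input"])
        history += f"Action: {action}\nObservation: {observation}\n"
    return "Stopped: step limit reached."
```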
The emergence of pre-trained AI models, particularly LLMs, has revolutionized software development. The architectures and patterns outlined in this article are just the beginning. As the field evolves, we can expect to see changes, especially as the context window of models expands and AI agents become more sophisticated.
Pre-trained AI models have democratized AI development, enabling individual developers to create powerful applications quickly, and new reference architectures will emerge to address changing requirements.
If you have feedback or suggestions regarding this article, please reach out. You can also check our blog on LLMs vs LangChain. The world of LLM applications is dynamic and constantly evolving, and collaboration and knowledge-sharing are essential for its continued growth and innovation.
LLM applications are those powered by Large Language Models. They are considered emerging because they represent a relatively new and transformative approach to building software, especially in terms of natural language understanding and processing.
In-context learning involves using LLMs off the shelf and controlling their behavior through clever prompts and conditioning on contextual data. It simplifies development by reducing the need for extensive fine-tuning and making AI development accessible to a wider range of developers.
The LLM app stack comprises various components, including data pipelines, embedding models, vector databases, orchestration tools, and more. These components are vital as they provide the infrastructure and tools necessary for collecting, processing, and interacting with LLMs effectively.
Contextual data, such as text documents and structured formats, is crucial in LLM applications. Developers typically use tools like Databricks, Airflow, and vector databases like Pinecone to handle this data efficiently.
When selecting LLM models, developers should consider factors like cost, inference speed, and context window size. Optimizations may include switching to cost-effective models, exploring proprietary vendors, and implementing operational tools like caching and logging for enhanced performance and reliability.
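As a quick illustration of the caching idea, memoizing on the (model, prompt) pair avoids paying twice for identical calls. The sketch below uses Python's built-in lru_cache, with the actual LLM call stubbed out.

```python
# Response-cache sketch: identical (model, prompt) pairs skip the API call.
import functools

@functools.lru_cache(maxsize=1024)
def cached_complete(model: str, prompt: str) -> str:
    return _complete(model, prompt)

def _complete(model: str, prompt: str) -> str:
    # Placeholder for the real call (e.g. client.chat.completions.create).
    return f"[{model}] response to: {prompt}"
```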