Introduction to Agentic AI
Table of Contents
Introduction
Unless you are living in a cave, you must have heard about or used large language models (LLMs) like ChatGPT, Claude, and Gemini. LLMs are good in many scenarios like:
- natural language understanding and generating
- reasoning over a large amount of information
- generating code and debugging
- knowledge extraction and summarization
However, at their core, they are just a system that takes text (and sometimes images, video, and audio) as input and generates text (and other formats) as output. They lack the ability to interact with the environment, fetch information from the web, or access external databases. Moreover, they are limited to the information on which they are trained and can not fetch the latest information. Techniques, like retrieval-augmented generation can solve some of these problems, but, unless LLMs are not able to take actions based on the information they have, they will always be limited.
An LLM can only take input in a specific format and return output in another specific format. It can not take action.
Agents are a way to overcome these limitations. In this article, we will discuss what agents are, how they work, the benefits of using agents and some applications of agentic AI. This article is the first part of a series of articles on agentic AI. In the next article, we will discuss how to create your own agent using smolagents
. Here is the link to next article.
What Exactly are Agents?
There is no universal definition of an agent. In fact, it is even debatable whether there is a hard line that separates a system from not being an agent to being an agent. To quote Andrew NG1:
Rather than having to choose whether or not something is an agent in a binary way, I thought, it would be more useful to think of systems as being agent-like to different degrees. Unlike the noun “agent,” the adjective “agentic” allows us to contemplate such systems and include all of them in this growing movement.
In this spirit, instead of asking whether a system can be called an agent or not, we can ask how agentic a system is. There are different levels of "agenticness". A system is more "agentic" if, by use of an LLM, it is able to complete a task more autonomously. For example, a system that lets you book a flight between two cities is more agentic than a system that just gives you the list of available flights. Nevertheless, we must have some definition of an agent to work with. This definition given by Harrison Chase, cofounder of Langchain2 is a good starting point:
An AI agent is a system that uses an LLM to decide the control flow of an application.
This definition gives us a concrete way to think about agents. An agent is a system that uses an LLM to decide the control flow of an application. This means that an agent can use an LLM to generate code, call functions, and make decisions based on the information it has. This is in contrast to a system that uses an LLM to generate text or code but does not use the LLM to decide the control flow of the application. Keep in mind that when we talk about agents in this article, we are talking about systems that use an LLM to decide the control flow of an application.
An agent is a combination of an LLM and tools that can be used to complete a task. The LLM decides what action to take and when to stop based on the state.
Various Levels of Agenticness
Based on how much control an LLM has over the workflow of your application, we can categorize the system into different levels of agenticness. Here are some of the levels of agency:
- Level 0: Non-Agentic: These are systems that use an LLM to generate text or code but do not have ability to take actions based on the information they have. For example, chat bots like ChatGPT, code generation tools like OpenAI Copilot etc.
- Level 1: Router: Here, you have a predefined workflow and the only agency that an LLM has is to the route your workflow will take. For example, a system that takes a user query decides in which category the query falls, and calls the appropriate function.
- Level 2: Tool User: In this case, the LLM can call a predefined set of tools to perform tasks. The LLM is responsible for figuring out which tool to use and what parameters to be passed to the tool. The system terminates after one step. An example will be a system that calls a weather API to get the weather information using the user query.
- Level 3: Multi-Step Agent: Here, the LLM can call the tools multiple number of time to complete a task. The ending condition of the task is also determined by the LLM. An example will be a system that can web search and scrape a website to get the information the user asked for.
- Level 4: Multi-Agent: This is the final level of agenticness. In this level, there are multiple agents and one agentic workflow can call a different agent to complete a task. An example will be a system that can book a flight for you. The system can call a weather agent to check the weather of the destination, a flight agent to book the flight, a hotel agent to book the hotel etc.
Agency Level | Description | How that’s called | Example Pattern |
---|---|---|---|
☆☆☆ | LLM output has no impact on program flow | Simple Processor | process_llm_output(llm_response) |
★☆☆ | LLM output determines an if/else switch | Router | if llm_decision(): path_a() else: path_b() |
★★☆ | LLM output determines function execution | Tool Caller | run_function(llm_chosen_tool, llm_chosen_args) |
★★★ | LLM output controls iteration and program continuation | Multi-step Agent | while llm_should_continue(): execute_next_step() |
★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | if llm_trigger(): execute_agent() |
Benefits of Agentic AI
This section outlines some of the benefits of using agents.
Providing Extra Information
Since LLMs only have the data they were trained on, their knowledge is restricted to the data that was available during their training period and does not include the most recent advancements or real-time changes. This restriction makes it more difficult for LLMs to tailor answers according to the user's preferences or particular needs. Using agents provides a powerful way to overcome these constraints. Agents can be programmed to interface with APIs, external databases, or other real-time systems, allowing the AI to retrieve fresh and relevant data. The agent can also provide relevant information regarding individual user needs that can result in a more interactive, accurate, and personalized experience.
Best of Both World
When creating a workflow, we have two options:
- Use the traditional programming approach to create a structured workflow. This makes the system very structured and easy to understand. However, it also makes the system very rigid and can not handle all the edge cases.
- The second approach is to give an LLM the task and let it complete the task as it sees fit. This makes the system very flexible but suffers from hallucinations and incomplete information.
Using an agentic system is a middle approach and gives the best of both worlds. This hybrid approach makes the AI to be both intuitive and precise. Agents can autonomously perform tasks leveraging the context and reasoning capabilities of LLMs, adapting to new data or dynamic environments easily. At the same time, most parts of the system rely on traditional programming, making it deterministic and more secure.
Scalability
Even the state-of-the-art LLMs fail when working on complex problems that require multiple planning steps, making them unscalable. The accuracy of the LLM decreases with the number of steps, primarily because errors from earlier steps can accumulate, leading to compounding inaccuracies. Using agents offers an effective way to address this challenge by trading off some degree of latency for improved accuracy. Agents, allow the LLM to decompose the problem into smaller, more manageable steps, solving each step individually and iteratively refining the process. This results in a more accurate and efficient solution to complex problems. This type of system can easily be scaled to handle more complex problems by adding more agents or tools.
Workflow of a Typical Agent
This section outlines the workflow of a typical agentic AI system. An agent consists of two main components:
- LLM: The LLM is the main component of the agent. It is used to generate code, call tools, and take actions based on the information it has.
- Tools: Tools are functions that can be called by the LLM. They can be used to perform tasks like executing LLM-generated code, reading and writing files, and calling APIs.
Apart from these, the system has access to a state that can be read/written by the LLM. The state can be used to store information that can be used by the LLM to make decisions like what actions to take next and when to stop the task. To solve a task, the agent uses the ReAct framework 4. The React framework is based on two steps: Reason and Action. At each step, the system reasons over the current state and takes action based on the reasoning.
The following flowchart shows a typical agentic workflow:
The agent starts when a human gives it a prompt. The LLM utilizes its reasoning capabilities and the current state to take action (like calling a tool at its disposal). The action results in a change in the state and the LLM generates a response based on the new state. This process continues until the LLM decides that the task is completed or a pre-defined number of steps have been taken. At any step, the LLM can ask for clarification, feedback, or more information from the human. Humans can also intervene at any step to end the task.
An example of an agent solving a mathematical problem using tools. Animation Source: HuggingFace3
Steps for Creating a Simple Agent
Suppose you have a task that you want to solve using an agent. Here are the minimal steps you need to follow to create an agent:
- Break down the task: The first step is to break down the task into smaller sub-tasks. Each sub-task should be simple enough that it can be solved by an LLM in conjunction with a tool or two.
- Create tools: Create tools that can be used by the LLM to solve the sub-tasks. These tools are just functions that can be called by the LLM.
- Load the LLM: Load an LLM that can be used to generate code and call the tools. LLMs like ChatGPT, Claude, and Gemini have built-in capabilities to call tools.
- State Management: Create a state that can be read/written by the LLM. You must implement a way to read and update the state as required by the LLM.
- Run the ReAct Loop: Run the ReAct loop.
Note that these tasks are not trivial to code. Fortunately, you do not need to start from scratch. There are libraries like langchain
, smolagents
, pydantic-ai
, crewAI
etc. implement all the necessary tools and classes to create an agent.
Applications
Applications of agentic AI are vast and can be used in many fields. We have listed some of the applications below:
Software Development
- Code Writing: LLMs can generate the code and test cases, an agent can then try to execute the code and provide feedback to the LLM. An example will be to convert a codebase from one language to another.
- Automated Testing: The system can generate test cases based on the code base that can be run.
- Refactoring: Refactoring the codebase to improve the code quality.
Data Science
- Automated Data Preparation: Use AI agents to clean, transform, and prepare data for analysis.
- Exploratory Data Analysis: Use AI agents to generate plots, summaries, and insights from the data.
- Model Selection and Optimization: Provided with a dataset, an AI agent can automatically select the best model and optimize it.
- Data Conversation: Enable conversational analytics where users can query data in natural language.
Customer Service
- Intelligent Virtual Assistants: Agentic AI-powered chatbots or voice assistants can autonomously handle customer queries, troubleshoot issues, and escalate complex cases.
- Automated Feedback Collection and Analysis: Instead of manual surveys, AI agents can collect feedback from customers and analyze it to identify trends and insights.
HR Assistant
- Recruitment Automation: Most of the recruitment process can be automated using agentic AI.
- Onboarding New Employees: AI agents can help in onboarding new employees by providing them with the necessary information and training.
Challenges
Here are some challenges associated with agentic AI systems:
Complexity in Design and Maintenance: Developing and maintaining agentic AI systems is not easy and requires complex workflows, robust coordination mechanisms, and extensive testing.
Error Propagation: Agentic AI systems often rely on multi-step processes or inter-agent communication. This means that errors in one step or agent can propagate through the workflow, resulting in an incorrect or incomplete solution.
Increased Latency: The sequential nature of agentic systems, where multiple steps or agents are involved, leads to increased latency compared to single-step LLM-based systems.
Tool and API Integration: Integrating various tools, APIs, or agents into the system can be challenging, and requires careful design and testing.
Resource Intensiveness: Using a single LLM in itself is very resource-intensive. Agentic AI systems require multiple LLMs, tools, and agents, making them even more resource-intensive.
Decision-Making Uncertainty: Since the LLM itself is responsible for decision-making, there may be uncertainty or ambiguity in the actions taken by the system, leading to suboptimal or incorrect outcomes.
Security Risks: Agentic systems often interact with external APIs, databases, or agents, creating vulnerabilities that could be exploited for unauthorized access or malicious actions. Note to mention the security risks associated with using LLMs generated code.
Conclusion
In this article, we discussed what agents are, how they work, the benefits of using agents, and some applications of agentic AI. We also discussed the workflow of a typical agent and how to create a simple agent. We briefly touched on the topic of challenges involved in creating and running an agentic system. We hope this article has given you a better understanding of agentic AI and how it can be used to solve complex problems. If you want to learn how to create your own agent using Python and smolagents, hop on to the next article.
Sources
- Building effective agents (Anthorpic)
- smolagents documentation (HuggingFace)
- Agentic AI: 4 reasons why it’s the next big thing in AI research (IBM)
- What is agentic AI? A beginner's guide (sendbird)
- The Batch Weekly Issues 253 (Deeplearning.AI)
- What is an AI agent? (Langchain)
- Function calling (OpenAI)