Multi-Agent Systems
Multi-agent systems enable specialized agents to collaborate on complex tasks, improving modularity, scalability, and robustness. Instead of relying on a single agent, tasks are distributed among agents with distinct capabilities.
In smolagents, different agents can be combined to generate Python code, call external tools, perform web searches, and more. By orchestrating these agents, we can create powerful workflows.
A typical setup might include:
- A Manager Agent for task delegation
- A Code Interpreter Agent for code execution
- A Web Search Agent for information retrieval
The diagram below illustrates a simple multi-agent architecture where a Manager Agent coordinates a Code Interpreter Tool and a Web Search Agent, which in turn uses tools like the DuckDuckGoSearchTool and VisitWebpageTool to gather relevant information.
Multi-Agent Systems in Action
A multi-agent system consists of multiple specialized agents working together under the coordination of an Orchestrator Agent. This approach enables complex workflows by distributing tasks among agents with distinct roles.
For example, a Multi-Agent RAG system can integrate:
- A Web Agent for browsing the internet.
- A Retriever Agent for fetching information from knowledge bases.
- An Image Generation Agent for producing visuals.
All of these agents operate under an orchestrator that manages task delegation and interaction.
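To make the delegation pattern concrete, here is a minimal sketch in plain Python. It does not use smolagents: ToyAgent and ToyOrchestrator are hypothetical names invented for this illustration, and each "agent" is just a function that tags its input.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyAgent:
    """Stand-in for a specialized agent (hypothetical, not a smolagents class)."""

    name: str
    description: str
    handle: Callable[[str], str]


class ToyOrchestrator:
    """Routes each sub-task to the agent registered under the requested name."""

    def __init__(self, agents: List[ToyAgent]):
        self.agents: Dict[str, ToyAgent] = {agent.name: agent for agent in agents}

    def delegate(self, agent_name: str, subtask: str) -> str:
        if agent_name not in self.agents:
            raise KeyError(f"No agent named {agent_name!r}")
        return self.agents[agent_name].handle(subtask)


# Each toy agent only tags the sub-task it receives.
web = ToyAgent("web_agent", "Browses the internet", lambda t: f"[web] {t}")
retriever = ToyAgent("retriever_agent", "Queries knowledge bases", lambda t: f"[kb] {t}")
orchestrator = ToyOrchestrator([web, retriever])

print(orchestrator.delegate("web_agent", "find Batman filming locations"))
```

In a real system the orchestrator is itself a model-driven agent that decides which sub-agent to call; the fixed lookup here only shows the routing idea.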
Solving a complex task with a multi-agent hierarchy
The reception is approaching! With your help, Alfred is now nearly finished with the preparations.
But now there's a problem: the Batmobile has disappeared. Alfred needs to find a replacement, and find it quickly.
Fortunately, a few biopics have been made about Bruce Wayne's life, so maybe Alfred could get a car left behind on one of the movie sets, and re-engineer it up to modern standards, which certainly would include a full self-driving option.
But the car could be at any of the filming locations around the world, and there could be many of them.
So Alfred wants your help. Could you build an agent able to solve this task?
👉 Find all Batman filming locations in the world, calculate the time to transfer there via cargo plane, and represent them on a map, with a color varying by cargo plane transfer time. Also represent some supercar factories with the same cargo plane transfer time.
Let's build this!
This example needs some additional packages, so let's install them first:
pip install 'smolagents[litellm]' plotly geopandas shapely kaleido -q
We first make a tool to get the cargo plane transfer time.
import math
from typing import Optional, Tuple

from smolagents import tool


@tool
def calculate_cargo_travel_time(
    origin_coords: Tuple[float, float],
    destination_coords: Tuple[float, float],
    cruising_speed_kmh: Optional[float] = 750.0,  # Average speed for cargo planes
) -> float:
    """
    Calculate the travel time for a cargo plane between two points on Earth using great-circle distance.

    Args:
        origin_coords: Tuple of (latitude, longitude) for the starting point
        destination_coords: Tuple of (latitude, longitude) for the destination
        cruising_speed_kmh: Optional cruising speed in km/h (defaults to 750 km/h for typical cargo planes)

    Returns:
        float: The estimated travel time in hours

    Example:
        >>> # Chicago (41.8781° N, 87.6298° W) to Sydney (33.8688° S, 151.2093° E)
        >>> result = calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093))
    """

    def to_radians(degrees: float) -> float:
        return degrees * (math.pi / 180)

    # Extract coordinates
    lat1, lon1 = map(to_radians, origin_coords)
    lat2, lon2 = map(to_radians, destination_coords)

    # Earth's radius in kilometers
    EARTH_RADIUS_KM = 6371.0

    # Calculate great-circle distance using the haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    c = 2 * math.asin(math.sqrt(a))
    distance = EARTH_RADIUS_KM * c

    # Add 10% to account for non-direct routes and air traffic controls
    actual_distance = distance * 1.1

    # Calculate flight time
    # Add 1 hour for takeoff and landing procedures
    flight_time = (actual_distance / cruising_speed_kmh) + 1.0

    # Round the result to two decimals
    return round(flight_time, 2)


print(calculate_cargo_travel_time((41.8781, -87.6298), (-33.8688, 151.2093)))
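As a sanity check on the formula, the sketch below reimplements the same haversine computation without the @tool decorator and applies it to a quarter of the equator, whose length is known in closed form (π/2 times the Earth's radius). The helper names haversine_km and flight_hours are introduced here for illustration only; the 10% detour factor, the 1-hour overhead, and the 750 km/h speed mirror the tool above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # same mean radius as the tool above


def haversine_km(origin, destination):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, destination)
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flight_hours(distance_km, speed_kmh=750.0):
    """Apply the 10% detour factor and the 1-hour overhead, then round to 2 decimals."""
    return round(distance_km * 1.1 / speed_kmh + 1.0, 2)


# A quarter of the equator: (0° N, 0° E) to (0° N, 90° E)
quarter = haversine_km((0.0, 0.0), (0.0, 90.0))
print(round(quarter, 1))      # 10007.5
print(flight_hours(quarter))  # 15.68
```

The distance matches π/2 × 6371 ≈ 10007.5 km, which suggests the haversine terms are wired up correctly.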
Setting up the agent
For the model provider, we use Together AI, one of the new inference providers on the Hub!
The GoogleSearchTool uses either SerpApi or Serper to search the web, so this requires either setting the environment variable SERPAPI_API_KEY and passing provider="serpapi", or setting SERPER_API_KEY and passing provider="serper".
If you don't have either provider set up, you can use DuckDuckGoSearchTool instead, but beware that it has a rate limit.
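One way to handle this choice in a script is to look at which key is present before building the tool. The sketch below is only an illustration: pick_search_provider is a hypothetical helper, and it assumes SerpApi reads SERPAPI_API_KEY while Serper reads SERPER_API_KEY.

```python
import os
from typing import Mapping


def pick_search_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return "serper", "serpapi", or "duckduckgo" depending on which API key is set."""
    if env.get("SERPER_API_KEY"):
        return "serper"
    if env.get("SERPAPI_API_KEY"):
        return "serpapi"
    # No paid search key found: fall back to the rate-limited DuckDuckGo tool
    return "duckduckgo"


print(pick_search_provider({"SERPER_API_KEY": "dummy"}))  # serper
print(pick_search_provider({}))                           # duckduckgo

# Possible usage with smolagents (left commented so the sketch runs without it):
# provider = pick_search_provider()
# search_tool = (
#     DuckDuckGoSearchTool() if provider == "duckduckgo"
#     else GoogleSearchTool(provider=provider)
# )
```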
import os
from PIL import Image
from smolagents import CodeAgent, GoogleSearchTool, InferenceClientModel, VisitWebpageTool
model = InferenceClientModel(model_id="Qwen/Qwen2.5-Coder-32B-Instruct", provider="together")
We can start by creating a simple agent as a baseline to give us a simple report.
task = """Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W), and return them to me as a pandas dataframe.
Also give me some supercar factories with the same cargo plane transfer time."""
agent = CodeAgent(
    model=model,
    tools=[GoogleSearchTool("serper"), VisitWebpageTool(), calculate_cargo_travel_time],
    additional_authorized_imports=["pandas"],
    max_steps=20,
)
result = agent.run(task)
result
In our case, it generates this output:
| | Location | Travel Time to Gotham (hours) |
|--|------------------------------------------------------|------------------------------|
| 0 | Necropolis Cemetery, Glasgow, Scotland, UK | 8.60 |
| 1 | St. George's Hall, Liverpool, England, UK | 8.81 |
| 2 | Two Temple Place, London, England, UK | 9.17 |
| 3 | Wollaton Hall, Nottingham, England, UK | 9.00 |
| 4 | Knebworth House, Knebworth, Hertfordshire, UK | 9.15 |
| 5 | Acton Lane Power Station, Acton Lane, Acton, UK | 9.16 |
| 6 | Queensboro Bridge, New York City, USA | 1.01 |
| 7 | Wall Street, New York City, USA | 1.00 |
| 8 | Mehrangarh Fort, Jodhpur, Rajasthan, India | 18.34 |
| 9 | Turda Gorge, Turda, Romania | 11.89 |
| 10 | Chicago, USA | 2.68 |
| 11 | Hong Kong, China | 19.99 |
| 12 | Cardington Studios, Northamptonshire, UK | 9.10 |
| 13 | Warner Bros. Leavesden Studios, Hertfordshire, UK | 9.13 |
| 14 | Westwood, Los Angeles, CA, USA | 6.79 |
| 15 | Woking, UK (McLaren) | 9.13 |
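The task asks for factories "with the same" transfer time, which in practice means "close enough". As a rough sketch of how that could be made explicit, the hypothetical helper below keeps the locations whose travel time falls within a tolerance of a reference value. The 0.25-hour tolerance is an arbitrary assumption; the sample rows are copied from the table above.

```python
def same_travel_time(candidates, reference_hours, tolerance_hours=0.25):
    """Return the names whose travel time is within tolerance of the reference."""
    return [
        name
        for name, hours in candidates
        if abs(hours - reference_hours) <= tolerance_hours
    ]


# A few (location, hours) rows taken from the table above
rows = [
    ("Two Temple Place, London, England, UK", 9.17),
    ("Wollaton Hall, Nottingham, England, UK", 9.00),
    ("Queensboro Bridge, New York City, USA", 1.01),
    ("Mehrangarh Fort, Jodhpur, Rajasthan, India", 18.34),
]
mclaren_hours = 9.13  # Woking, UK (McLaren)

# Keeps the London and Nottingham rows, drops New York and Jodhpur
print(same_travel_time(rows, mclaren_hours))
```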
We could already improve this a bit by throwing in some dedicated planning steps, and adding more prompting.
Planning steps allow the agent to think ahead and plan its next steps, which can be useful for more complex tasks.
agent.planning_interval = 4
detailed_report = agent.run(f"""
You're an expert analyst. You make comprehensive reports after visiting many websites.
Don't hesitate to search for many queries at once in a for loop.
For each data point that you find, visit the source url to confirm numbers.
{task}
""")
print(detailed_report)
detailed_report
In our case, it generates this output:
| | Location | Travel Time (hours) |
|--|--------------------------------------------------|---------------------|
| 0 | Bridge of Sighs, Glasgow Necropolis, Glasgow, UK | 8.6 |
| 1 | Wishart Street, Glasgow, Scotland, UK | 8.6 |
Thanks to these quick changes, we obtained a much more concise report by simply providing our agent a detailed prompt, and giving it planning capabilities!
However, the model's context window fills up quickly. If we ask our agent to combine the results of a detailed search with another one, it will be slower, and token usage and costs will ramp up quickly.
➡️ We need to improve the structure of our system.
✌️ Splitting the task between two agents
Multi-agent structures let us separate memories between different sub-tasks, with two great benefits:
- Each agent is more focused on its core task, and thus performs better
- Separating memories reduces the count of input tokens at each step, thus reducing latency and cost.
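The token saving can be illustrated with some back-of-the-envelope arithmetic. The sketch below uses a deliberately simplified model, which is an assumption rather than a measurement: at every step the model re-reads the prompt plus everything produced in earlier steps, and each step adds a fixed number of tokens. It ignores the cost of passing the sub-agent's report back to the manager.

```python
def total_input_tokens(num_steps: int, tokens_per_step: int, prompt_tokens: int) -> int:
    """Sum of input tokens over a run where memory grows by a fixed amount per step."""
    # At step i the model re-reads the prompt plus the i earlier steps.
    return sum(prompt_tokens + i * tokens_per_step for i in range(num_steps))


# One agent doing 20 steps vs. two agents doing 10 steps each,
# with 1000 tokens added per step and a 500-token prompt (illustrative numbers)
single = total_input_tokens(20, 1000, 500)
split = 2 * total_input_tokens(10, 1000, 500)
print(single, split)  # 200000 100000
```

Because the re-read memory grows quadratically with the number of steps, halving the steps per agent roughly halves the total input tokens in this toy model.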
Let's create a team with a dedicated web search agent, managed by another agent.
The manager agent should have plotting capabilities to write its final report, so let us give it access to additional imports, including plotly, and geopandas + shapely for spatial plotting.
model = InferenceClientModel(
    "Qwen/Qwen2.5-Coder-32B-Instruct", provider="together", max_tokens=8096
)
web_agent = CodeAgent(
    model=model,
    tools=[
        GoogleSearchTool(provider="serper"),
        VisitWebpageTool(),
        calculate_cargo_travel_time,
    ],
    name="web_agent",
    description="Browses the web to find information",
    verbosity_level=0,
    max_steps=10,
)
The manager agent will need to do some mental heavy lifting.
So we give it the stronger model DeepSeek-R1, and add a planning_interval to the mix.
from smolagents import OpenAIServerModel
from smolagents.utils import encode_image_base64, make_image_url


def check_reasoning_and_plot(final_answer, agent_memory):
    # Ask a multimodal model to review the agent's steps and the saved plot
    multimodal_model = OpenAIServerModel("gpt-4o", max_tokens=8096)
    filepath = "saved_map.png"
    assert os.path.exists(filepath), "Make sure to save the plot under saved_map.png!"
    image = Image.open(filepath)
    # Trailing spaces keep the concatenated sentences from running together
    prompt = (
        f"Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the plot that was made. "
        "Please check that the reasoning process and plot are correct: do they correctly answer the given task? "
        "First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not. "
        "Don't be harsh: if the plot mostly solves the task, it should pass. "
        "To pass, a plot should be made using px.scatter_map and not any other method (scatter_map looks nicer)."
    )
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {"url": make_image_url(encode_image_base64(image))},
                },
            ],
        }
    ]
    output = multimodal_model(messages).content
    print("Feedback: ", output)
    # Raising sends the feedback back to the agent so it can try again
    if "FAIL" in output:
        raise Exception(output)
    return True
manager_agent = CodeAgent(
    model=InferenceClientModel("deepseek-ai/DeepSeek-R1", provider="together", max_tokens=8096),
    tools=[calculate_cargo_travel_time],
    managed_agents=[web_agent],
    additional_authorized_imports=[
        "geopandas",
        "plotly",
        "shapely",
        "json",
        "pandas",
        "numpy",
    ],
    planning_interval=5,
    verbosity_level=2,
    final_answer_checks=[check_reasoning_and_plot],
    max_steps=15,
)
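A final-answer check does not have to call a model. As a lighter illustration of the same hook, the sketch below only verifies that a plot file exists and is not empty. It follows the two-argument shape of check_reasoning_and_plot above; the name check_plot_saved and the extra filepath parameter are assumptions made for this example, and the exact arguments passed to checks may differ between smolagents versions.

```python
import os
import tempfile


def check_plot_saved(final_answer, agent_memory=None, filepath="saved_map.png"):
    """Return True if the plot file exists and is non-empty, otherwise raise."""
    if not os.path.exists(filepath):
        raise Exception(f"Expected a plot at {filepath}, but no file was found.")
    if os.path.getsize(filepath) == 0:
        raise Exception(f"{filepath} exists but is empty.")
    return True


# Quick demonstration with a temporary stand-in for the saved map
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "saved_map.png")
    with open(demo_path, "wb") as f:
        f.write(b"not a real image, just some bytes")
    print(check_plot_saved("some answer", filepath=demo_path))  # True
```

A cheap check like this could run before the model-based review, so obvious failures are caught without spending tokens.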
Let us inspect what this team looks like:
manager_agent.visualize()
This will generate something like this, helping us understand the structure and relationship between agents and tools used:
CodeAgent | deepseek-ai/DeepSeek-R1
├── ✅ Authorized imports: ['geopandas', 'plotly', 'shapely', 'json', 'pandas', 'numpy']
├── 🛠️ Tools:
│   ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
│   ┃ Name                        ┃ Description                           ┃ Arguments                             ┃
│   ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│   │ calculate_cargo_travel_time │ Calculate the travel time for a cargo │ origin_coords (`array`): Tuple of     │
│   │                             │ plane between two points on Earth     │ (latitude, longitude) for the         │
│   │                             │ using great-circle distance.          │ starting point                        │
│   │                             │                                       │ destination_coords (`array`): Tuple   │
│   │                             │                                       │ of (latitude, longitude) for the      │
│   │                             │                                       │ destination                           │
│   │                             │                                       │ cruising_speed_kmh (`number`):        │
│   │                             │                                       │ Optional cruising speed in km/h       │
│   │                             │                                       │ (defaults to 750 km/h for typical     │
│   │                             │                                       │ cargo planes)                         │
│   │ final_answer                │ Provides a final answer to the given  │ answer (`any`): The final answer to   │
│   │                             │ problem.                              │ the problem                           │
│   └─────────────────────────────┴───────────────────────────────────────┴───────────────────────────────────────┘
└── 🤝 Managed agents:
    └── web_agent | CodeAgent | Qwen/Qwen2.5-Coder-32B-Instruct
        ├── ✅ Authorized imports: []
        ├── 📝 Description: Browses the web to find information
        └── 🛠️ Tools:
            ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
            ┃ Name                        ┃ Description                       ┃ Arguments                         ┃
            ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
            │ web_search                  │ Performs a google web search for  │ query (`string`): The search      │
            │                             │ your query then returns a string  │ query to perform.                 │
            │                             │ of the top search results.        │ filter_year (`integer`):          │
            │                             │                                   │ Optionally restrict results to a  │
            │                             │                                   │ certain year                      │
            │ visit_webpage               │ Visits a webpage at the given url │ url (`string`): The url of the    │
            │                             │ and reads its content as a        │ webpage to visit.                 │
            │                             │ markdown string. Use this to      │                                   │
            │                             │ browse webpages.                  │                                   │
            │ calculate_cargo_travel_time │ Calculate the travel time for a   │ origin_coords (`array`): Tuple of │
            │                             │ cargo plane between two points on │ (latitude, longitude) for the     │
            │                             │ Earth using great-circle          │ starting point                    │
            │                             │ distance.                         │ destination_coords (`array`):     │
            │                             │                                   │ Tuple of (latitude, longitude)    │
            │                             │                                   │ for the destination               │
            │                             │                                   │ cruising_speed_kmh (`number`):    │
            │                             │                                   │ Optional cruising speed in km/h   │
            │                             │                                   │ (defaults to 750 km/h for typical │
            │                             │                                   │ cargo planes)                     │
            │ final_answer                │ Provides a final answer to the    │ answer (`any`): The final answer  │
            │                             │ given problem.                    │ to the problem                    │
            └─────────────────────────────┴───────────────────────────────────┴───────────────────────────────────┘
manager_agent.run("""
Find all Batman filming locations in the world, calculate the time to transfer via cargo plane to here (we're in Gotham, 40.7128° N, 74.0060° W).
Also give me some supercar factories with the same cargo plane transfer time. You need at least 6 points in total.
Represent this as a spatial map of the world, with the locations represented as scatter points with a color that depends on the travel time, and save it to saved_map.png!
Here's an example of how to plot and return a map:
import plotly.express as px
df = px.data.carshare()
fig = px.scatter_map(df, lat="centroid_lat", lon="centroid_lon", text="name", color="peak_hour", size=100,
     color_continuous_scale=px.colors.sequential.Magma, size_max=15, zoom=1)
fig.show()
fig.write_image("saved_map.png")
final_answer(fig)
Never try to process strings using code: when you have a string to read, just print it and you'll see it.
""")
I don't know how that went in your run, but in mine, the manager agent skilfully divided the tasks given to the web agent into 1. Search for Batman filming locations, then 2. Find supercar factories, before aggregating the lists and plotting the map.
Let's see what the map looks like by inspecting it directly from the agent state:
manager_agent.python_executor.state["fig"]
This will output the map:
Resources
- Multi-Agent Systems – Overview of multi-agent systems.
- What is Agentic RAG? – Introduction to Agentic RAG.
- Multi-Agent RAG System 🤖🤝🤖 Recipe – Step-by-step guide to building a multi-agent RAG system.