Agents
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.
To learn more about agents and tools make sure to read the introductory guide. This page contains the API docs for the underlying classes.
Agents
Our agents inherit from MultiStepAgent, which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read more in this conceptual guide.
We provide two types of agents, based on the main MultiStepAgent class.
- CodeAgent is the default agent; it writes its tool calls in Python code.
- ToolCallingAgent writes its tool calls in JSON.
Both require the arguments model and tools (a list of tools) at initialization.
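For instance, a minimal setup could look like this (a sketch assuming you have access to the default Inference API model; the task string is just an example):
from smolagents import CodeAgent, HfApiModel

# Build a code-writing agent with no extra tools, powered by the default Inference API model
model = HfApiModel()
agent = CodeAgent(tools=[], model=model)
agent.run("What is the 10th Fibonacci number?")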
Classes of agents
class smolagents.MultiStepAgent
< source >( tools: typing.List[smolagents.tools.Tool] model: typing.Callable[[typing.List[typing.Dict[str, str]]], str] system_prompt: typing.Optional[str] = None tool_description_template: typing.Optional[str] = None max_steps: int = 6 tool_parser: typing.Optional[typing.Callable] = None add_base_tools: bool = False verbosity_level: int = 1 grammar: typing.Optional[typing.Dict[str, str]] = None managed_agents: typing.Optional[typing.List] = None step_callbacks: typing.Optional[typing.List[typing.Callable]] = None planning_interval: typing.Optional[int] = None )
Agent class that solves the given task step by step, using the ReAct framework: While the objective is not reached, the agent will perform a cycle of action (given by the LLM) and observation (obtained from the environment).
Runs the agent in direct mode, returning outputs only at the end: should be launched only in the run method.
execute_tool_call
< source >( tool_name: str arguments: typing.Union[typing.Dict[str, str], str] )
Executes the tool with the provided input and returns the result. This method replaces arguments with the actual values from the state if they refer to state variables.
extract_action
< source >( llm_output: str split_token: str )
Parse action from the LLM output
planning_step
< source >( task is_first_step: bool step: int )
Used periodically by the agent to plan the next steps to reach the objective.
This method provides a final answer to the task, based on the logs of the agent’s interactions.
run
< source >( task: str stream: bool = False reset: bool = True single_step: bool = False additional_args: typing.Optional[typing.Dict] = None )
Parameters
- task (str) — The task to perform.
- stream (bool) — Whether to run in a streaming way.
- reset (bool) — Whether to reset the conversation or keep it going from a previous run.
- single_step (bool) — Whether to run the agent in one-shot fashion.
- additional_args (dict) — Any other variables that you want to pass to the agent run, for instance images or dataframes. Give them clear names!
Runs the agent for the given task.
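For instance, here is a sketch of passing extra variables to a run and then continuing the same conversation (the pandas dataframe and its name df are purely illustrative):
import pandas as pd
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(tools=[], model=HfApiModel())
df = pd.DataFrame({"price": [9.99, 19.99, 4.99]})  # illustrative data

# The dataframe is passed to the run under the name "df"
agent.run("What is the average price in df?", additional_args={"df": df})
# Keep the conversation going instead of starting from scratch
agent.run("Now give me the maximum price as well.", reset=False)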
To be implemented in child classes. Should return None if the step is not final.
Runs the agent in streaming mode, yielding steps as they are executed: should be launched only in the run method.
Reads past llm_outputs, actions, and observations or errors from the logs into a series of messages that can be used as input to the LLM.
class smolagents.CodeAgent
< source >( tools: typing.List[smolagents.tools.Tool] model: typing.Callable system_prompt: typing.Optional[str] = None grammar: typing.Optional[typing.Dict[str, str]] = None additional_authorized_imports: typing.Optional[typing.List[str]] = None planning_interval: typing.Optional[int] = None use_e2b_executor: bool = False max_print_outputs_length: typing.Optional[int] = None **kwargs )
In this agent, the tool calls will be formulated by the LLM in code format, then parsed and executed.
Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. Returns None if the step is not final.
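As a sketch, you can authorize extra imports for the generated code; the requests and bs4 module names below are only examples:
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    additional_authorized_imports=["requests", "bs4"],  # modules the generated code is allowed to import
)
agent.run("Fetch https://huggingface.co/blog and list the titles of the first three posts.")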
class smolagents.ToolCallingAgent
< source >( tools: typing.List[smolagents.tools.Tool] model: typing.Callable system_prompt: typing.Optional[str] = None planning_interval: typing.Optional[int] = None **kwargs )
This agent uses JSON-like tool calls, using the method model.get_tool_call to leverage the LLM engine’s tool calling capabilities.
Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. Returns None if the step is not final.
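A minimal sketch, assuming the model served behind HfApiModel supports tool calling:
from smolagents import ToolCallingAgent, HfApiModel

# add_base_tools=True gives the agent the default toolbox to call via JSON tool calls
agent = ToolCallingAgent(tools=[], model=HfApiModel(), add_base_tools=True)
agent.run("How many seconds are there in a leap year?")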
ManagedAgent
class smolagents.ManagedAgent
< source >( agent name description additional_prompting: typing.Optional[str] = None provide_run_summary: bool = False managed_agent_prompt: typing.Optional[str] = None )
Adds additional prompting for the managed agent, like ‘add more detail in your answer’.
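Here is a sketch of wrapping a sub-agent so that a manager agent can call it; the name, description, and task strings are illustrative:
from smolagents import CodeAgent, HfApiModel, ManagedAgent

web_agent = CodeAgent(tools=[], model=HfApiModel(), add_base_tools=True)

# Wrap the sub-agent with a name and description the manager can reason about
managed_web_agent = ManagedAgent(
    agent=web_agent,
    name="web_search",
    description="Runs web searches for you. Give it your query as an argument.",
)

manager_agent = CodeAgent(tools=[], model=HfApiModel(), managed_agents=[managed_web_agent])
manager_agent.run("Who is the current CEO of Hugging Face?")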
stream_to_gradio
smolagents.stream_to_gradio
< source >( agent task: str test_mode: bool = False reset_agent_memory: bool = False additional_args: typing.Optional[dict] = None )
Runs an agent with the given task and streams the messages from the agent as gradio ChatMessages.
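A sketch of consuming the stream in your own Gradio app, assuming you render each yielded ChatMessage yourself:
from smolagents import CodeAgent, HfApiModel, stream_to_gradio

agent = CodeAgent(tools=[], model=HfApiModel())
# stream_to_gradio yields gradio ChatMessage objects as the agent progresses through its steps
for chat_message in stream_to_gradio(agent, task="What is 2**20?"):
    print(chat_message)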
GradioUI
class smolagents.GradioUI
< source >( agent: MultiStepAgent file_upload_folder: str | None = None )
A one-line interface to launch your agent in Gradio.
upload_file
< source >( file file_uploads_log allowed_file_types = ['application/pdf', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'text/plain'] )
Handles file uploads; the default allowed types are .pdf, .docx, and .txt.
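Launching the interface then takes a couple of lines (a sketch; the upload folder path is illustrative and launch() assumes the usual Gradio entry point):
from smolagents import CodeAgent, HfApiModel, GradioUI

agent = CodeAgent(tools=[], model=HfApiModel())
# Serve the agent in a chat interface, storing uploaded files in ./uploads
GradioUI(agent, file_upload_folder="./uploads").launch()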
Models
You’re free to create and use your own models to power your agent.
You could use any model callable for your agent, as long as:
- It follows the messages format (List[Dict[str, str]]) for its input messages, and it returns a str.
- It stops generating outputs before the sequences passed in the argument stop_sequences.
For defining your LLM, you can make a custom_model method which accepts a list of messages and returns an object with a .content attribute containing the text. This callable also needs to accept a stop_sequences argument that indicates when to stop generating.
from huggingface_hub import login, InferenceClient
login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")
model_id = "meta-llama/Llama-3.3-70B-Instruct"
client = InferenceClient(model=model_id)
def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message
    return answer
Additionally, custom_model can also take a grammar argument. If you specify a grammar upon agent initialization, this argument will be passed to the calls to the model with the grammar you defined, to allow constrained generation and force properly-formatted agent outputs.
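You can then pass this callable directly as the agent's model; a sketch building on the custom_model defined above:
from smolagents import CodeAgent

# Any callable respecting the messages/stop_sequences contract works as a model
agent = CodeAgent(tools=[], model=custom_model)
agent.run("List the prime numbers below 20.")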
TransformersModel
For convenience, we have added a TransformersModel that implements the points above by building a local transformers pipeline for the model_id given at initialization.
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a
class smolagents.TransformersModel
< source >( model_id: typing.Optional[str] = None device_map: typing.Optional[str] = None torch_dtype: typing.Optional[str] = None trust_remote_code: bool = False **kwargs )
Parameters
- model_id (str, optional, defaults to "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- device_map (str, optional) — The device_map to initialize your model with.
- torch_dtype (str, optional) — The torch_dtype to initialize your model with.
- trust_remote_code (bool) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
- kwargs (dict, optional) — Any additional keyword arguments that you want to use in model.generate(), for instance max_new_tokens or device.
Raises
ValueError — If the model name is not provided.
A class that runs a Hugging Face model locally for language model interaction.
This model builds a local transformers pipeline for the model_id given at initialization, supporting features like stop sequences and grammar customization.
Example:
>>> engine = TransformersModel(
... model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
... device_map="cuda",
... max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
HfApiModel
The HfApiModel wraps an HF Inference API client for the execution of the LLM.
from smolagents import HfApiModel
messages = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "No need to help, take it easy."},
]
model = HfApiModel()
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!
class smolagents.HfApiModel
< source >( model_id: str = 'Qwen/Qwen2.5-Coder-32B-Instruct' token: typing.Optional[str] = None timeout: typing.Optional[int] = 120 temperature: float = 0.5 **kwargs )
Parameters
- model_id (str, optional, defaults to "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- token (str, optional) — Token used by the Hugging Face API for authentication. This token needs to be authorized for ‘Make calls to the serverless Inference API’. If the model is gated (like Llama-3 models), the token also needs ‘Read access to contents of all public gated repos you can access’. If not provided, the class will try to use the environment variable ‘HF_TOKEN’, else use the token stored in the Hugging Face CLI configuration.
- timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
Raises
ValueError — If the model name is not provided.
A class to interact with Hugging Face’s Inference API for language model interaction.
This model allows you to communicate with Hugging Face’s models using the Inference API. It can be used in both serverless mode or with a dedicated endpoint, supporting features like stop sequences and grammar customization.
Example:
>>> engine = HfApiModel(
... model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
... token="your_hf_token_here",
... max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
LiteLLMModel
The LiteLLMModel leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs upon model initialization that will then be used whenever the model is called; for instance, below we pass temperature.
from smolagents import LiteLLMModel
messages = [
{"role": "user", "content": "Hello, how are you?"},
{"role": "assistant", "content": "I'm doing great. How can I help you today?"},
{"role": "user", "content": "No need to help, take it easy."},
]
model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
class smolagents.LiteLLMModel
< source >( model_id = 'anthropic/claude-3-5-sonnet-20240620' api_base = None api_key = None **kwargs )
This model connects to LiteLLM as a gateway to hundreds of LLMs.
OpenAIServerModel
This class lets you call any OpenAI-compatible server model.
Here’s how you can set it up (you can customise the api_base URL to point to another server):
import os
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
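The resulting model can then be plugged into any agent like the other model classes; a minimal sketch:
from smolagents import CodeAgent

agent = CodeAgent(tools=[], model=model)
agent.run("Convert 100 degrees Fahrenheit to Celsius.")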