# Models
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.
To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.
You’re free to create and use your own models to power your agent.

You can use any model callable for your agent, as long as:

- It follows the messages format (`List[Dict[str, str]]`) for its input `messages`, and it returns an object with a `.content` attribute containing the generated text.
- It stops generating outputs before the sequences passed in the argument `stop_sequences`.

To define your LLM, you can write a `custom_model` function that accepts a list of messages and returns an object with a `.content` attribute containing the text. The callable also needs to accept a `stop_sequences` argument that indicates when to stop generating.
```python
from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"
client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message  # message object with a .content attribute
    return answer
```
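Such a callable can then be passed directly as an agent's model. A minimal sketch, assuming the `CodeAgent` class from the introductory guide and the `custom_model` defined above (the task string is purely illustrative):

```python
from smolagents import CodeAgent

# Hypothetical usage: plug the custom callable in as the agent's model.
agent = CodeAgent(tools=[], model=custom_model)
agent.run("What is the 10th Fibonacci number?")
```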
Additionally, `custom_model` can take a `grammar` argument. If you specify a `grammar` upon agent initialization, this argument will be passed along on each call to the model, enabling constrained generation to force properly-formatted agent outputs.
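Here is a minimal sketch of how `custom_model` could accept that argument, reusing the `client` from the example above. Mapping the grammar onto the Inference API's `response_format` parameter is an assumption of this sketch; other backends expose constrained generation differently:

```python
def custom_model(messages, stop_sequences=["Task"], grammar=None):
    response = client.chat_completion(
        messages,
        stop=stop_sequences,
        max_tokens=1000,
        response_format=grammar,  # assumed mapping; adapt to your backend's constrained-generation API
    )
    return response.choices[0].message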
## TransformersModel

For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the `model_id` given at initialization.
```python
from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a
```
You must have `transformers` and `torch` installed on your machine. If that's not the case, run `pip install smolagents[transformers]`.
class smolagents.TransformersModel(model_id: typing.Optional[str] = None, device_map: typing.Optional[str] = None, torch_dtype: typing.Optional[str] = None, trust_remote_code: bool = False, **kwargs)
Parameters

- model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- device_map (`str`, *optional*) — The device_map to initialize your model with.
- torch_dtype (`str`, *optional*) — The torch_dtype to initialize your model with.
- trust_remote_code (`bool`, defaults to `False`) — Some models on the Hub require running remote code: for this model, you would have to set this flag to `True`.
- **kwargs — Additional keyword arguments to pass to `model.generate()`, for instance `max_new_tokens` or `device`.
Raises

`ValueError` — If the model name is not provided.
A class to interact with a language model running locally via the `transformers` library. This model builds a local `transformers` pipeline and runs generation on your machine, supporting features like stop sequences and grammar customization.
Example:

```python
>>> engine = TransformersModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     device_map="cuda",
...     max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```
## HfApiModel

The `HfApiModel` wraps an HF Inference API client for the execution of the LLM.
```python
from smolagents import HfApiModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = HfApiModel()
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!
```
class smolagents.HfApiModel(model_id: str = 'Qwen/Qwen2.5-Coder-32B-Instruct', token: typing.Optional[str] = None, timeout: typing.Optional[int] = 120, **kwargs)
Parameters

- model_id (`str`, *optional*, defaults to `"Qwen/Qwen2.5-Coder-32B-Instruct"`) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- token (`str`, *optional*) — Token used by the Hugging Face API for authentication. This token needs to be authorized for 'Make calls to the serverless Inference API'. If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'. If not provided, the class will try to use the `HF_TOKEN` environment variable, and otherwise the token stored in the Hugging Face CLI configuration.
- timeout (`int`, *optional*, defaults to 120) — Timeout for the API request, in seconds.
- **kwargs — Additional keyword arguments to pass to the Hugging Face API.
Raises

`ValueError` — If the model name is not provided.
A class to interact with Hugging Face's Inference API for language model interaction. This model allows you to communicate with Hugging Face's models using the Inference API. It can be used both in serverless mode and with a dedicated endpoint, supporting features like stop sequences and grammar customization.
Example:

```python
>>> engine = HfApiModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     token="your_hf_token_here",
...     max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
```
## LiteLLMModel

The `LiteLLMModel` leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs upon model initialization that will then be used on every call to the model; for instance, below we pass `temperature`.
```python
from smolagents import LiteLLMModel

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "No need to help, take it easy."},
]

model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
```
class smolagents.LiteLLMModel(model_id = 'anthropic/claude-3-5-sonnet-20240620', api_base = None, api_key = None, **kwargs)

This model connects to LiteLLM as a gateway to hundreds of LLMs.
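The signature above also accepts `api_base` and `api_key`, which let you target self-hosted servers through LiteLLM. A hypothetical sketch pointing it at a local OpenAI-compatible server (the model name, URL, and key below are placeholders, not real endpoints):

```python
from smolagents import LiteLLMModel

# LiteLLM's "openai/<model>" prefix routes the request through its
# OpenAI-compatible provider; api_base and api_key are placeholders here.
model = LiteLLMModel(
    model_id="openai/my-local-model",
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
)
```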
## OpenAIServerModel

This class lets you call any OpenAI-compatible server. Here's how you can set it up (you can customize the `api_base` URL to point to another server):
```python
import os

from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
```
class smolagents.OpenAIServerModel(model_id: str, api_base: typing.Optional[str] = None, api_key: typing.Optional[str] = None, organization: typing.Optional[str] = None, project: typing.Optional[str] = None, custom_role_conversions: typing.Optional[typing.Dict[str, str]] = None, **kwargs)
Parameters

- model_id (`str`) — The model identifier to use on the server (e.g. "gpt-3.5-turbo").
- api_base (`str`, *optional*) — The base URL of the OpenAI-compatible API server.
- api_key (`str`, *optional*) — The API key to use for authentication.
- organization (`str`, *optional*) — The organization to use for the API request.
- project (`str`, *optional*) — The project to use for the API request.
- custom_role_conversions (`dict[str, str]`, *optional*) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
- **kwargs — Additional keyword arguments to pass to the OpenAI API.
This model connects to an OpenAI-compatible API server.
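As an illustration of the `custom_role_conversions` parameter described above, here is a hypothetical sketch for a local server that rejects the "system" role (the model name, URL, and key are placeholders):

```python
from smolagents import OpenAIServerModel

# Remap "system" messages to "user" for a backend without a system role.
model = OpenAIServerModel(
    model_id="my-local-model",            # placeholder
    api_base="http://localhost:1234/v1",  # placeholder local server
    api_key="not-needed",                 # placeholder
    custom_role_conversions={"system": "user"},
)
```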
## AzureOpenAIServerModel

`AzureOpenAIServerModel` allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up; note that you can omit the `azure_endpoint`, `api_key`, and `api_version` arguments, provided you've set the corresponding environment variables: `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, and `OPENAI_API_VERSION`.

Pay attention to the lack of an `AZURE_` prefix for `OPENAI_API_VERSION`: this is due to the way the underlying openai package is designed.
```python
import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
```
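Because the parameters documented below fall back to those environment variables when omitted, the same setup can be written more compactly. A sketch assuming the three variables above are set (note that `AZURE_OPENAI_MODEL` is just the naming convention of this example, not a variable the openai package reads, so `model_id` still has to be passed explicitly):

```python
import os

from smolagents import AzureOpenAIServerModel

# azure_endpoint, api_key and api_version are inferred from
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION.
model = AzureOpenAIServerModel(model_id=os.environ["AZURE_OPENAI_MODEL"])
```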
class smolagents.AzureOpenAIServerModel(model_id: str, azure_endpoint: typing.Optional[str] = None, api_key: typing.Optional[str] = None, api_version: typing.Optional[str] = None, custom_role_conversions: typing.Optional[typing.Dict[str, str]] = None, **kwargs)
Parameters

- model_id (`str`) — The model deployment name to use when connecting (e.g. "gpt-4o-mini").
- azure_endpoint (`str`, *optional*) — The Azure endpoint, including the resource, e.g. `https://example-resource.azure.openai.com/`. If not provided, it will be inferred from the `AZURE_OPENAI_ENDPOINT` environment variable.
- api_key (`str`, *optional*) — The API key to use for authentication. If not provided, it will be inferred from the `AZURE_OPENAI_API_KEY` environment variable.
- api_version (`str`, *optional*) — The API version to use. If not provided, it will be inferred from the `OPENAI_API_VERSION` environment variable.
- custom_role_conversions (`dict[str, str]`, *optional*) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like "system".
- **kwargs — Additional keyword arguments to pass to the Azure OpenAI API.
This model connects to an Azure OpenAI deployment.