
Models

Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models are prone to change.

To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.

Models

You're free to create and use your own models to power your agent.

You could use any model callable for your agent, as long as:

  1. It follows the messages format (List[Dict[str, str]]) for its input messages, and it returns a str.
  2. It stops generating outputs before the sequences passed in the argument stop_sequences.

For defining your LLM, you can make a custom_model method which accepts a list of messages and returns an object with a .content attribute containing the generated text. This callable also needs to accept a stop_sequences argument that indicates when to stop generating.

from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

model_id = "meta-llama/Llama-3.3-70B-Instruct"

client = InferenceClient(model=model_id)

def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    # The returned message object exposes the generated text via its `.content` attribute
    answer = response.choices[0].message
    return answer
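
A quick sanity check of the callable (the exact output will vary by model):

out = custom_model([{"role": "user", "content": "Hi!"}])
print(out.content)  # the generated text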

Additionally, custom_model can take a grammar argument. If you specify a grammar upon agent initialization, this argument will be passed along to calls to the model to allow constrained generation, forcing properly-formatted agent outputs.
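
A minimal sketch of such a callable, assuming the backend accepts huggingface_hub's response_format argument for constrained decoding (an assumption about the provider, not a smolagents requirement):

def custom_model(messages, stop_sequences=["Task"], grammar=None):
    # Forward the agent's grammar (if any) for constrained decoding
    extra = {"response_format": grammar} if grammar is not None else {}
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000, **extra)
    return response.choices[0].message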

TransformersModel

For convenience, we have added a TransformersModel that implements the points above by building a local transformers pipeline for the model_id given at initialization.

from smolagents import TransformersModel

model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
>>> What a

You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if it's not the case.

class smolagents.TransformersModel

( model_id: str | None = None device_map: str | None = None torch_dtype: str | None = None trust_remote_code: bool = False model_kwargs: dict[str, typing.Any] | None = None **kwargs )

Parameters

  • model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub. For example, "Qwen/Qwen2.5-Coder-32B-Instruct".
  • device_map (str, optional) — The device_map to initialize your model with.
  • torch_dtype (str, optional) — The torch_dtype to initialize your model with.
  • trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
  • model_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to AutoModel.from_pretrained (like revision, model_args, config, etc.).
  • **kwargs — Additional keyword arguments to pass to model.generate(), for instance max_new_tokens or device.

Raises

ValueError

  • ValueError — If the model name is not provided.

A class that uses Hugging Face’s Transformers library for language model interaction.

This model allows you to load and use Hugging Face’s models locally using the Transformers library. It supports features like stop sequences and grammar customization.

You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if it’s not the case.

Example:

>>> engine = TransformersModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     device="cuda",
...     max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

InferenceClientModel

The InferenceClientModel wraps huggingface_hub's InferenceClient for the execution of the LLM. It supports HF's Inference API as well as all Inference Providers available on the Hub.

from smolagents import InferenceClientModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = InferenceClientModel()
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!

class smolagents.InferenceClientModel

( model_id: str = 'Qwen/Qwen2.5-Coder-32B-Instruct' provider: str | None = None token: str | None = None timeout: int = 120 client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None api_key: str | None = None bill_to: str | None = None base_url: str | None = None **kwargs )

Parameters

  • model_id (str, optional, default "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a model identifier from the Hugging Face model hub or a URL to a deployed Inference Endpoint. Currently, it defaults to "Qwen/Qwen2.5-Coder-32B-Instruct", but this may change in the future.
  • provider (str, optional) — Name of the provider to use for inference. A list of supported providers can be found in the Inference Providers documentation. Defaults to “auto”, i.e. the first of the providers available for the model, sorted by the user’s preferred order of providers in their Hub settings. If base_url is passed, then provider is not used.
  • token (str, optional) — Token used by the Hugging Face API for authentication. This token needs to be authorized for ‘Make calls to the serverless Inference Providers’. If the model is gated (like Llama-3 models), the token also needs ‘Read access to contents of all public gated repos you can access’. If not provided, the class will try to use the HF_TOKEN environment variable, otherwise it will fall back to the token stored in the Hugging Face CLI configuration.
  • timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
  • client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the Hugging Face InferenceClient.
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like “system”.
  • api_key (str, optional) — Token to use for authentication. This is a duplicated argument from token to make InferenceClientModel follow the same pattern as openai.OpenAI client. Cannot be used if token is set. Defaults to None.
  • bill_to (str, optional) — The billing account to use for the requests. By default the requests are billed on the user’s account. Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub.
  • base_url (str, optional) — Base URL to run inference. This is a duplicated argument from model_id to make InferenceClientModel follow the same pattern as openai.OpenAI client. Cannot be used if model_id is set. Defaults to None.
  • **kwargs — Additional keyword arguments to pass to the Hugging Face InferenceClient.

Raises

ValueError

  • ValueError — If the model name is not provided.

A class to interact with Hugging Face’s Inference Providers for language model interaction.

This model allows you to communicate with Hugging Face’s models using Inference Providers. It can be used in serverless mode, with a dedicated endpoint, or even with a local URL, supporting features like stop sequences and grammar customization.

Providers include Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.

Example:

>>> engine = InferenceClientModel(
...     model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
...     provider="nebius",
...     token="your_hf_token_here",
...     max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."

create_client

( )

Create the Hugging Face client.

LiteLLMModel

The LiteLLMModel leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs upon model initialization that will then be used whenever the model is called, for instance below we pass temperature.

from smolagents import LiteLLMModel

messages = [
  {"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]

model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
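
LiteLLM can also route to locally served models; a sketch assuming an Ollama server running on its default port (the model name and num_ctx value are illustrative):

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",  # assumes this model has been pulled locally
    api_base="http://localhost:11434",
    num_ctx=8192,  # Ollama's default context window is often too small for agent runs
)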

class smolagents.LiteLLMModel

( model_id: str | None = None api_base: str | None = None api_key: str | None = None custom_role_conversions: dict[str, str] | None = None flatten_messages_as_text: bool | None = None **kwargs )

Parameters

  • model_id (str) — The model identifier to use on the server (e.g. “gpt-3.5-turbo”).
  • api_base (str, optional) — The base URL of the provider API to call the model.
  • api_key (str, optional) — The API key to use for authentication.
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like “system”.
  • flatten_messages_as_text (bool, optional) — Whether to flatten messages as text. Defaults to True for models that start with “ollama”, “groq”, “cerebras”.
  • **kwargs — Additional keyword arguments to pass to the OpenAI API.

Model to use LiteLLM Python SDK to access hundreds of LLMs.

create_client

( )

Create the LiteLLM client.

OpenAIServerModel

This class lets you call any OpenAI-compatible model. Here's how you can set it up (you can customise the api_base url to point to another server):

import os
from smolagents import OpenAIServerModel

model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

class smolagents.OpenAIServerModel

( model_id: str api_base: str | None = None api_key: str | None = None organization: str | None = None project: str | None = None client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None flatten_messages_as_text: bool = False **kwargs )

Parameters

  • model_id (str) — The model identifier to use on the server (e.g. “gpt-3.5-turbo”).
  • api_base (str, optional) — The base URL of the OpenAI-compatible API server.
  • api_key (str, optional) — The API key to use for authentication.
  • organization (str, optional) — The organization to use for the API request.
  • project (str, optional) — The project to use for the API request.
  • client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the OpenAI client (like organization, project, max_retries etc.).
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like “system”.
  • flatten_messages_as_text (bool, default False) — Whether to flatten messages as text.
  • **kwargs — Additional keyword arguments to pass to the OpenAI API.

This model connects to an OpenAI-compatible API server.
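
Because api_base can point at any OpenAI-compatible server, the same class works with self-hosted endpoints; a sketch with a placeholder URL and model name:

model = OpenAIServerModel(
    model_id="my-local-model",  # placeholder: whatever name your server exposes
    api_base="http://localhost:8000/v1",  # e.g. a self-hosted OpenAI-compatible server
    api_key="none",  # many local servers accept any non-empty key
)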

AzureOpenAIServerModel

AzureOpenAIServerModel allows you to connect to any Azure OpenAI deployment.

Below you can find an example of how to set it up. Note that you can omit the azure_endpoint, api_key, and api_version arguments, provided you've set the corresponding environment variables: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION.

Please note that OPENAI_API_VERSION lacks the AZURE_ prefix; this is due to the design of the underlying openai package.

import os

from smolagents import AzureOpenAIServerModel

model = AzureOpenAIServerModel(
    model_id = os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION")    
)
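
If the environment variables above are already set, a sketch of the shorter form (the deployment name is still required):

model = AzureOpenAIServerModel(
    model_id=os.environ["AZURE_OPENAI_MODEL"],  # endpoint, key and version are read from env vars
)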

class smolagents.AzureOpenAIServerModel

( model_id: str azure_endpoint: str | None = None api_key: str | None = None api_version: str | None = None client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None **kwargs )

Parameters

  • model_id (str) — The model deployment name to use when connecting (e.g. “gpt-4o-mini”).
  • azure_endpoint (str, optional) — The Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, it will be inferred from the AZURE_OPENAI_ENDPOINT environment variable.
  • api_key (str, optional) — The API key to use for authentication. If not provided, it will be inferred from the AZURE_OPENAI_API_KEY environment variable.
  • api_version (str, optional) — The API version to use. If not provided, it will be inferred from the OPENAI_API_VERSION environment variable.
  • client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the AzureOpenAI client (like organization, project, max_retries etc.).
  • custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles into others. Useful for specific models that do not support specific message roles like “system”.
  • **kwargs — Additional keyword arguments to pass to the Azure OpenAI API.

This model connects to an Azure OpenAI deployment.

MLXModel

from smolagents import MLXModel

model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")

print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a

You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if it's not the case.

class smolagents.MLXModel

( model_id: str trust_remote_code: bool = False load_kwargs: dict[str, typing.Any] | None = None apply_chat_template_kwargs: dict[str, typing.Any] | None = None **kwargs )

Parameters

  • model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
  • tool_name_key (str) — The key, which can usually be found in the model’s chat template, for retrieving a tool name.
  • tool_arguments_key (str) — The key, which can usually be found in the model’s chat template, for retrieving tool arguments.
  • trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
  • load_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the mlx.lm.load method when loading the model and tokenizer.
  • apply_chat_template_kwargs (dict, optional) — Additional keyword arguments to pass to the apply_chat_template method of the tokenizer.
  • kwargs (dict, optional) — Any additional keyword arguments that you want to use in model.generate(), for instance max_tokens.

A class to interact with models loaded using MLX on Apple silicon.

You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if it’s not the case.

Example:

>>> engine = MLXModel(
...     model_id="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
...     max_tokens=10000,
... )
>>> messages = [
...     {
...         "role": "user",
...         "content": "Explain quantum mechanics in simple terms."
...     }
... ]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."