smolagents documentation
Models
Smolagents is an experimental API which is subject to change at any time. Results returned by the agents can vary as the APIs or underlying models change.
To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.
Models
You're free to create and use your own models to power your agent.
You could use any model callable for your agent, as long as:
- It follows the message format (List[Dict[str, str]]) for its input messages, and returns a str.
- It stops generating output before the generated sequence reaches any of the sequences specified in the stop_sequences argument.
To define your LLM, you can make a custom_model method which accepts a list of messages and returns an object with a .content attribute containing the generated text. This callable also needs to accept a stop_sequences argument that indicates when to stop generating.
from huggingface_hub import login, InferenceClient
# Authenticate against the Hugging Face Hub
login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")
model_id = "meta-llama/Llama-3.3-70B-Instruct"
client = InferenceClient(model=model_id)
def custom_model(messages, stop_sequences=["Task"]):
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    # The returned message object exposes the generated text through its .content attribute
    answer = response.choices[0].message
    return answer
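Once defined, this callable can be passed directly as the model of an agent. A minimal sketch, assuming you use smolagents' CodeAgent (the task string below is purely illustrative):
from smolagents import CodeAgent
# The agent calls custom_model(messages, stop_sequences=...) under the hood
agent = CodeAgent(tools=[], model=custom_model)
agent.run("What is the result of 2 to the power of 3.7384?")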
Additionally, custom_model can also take a grammar argument. In the case where you specify a grammar upon agent initialization, this argument will be passed to the calls to the model, with the grammar that you defined upon initialization, to allow constrained generation and force properly-formatted agent outputs.
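As a sketch of what that could look like, the variant below accepts a grammar argument and forwards it to the client; forwarding it as response_format is an assumption about InferenceClient's constrained-generation interface and may need adapting to your backend:
def custom_model(messages, stop_sequences=["Task"], grammar=None):
    # Assumption: the backend exposes constrained generation through response_format
    response = client.chat_completion(
        messages,
        stop=stop_sequences,
        max_tokens=1000,
        response_format=grammar,
    )
    return response.choices[0].message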
TransformersModel
For convenience, we have added a TransformersModel that implements the points above by building a local transformers pipeline for the model_id given at initialization.
from smolagents import TransformersModel
model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": [{"type": "text", "text": "Ok!"}]}], stop_sequences=["great"]))
>>> What a
You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if it's not the case.
class smolagents.TransformersModel
( model_id: str | None = None device_map: str | None = None torch_dtype: str | None = None trust_remote_code: bool = False model_kwargs: dict[str, typing.Any] | None = None **kwargs )
Parameters
- model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub. For example, "Qwen/Qwen2.5-Coder-32B-Instruct".
- device_map (str, optional) — The device_map to initialize your model with.
- torch_dtype (str, optional) — The torch_dtype to initialize your model with.
- trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
- model_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to AutoModel.from_pretrained (like revision, model_args, config, etc.).
- **kwargs — Additional keyword arguments to pass to model.generate(), for instance max_new_tokens or device.
Raises
ValueError
ValueError
— If the model name is not provided.
A class that uses Hugging Face’s Transformers library for language model interaction.
This model allows you to load and use Hugging Face’s models locally using the Transformers library. It supports features like stop sequences and grammar customization.
You must have transformers and torch installed on your machine. Please run pip install smolagents[transformers] if it's not the case.
Example:
>>> engine = TransformersModel(
... model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
... device="cuda",
... max_new_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
InferenceClientModel
InferenceClientModel wraps huggingface_hub's InferenceClient for the execution of the LLM. It supports HF's Inference API as well as all Inference Providers available on the Hub.
from smolagents import InferenceClientModel
messages = [
{"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = InferenceClientModel()
print(model(messages))
>>> Of course! If you change your mind, feel free to reach out. Take care!
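You can also select a specific model and inference provider explicitly; a short sketch, where the model ID and provider name are only examples:
from smolagents import InferenceClientModel
model = InferenceClientModel(
    model_id="meta-llama/Llama-3.3-70B-Instruct",
    provider="together",  # any provider listed in the Inference Providers documentation
)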
class smolagents.InferenceClientModel
( model_id: str = 'Qwen/Qwen2.5-Coder-32B-Instruct' provider: str | None = None token: str | None = None timeout: int = 120 client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None api_key: str | None = None bill_to: str | None = None base_url: str | None = None **kwargs )
Parameters
- model_id (str, optional, default "Qwen/Qwen2.5-Coder-32B-Instruct") — The Hugging Face model ID to be used for inference. This can be a model identifier from the Hugging Face model hub or a URL to a deployed Inference Endpoint. Currently, it defaults to "Qwen/Qwen2.5-Coder-32B-Instruct", but this may change in the future.
- provider (str, optional) — Name of the provider to use for inference. A list of supported providers can be found in the Inference Providers documentation. Defaults to "auto", i.e. the first of the providers available for the model, sorted by the user's order here. If base_url is passed, then provider is not used.
- token (str, optional) — Token used by the Hugging Face API for authentication. This token needs to be authorized to 'Make calls to the serverless Inference Providers'. If the model is gated (like Llama-3 models), the token also needs 'Read access to contents of all public gated repos you can access'. If not provided, the class will try to use the environment variable 'HF_TOKEN', else use the token stored in the Hugging Face CLI configuration.
- timeout (int, optional, defaults to 120) — Timeout for the API request, in seconds.
- client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the Hugging Face InferenceClient.
- custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles in others. Useful for specific models that do not support specific message roles like "system".
- api_key (str, optional) — Token to use for authentication. This is a duplicated argument from token to make InferenceClientModel follow the same pattern as the openai.OpenAI client. Cannot be used if token is set. Defaults to None.
- bill_to (str, optional) — The billing account to use for the requests. By default the requests are billed on the user's account. Requests can only be billed to an organization the user is a member of, and which has subscribed to Enterprise Hub.
- base_url (str, optional) — Base URL to run inference. This is a duplicated argument from model to make InferenceClientModel follow the same pattern as the openai.OpenAI client. Cannot be used if model is set. Defaults to None.
- **kwargs — Additional keyword arguments to pass to the Hugging Face InferenceClient.
Raises
ValueError — If the model name is not provided.
A class to interact with Hugging Face’s Inference Providers for language model interaction.
This model allows you to communicate with Hugging Face's models using Inference Providers. It can be used in serverless mode, with a dedicated endpoint, or even with a local URL, supporting features like stop sequences and grammar customization.
Providers include Cerebras, Cohere, Fal, Fireworks, HF-Inference, Hyperbolic, Nebius, Novita, Replicate, SambaNova, Together, and more.
Example:
>>> engine = InferenceClientModel(
... model_id="Qwen/Qwen2.5-Coder-32B-Instruct",
... provider="nebius",
... token="your_hf_token_here",
... max_tokens=5000,
... )
>>> messages = [{"role": "user", "content": "Explain quantum mechanics in simple terms."}]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."
Create the Hugging Face client.
LiteLLMModel
LiteLLMModel leverages LiteLLM to support 100+ LLMs from various providers. You can pass kwargs upon model initialization that will then be used whenever using the model; for example, below we pass temperature.
from smolagents import LiteLLMModel
messages = [
{"role": "user", "content": [{"type": "text", "text": "Hello, how are you?"}]}
]
model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", temperature=0.2, max_tokens=10)
print(model(messages))
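Since LiteLLM routes requests based on the model prefix, the same class can also target a locally served model. A sketch assuming an Ollama server running on its default port with a llama3.2 model already pulled:
from smolagents import LiteLLMModel
model = LiteLLMModel(
    model_id="ollama_chat/llama3.2",   # LiteLLM's prefix for Ollama chat models
    api_base="http://localhost:11434", # default Ollama endpoint
    api_key="not-needed",              # placeholder: Ollama does not check the key
)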
class smolagents.LiteLLMModel
( model_id: str | None = None api_base: str | None = None api_key: str | None = None custom_role_conversions: dict[str, str] | None = None flatten_messages_as_text: bool | None = None **kwargs )
Parameters
- model_id (str) — The model identifier to use on the server (e.g. "gpt-3.5-turbo").
- api_base (str, optional) — The base URL of the provider API to call the model.
- api_key (str, optional) — The API key to use for authentication.
- custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles in others. Useful for specific models that do not support specific message roles like "system".
- flatten_messages_as_text (bool, optional) — Whether to flatten messages as text. Defaults to True for models that start with "ollama", "groq", "cerebras".
- **kwargs — Additional keyword arguments to pass to the OpenAI API.
Model to use LiteLLM Python SDK to access hundreds of LLMs.
Create the LiteLLM client.
OpenAIServerModel
This class lets you call any OpenAI-compatible server model.
Here's how you can set it up (you can customize the api_base URL to point to another server):
import os
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)
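To point at a self-hosted OpenAI-compatible server instead of OpenAI itself, only api_base changes. A sketch where the URL and model name are placeholders for your own deployment:
from smolagents import OpenAIServerModel
model = OpenAIServerModel(
    model_id="my-local-model",            # whatever name your server exposes
    api_base="http://localhost:8000/v1",  # e.g. a local OpenAI-compatible endpoint
    api_key="EMPTY",                      # many local servers accept any placeholder key
)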
class smolagents.OpenAIServerModel
( model_id: str api_base: str | None = None api_key: str | None = None organization: str | None = None project: str | None = None client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None flatten_messages_as_text: bool = False **kwargs )
Parameters
- model_id (str) — The model identifier to use on the server (e.g. "gpt-3.5-turbo").
- api_base (str, optional) — The base URL of the OpenAI-compatible API server.
- api_key (str, optional) — The API key to use for authentication.
- organization (str, optional) — The organization to use for the API request.
- project (str, optional) — The project to use for the API request.
- client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the OpenAI client (like organization, project, max_retries, etc.).
- custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles in others. Useful for specific models that do not support specific message roles like "system".
- flatten_messages_as_text (bool, default False) — Whether to flatten messages as text.
- **kwargs — Additional keyword arguments to pass to the OpenAI API.
This model connects to an OpenAI-compatible API server.
AzureOpenAIServerModel
AzureOpenAIServerModel allows you to connect to any Azure OpenAI deployment.
Below you can find an example of how to set it up. Note that you can omit the azure_endpoint, api_key, and api_version arguments, provided you've set the corresponding environment variables: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and OPENAI_API_VERSION.
Pay attention to the lack of an AZURE_ prefix for OPENAI_API_VERSION; this is due to the way the underlying openai package is designed.
import os
from smolagents import AzureOpenAIServerModel
model = AzureOpenAIServerModel(
    model_id=os.environ.get("AZURE_OPENAI_MODEL"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("OPENAI_API_VERSION"),
)
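If those environment variables are already set, the same model can be instantiated more compactly, since azure_endpoint, api_key, and api_version are inferred from them. A sketch, where the model_id is a placeholder for your own deployment name:
from smolagents import AzureOpenAIServerModel
# azure_endpoint, api_key and api_version are picked up from
# AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION
model = AzureOpenAIServerModel(model_id="gpt-4o-mini")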
class smolagents.AzureOpenAIServerModel
( model_id: str azure_endpoint: str | None = None api_key: str | None = None api_version: str | None = None client_kwargs: dict[str, typing.Any] | None = None custom_role_conversions: dict[str, str] | None = None **kwargs )
Parameters
- model_id (str) — The model deployment name to use when connecting (e.g. "gpt-4o-mini").
- azure_endpoint (str, optional) — The Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/. If not provided, it will be inferred from the AZURE_OPENAI_ENDPOINT environment variable.
- api_key (str, optional) — The API key to use for authentication. If not provided, it will be inferred from the AZURE_OPENAI_API_KEY environment variable.
- api_version (str, optional) — The API version to use. If not provided, it will be inferred from the OPENAI_API_VERSION environment variable.
- client_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the AzureOpenAI client (like organization, project, max_retries, etc.).
- custom_role_conversions (dict[str, str], optional) — Custom role conversion mapping to convert message roles in others. Useful for specific models that do not support specific message roles like "system".
- **kwargs — Additional keyword arguments to pass to the Azure OpenAI API.
This model connects to an Azure OpenAI deployment.
MLXModel
from smolagents import MLXModel
model = MLXModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct")
print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"]))
>>> What a
You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if it's not the case.
class smolagents.MLXModel
( model_id: str trust_remote_code: bool = False load_kwargs: dict[str, typing.Any] | None = None apply_chat_template_kwargs: dict[str, typing.Any] | None = None **kwargs )
Parameters
- model_id (str) — The Hugging Face model ID to be used for inference. This can be a path or model identifier from the Hugging Face model hub.
- tool_name_key (str) — The key, which can usually be found in the model’s chat template, for retrieving a tool name.
- tool_arguments_key (str) — The key, which can usually be found in the model’s chat template, for retrieving tool arguments.
- trust_remote_code (bool, default False) — Some models on the Hub require running remote code: for this model, you would have to set this flag to True.
- load_kwargs (dict[str, Any], optional) — Additional keyword arguments to pass to the mlx.lm.load method when loading the model and tokenizer.
- apply_chat_template_kwargs (dict, optional) — Additional keyword arguments to pass to the apply_chat_template method of the tokenizer.
- kwargs (dict, optional) — Any additional keyword arguments that you want to use in model.generate(), for instance max_tokens.
A class to interact with models loaded using MLX on Apple silicon.
You must have mlx-lm installed on your machine. Please run pip install smolagents[mlx-lm] if it's not the case.
Example:
>>> engine = MLXModel(
... model_id="mlx-community/Qwen2.5-Coder-32B-Instruct-4bit",
... max_tokens=10000,
... )
>>> messages = [
... {
... "role": "user",
... "content": "Explain quantum mechanics in simple terms."
... }
... ]
>>> response = engine(messages, stop_sequences=["END"])
>>> print(response)
"Quantum mechanics is the branch of physics that studies..."