Inference Providers documentation
HF Inference
HF Inference is the serverless Inference API powered by Hugging Face. Before Inference Providers was introduced, this service was called “Inference API (serverless)”. To deploy models on dedicated, autoscaling infrastructure managed by Hugging Face, see Inference Endpoints instead.
Supported tasks
Automatic Speech Recognition
Find out more about Automatic Speech Recognition here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

output = client.automatic_speech_recognition("sample1.flac", model="openai/whisper-large-v3-turbo")
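Every snippet on this page passes the access token as a literal placeholder. A minimal sketch of reading it from the environment instead, assuming the token has been exported as `HF_TOKEN`; the helper name `read_hf_token` is hypothetical and not part of `huggingface_hub`:

```python
import os
from typing import Mapping


def read_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the access token stored in HF_TOKEN, or fail with a clear message.

    Assumption: the token lives in an environment variable named HF_TOKEN.
    """
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before creating the client.")
    return token
```

The returned value can then be passed as `api_key=read_hf_token()` when constructing `InferenceClient`, which keeps the token out of source control.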
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
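The request above sends a single user turn. To continue a conversation, the earlier turns are usually sent again along with the new question. A minimal sketch of building that message list, assuming the reply text is read from `completion.choices[0].message.content`; the helper `add_turn` and the placeholder reply are hypothetical and only illustrate the shape of the data:

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], user_text: str, assistant_text: str) -> List[Message]:
    """Return a new message list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history: List[Message] = []
# Placeholder reply; in practice this would be completion.choices[0].message.content
history = add_turn(history, "What is the capital of France?", "<assistant reply>")

# The follow-up request carries the earlier turns plus the new question
next_messages = history + [{"role": "user", "content": "What is its population?"}]
```

`next_messages` can be passed as the `messages` argument of the next `client.chat.completions.create` call.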
Chat Completion (VLM)
Find out more about Chat Completion (VLM) here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
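The example above references an image by public URL. For a local image, one common approach with OpenAI-style chat APIs is to embed the bytes as a base64 data URL. The sketch below only builds the message part; whether a given model on this provider accepts data URLs is an assumption to verify, and the helper name `image_part_from_bytes` is hypothetical:

```python
import base64
from typing import Any, Dict


def image_part_from_bytes(data: bytes, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Wrap raw image bytes as an "image_url" content part using a data URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }
```

The returned dictionary has the same shape as the `image_url` entry in the request above, so it can take its place in the `content` list.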
Feature Extraction
Find out more about Feature Extraction here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# The input text is passed as the first positional argument
result = client.feature_extraction(
    "Today is a sunny day and I will get some ice cream.",
    model="kyutai/mimi",
)
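Embeddings returned by feature extraction are often compared with cosine similarity. A minimal sketch, assuming each input yields one flat vector of floats (some models return one vector per token, which would need pooling first); the helper `cosine_similarity` is written here for illustration and is not part of `huggingface_hub`:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Values near 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions.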
Text Classification
Find out more about Text Classification here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# The input text is passed as the first positional argument
result = client.text_classification(
    "I like you. I love you",
    model="NousResearch/Minos-v1",
)
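The classification result is a list of elements that each carry a `label` and a `score`. A minimal sketch of picking the highest-scoring label; the helper `top_label` is hypothetical, and the stand-in objects use made-up labels and scores only to exercise it:

```python
from types import SimpleNamespace
from typing import Any, Sequence


def top_label(results: Sequence[Any]) -> str:
    """Return the label of the element with the highest score."""
    if not results:
        raise ValueError("no classification results to choose from")
    best = max(results, key=lambda item: item.score)
    return best.label


# Stand-in objects with made-up values, shaped like the elements described above
sample = [
    SimpleNamespace(label="label_a", score=0.2),
    SimpleNamespace(label="label_b", score=0.8),
]
best_label = top_label(sample)
```

With a real response, `top_label(result)` would be called on the list returned by `client.text_classification`.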
Text Generation
Find out more about Text Generation here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# Text generation continues a raw prompt: the prompt is a plain string,
# not a list of chat messages, so text_generation is used instead of chat.completions
output = client.text_generation(
    "Can you please let us know more details about your ",
    model="Qwen/Qwen3-235B-A22B",
    max_new_tokens=512,
)

print(output)
Text To Image
Find out more about Text To Image here.
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-dev",
)