HF Inference


HF Inference is the serverless Inference API powered by Hugging Face. Before the launch of Inference Providers, this service was called “Inference API (serverless)”. If you are interested in deploying models on dedicated, autoscaling infrastructure managed by Hugging Face, check out Inference Endpoints instead.
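The examples below hard-code the access token for brevity. In practice it is safer to read it from the environment; a minimal sketch, assuming the conventional `HF_TOKEN` variable name:

```python
import os

# Read the access token from the environment instead of hard-coding it.
# "HF_TOKEN" is the conventional variable name (an assumption here); the
# placeholder fallback keeps the sketch runnable without any setup.
hf_token = os.environ.get("HF_TOKEN", "hf_xxxxxxxxxxxxxxxxxxxxxxxx")

# The client would then be created once and reused for every task below:
#   client = InferenceClient(provider="hf-inference", api_key=hf_token)
```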

Supported tasks

Automatic Speech Recognition

Find out more on the Automatic Speech Recognition task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

output = client.automatic_speech_recognition("sample1.flac", model="openai/whisper-large-v3-turbo")

Chat Completion (LLM)

Find out more on the Chat Completion task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)

Chat Completion (VLM)

Find out more on the Chat Completion task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
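Besides HTTP URLs, OpenAI-compatible chat APIs typically also accept base64 data URLs in the `image_url` field, which is useful for local images. A small sketch of that encoding step (the helper name is made up for illustration):

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    # Base64-encode raw image bytes into a data URL for the "image_url" field.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Tiny fake payload standing in for real image bytes read from disk:
data_url = image_to_data_url(b"\xff\xd8\xff")
```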

Feature Extraction

Find out more on the Feature Extraction task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

result = client.feature_extraction(
    text="Today is a sunny day and I will get some ice cream.",
    model="intfloat/multilingual-e5-large-instruct",  # a sentence-embedding model
)
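The call returns an embedding (an array of floats) that can be compared across sentences, for example with cosine similarity. A minimal, dependency-free sketch of that comparison:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```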

Text Classification

Find out more on the Text Classification task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

result = client.text_classification(
    text="I like you. I love you",
    model="NousResearch/Minos-v1",
)
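The call returns a list of label/score pairs. A small sketch of picking the highest-scoring label (plain dicts stand in for the client's output objects, which expose `.label` and `.score` attributes instead):

```python
# Placeholder result standing in for the client's output.
result = [
    {"label": "negative", "score": 0.02},
    {"label": "positive", "score": 0.98},
]

top = max(result, key=lambda item: item["score"])
print(top["label"])
```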

Text Generation

Find out more on the Text Generation task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

result = client.text_generation(
    "Can you please let us know more details about your ",
    model="Qwen/Qwen3-235B-A22B",
    max_new_tokens=512,
)

print(result)

Text To Image

Find out more on the Text To Image task page.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# output is a PIL.Image object
image = client.text_to_image(
    "Astronaut riding a horse",
    model="black-forest-labs/FLUX.1-dev",
)
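The returned object is a standard Pillow image, so it can be saved or post-processed directly. A sketch using a locally created placeholder image so it runs without an API call:

```python
from PIL import Image

# Placeholder standing in for the image returned by client.text_to_image(...).
image = Image.new("RGB", (256, 256), color="black")

# Save to disk; Pillow infers the format from the file extension.
image.save("astronaut.png")
```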