⚡️ Unified Inference Across Multiple Inference Providers

The InferenceClient now supports third-party providers, offering a unified interface to run inference across multiple services while leveraging models from the Hugging Face Hub. This update enables developers to:

🌐 Switch providers seamlessly - Transition between inference providers with a single interface.
🔗 Unified model IDs - Always reference Hugging Face Hub model IDs, even when using external providers.
🔑 Simplified billing and access management - Use your Hugging Face token to route requests to third-party providers (billed through your HF account).

A list of supported third-party providers can be found here.

Example of text-to-image inference with Replicate:

>>> from huggingface_hub import InferenceClient

>>> replicate_client = InferenceClient(
...     provider="replicate",
...     api_key="my_replicate_api_key",  # Using your personal Replicate key
... )
>>> image = replicate_client.text_to_image(
...     "A cyberpunk cat hacking neural networks",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("cybercat.png")

Another example of chat completion with Together AI:

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="deepseek-ai/DeepSeek-R1",
...     messages=[{"role": "user", "content": "How many r's are there in strawberry?"}],
... )

When using external providers, you can choose between two access modes: either use the provider's native API key, as shown in the examples above, or route calls through Hugging Face infrastructure (billed to your HF account):

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",
...     token="hf_****",  # Your Hugging Face token
... )
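
The two access modes differ only in which credential is passed to the client. The sketch below makes that choice explicit. It is a minimal illustration, not part of the library: `client_kwargs` and `make_client` are hypothetical helper names, and the only library call assumed is the `InferenceClient(provider=..., api_key=...)` / `InferenceClient(provider=..., token=...)` constructor shown in the examples above.

```python
from typing import Dict, Optional


def client_kwargs(
    provider: str,
    provider_key: Optional[str] = None,
    hf_token: Optional[str] = None,
) -> Dict[str, str]:
    """Build keyword arguments for InferenceClient (hypothetical helper).

    A provider key selects direct access to the provider. A Hugging Face
    token selects routing through Hugging Face (billed to the HF account).
    """
    if provider_key and hf_token:
        raise ValueError("Pass either a provider key or a Hugging Face token, not both.")
    if provider_key:
        return {"provider": provider, "api_key": provider_key}
    if hf_token:
        return {"provider": provider, "token": hf_token}
    # No credential given: leave credential discovery to the library.
    return {"provider": provider}


def make_client(provider: str, provider_key: Optional[str] = None, hf_token: Optional[str] = None):
    """Create an InferenceClient for the chosen access mode."""
    # Imported lazily so client_kwargs() can be used without the library installed.
    from huggingface_hub import InferenceClient

    return InferenceClient(**client_kwargs(provider, provider_key, hf_token))
```

Because the model argument is always a Hugging Face Hub model ID, switching provider or access mode in this sketch only changes the arguments passed to `make_client`; the subsequent `text_to_image` or `chat_completion` call stays the same.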

⚠️ Parameter availability may vary between providers. Check each provider's documentation.
🔜 New providers, models, and tasks will be added iteratively.
👉 You can find a list of supported tasks per provider and more details here.

[InferenceClient] Add third-party providers support by @celinah in #2757

Unified prepare_request method + class-based providers by @Wauplin in #2777

[InferenceClient] Support proxy calls for 3rd party providers by @celinah in #2781

[InferenceClient] Add text-to-video task and update supported tasks and models by @celinah in #2786

Add type hints for providers by @Wauplin in #2788

[InferenceClient] Update inference documentation by @celinah in #2776

Add text-to-video to supported tasks by @Wauplin in #2790

✨ HfApi

The following change aligns the client with server-side updates by adding two new repository properties: usedStorage and resourceGroup.

[HfApi] update list of repository properties following server side updates by @celinah in #2728

Empty-commit prevention now also covers file copy operations, keeping version histories clean when nothing has changed.

[HfApi] prevent empty commits when copying files by @celinah in #2730

🌐 📚 Documentation

Thanks to @WizKnight, the Hindi translation is much better!

Improved Hindi Translation in Documentation📝 by @WizKnight in #2697

💔 Breaking changes

The like endpoint has been removed to prevent misuse. You can still remove existing likes using the unlike endpoint.
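
For code that needs to clean up existing likes, a minimal sketch could look like the following. `remove_likes` is a hypothetical helper, not a library function; it only assumes that the object passed in exposes `unlike(repo_id, repo_type=...)`, as `HfApi` does.

```python
from typing import Iterable, List


def remove_likes(api, repo_ids: Iterable[str], repo_type: str = "model") -> List[str]:
    """Unlike each repo and return the IDs that were processed (hypothetical helper).

    `api` is expected to be a huggingface_hub.HfApi instance, or any object
    with a compatible `unlike(repo_id, repo_type=...)` method.
    """
    processed = []
    for repo_id in repo_ids:
        api.unlike(repo_id, repo_type=repo_type)
        processed.append(repo_id)
    return processed
```

In practice this would be called as `remove_likes(HfApi(), ["username/some-model"])`, where the repo ID is a placeholder and a valid token is assumed to be configured locally.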

[HfApi] remove like endpoint by @celinah in #2739

🛠️ Small fixes and maintenance

😌 QoL improvements

[InferenceClient] flag chat_completion()'s logit_bias as UNUSED by @celinah in #2724
Remove unused parameters from method's docstring by @celinah in #2738
Add optional rejection_reason when rejecting a user access token by @Wauplin in #2758
Add py.typed to be compliant with PEP-561 again by @celinah in #2752

🐛 Bug and typo fixes

Fix super_squash_history revision not urlencoded by @Wauplin in #2795
Replace model repo with repo in docstrings by @albertvillanova in #2715
[BUG] Fix 404 NOT FOUND issue caused by endpoint tail slash by @Mingqi2 in #2721
Fix typing.get_type_hints call on a ModelHubMixin by @aliberts in #2729
fix typo by @qwertyforce in #2762
rejection reason docstring by @Wauplin in #2764
Add timeout to WeakFileLock by @Wauplin in #2751
Fix CardData.get() to respect default values when None by @celinah in #2770
Fix RepoCard.load when passing a repo_id that is also a dir path by @Wauplin in #2771
Fix filename too long when downloading to local folder by @Wauplin in #2789

🏗️ internal

Migrate to new Ruff "2025 style guide" formatter by @celinah in #2749
remove org tokens tests by @celinah in #2759
Fix RepoCard test on Windows by @celinah in #2774
[Bot] Update inference types by @HuggingFaceInfra in #2712

[v0.28.0]: Third-party Inference Providers on the Hub & multiple quality of life improvements and bug fixes