[v0.28.0]: Third-party Inference Providers on the Hub & multiple quality of life improvements and bug fixes

#11
by celinah HF staff - opened

⚡️ Unified Inference Across Multiple Inference Providers


The InferenceClient now supports third-party providers, offering a unified interface to run inference across multiple services while leveraging models from the Hugging Face Hub. This update enables developers to:

  • 🌐 Switch providers seamlessly - Transition between inference providers with a single interface.
  • 🔗 Unified model IDs - Always reference Hugging Face Hub model IDs, even when using external providers.
  • 🔑 Simplified billing and access management - You can use your Hugging Face token for routing to third-party providers (billed through your HF account).

A list of supported third-party providers can be found here.

Example of text-to-image inference with Replicate:

>>> from huggingface_hub import InferenceClient

>>> replicate_client = InferenceClient(
...     provider="replicate",
...     api_key="my_replicate_api_key",  # Using your personal Replicate key
... )
>>> image = replicate_client.text_to_image(
...     "A cyberpunk cat hacking neural networks",
...     model="black-forest-labs/FLUX.1-schnell",
... )
>>> image.save("cybercat.png")

Another example of chat completion with Together AI:

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="together",  # Use Together AI provider
...     api_key="<together_api_key>",  # Pass your Together API key directly
... )
>>> client.chat_completion(
...     model="deepseek-ai/DeepSeek-R1",
...     messages=[{"role": "user", "content": "How many r's are there in strawberry?"}],
... )
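
Because the model ID stays the same across providers, the two examples above generalize to a small loop. The following is a minimal sketch, not part of the library: `run_across_providers` and `make_hf_client` are hypothetical helper names, and it assumes every provider you list actually serves the chosen model for chat completion.

```python
from typing import Callable, Dict, Iterable


def run_across_providers(
    providers: Iterable[str],
    model: str,
    prompt: str,
    make_client: Callable[[str], object],
) -> Dict[str, object]:
    """Send the same chat request to each provider, keyed by provider name.

    Hypothetical helper: `make_client` is any factory that returns an object
    with a `chat_completion(model=..., messages=...)` method.
    """
    messages = [{"role": "user", "content": prompt}]
    results: Dict[str, object] = {}
    for provider in providers:
        client = make_client(provider)
        try:
            results[provider] = client.chat_completion(model=model, messages=messages)
        except Exception as exc:
            # Record the failure and keep going with the remaining providers.
            results[provider] = exc
    return results


def make_hf_client(provider: str):
    """Build an InferenceClient for `provider` (the token is a placeholder)."""
    # Imported lazily so the helper above works with any compatible client.
    from huggingface_hub import InferenceClient

    return InferenceClient(provider=provider, token="hf_****")
```

A call such as `run_across_providers(["together"], "deepseek-ai/DeepSeek-R1", "Hello", make_hf_client)` would return one entry per provider, holding either the response or the exception that provider raised.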

When using external providers, you can choose between two access modes: either use the provider's native API key, as shown in the examples above, or route calls through Hugging Face infrastructure (billed to your HF account):

>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient(
...     provider="fal-ai",
...     token="hf_****",  # Your Hugging Face token
... )
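
If an application needs to support both access modes, the choice can be isolated in one place. This is a sketch under assumptions: `client_kwargs` is a hypothetical helper, and it relies only on the `provider`, `api_key`, and `token` arguments shown in the examples above.

```python
from typing import Dict, Optional


def client_kwargs(
    provider: str,
    *,
    hf_token: Optional[str] = None,
    provider_key: Optional[str] = None,
) -> Dict[str, str]:
    """Return keyword arguments for InferenceClient for one access mode.

    Hypothetical helper: exactly one credential must be given.
    - provider_key: call the provider directly with its native API key.
    - hf_token: route through Hugging Face (billed to the HF account).
    """
    if (hf_token is None) == (provider_key is None):
        raise ValueError("Pass exactly one of hf_token or provider_key")
    if provider_key is not None:
        return {"provider": provider, "api_key": provider_key}
    return {"provider": provider, "token": hf_token}
```

The result can then be unpacked into the constructor, for example `InferenceClient(**client_kwargs("fal-ai", hf_token="hf_****"))`.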

⚠️ Parameter availability may vary between providers - check provider documentation.
🔜 New providers/models/tasks will be added iteratively in the future.
👉 You can find a list of supported tasks per provider and more details here.

  • [InferenceClient] Add third-party providers support by @celinah in #2757
  • Unified prepare_request method + class-based providers by @Wauplin in #2777
  • [InferenceClient] Support proxy calls for 3rd party providers by @celinah in #2781
  • [InferenceClient] Add text-to-video task and update supported tasks and models by @celinah in #2786
  • Add type hints for providers by @Wauplin in #2788
  • [InferenceClient] Update inference documentation by @celinah in #2776
  • Add text-to-video to supported tasks by @Wauplin in #2790

✨ HfApi

The following change aligns the client with server-side updates by adding two new repository properties: usedStorage and resourceGroup.

  • [HfApi] update list of repository properties following server side updates by @celinah in #2728

Empty-commit prevention now also covers file copy operations, keeping version histories clean when a copy changes nothing.

  • [HfApi] prevent empty commits when copying files by @celinah in #2730

🌍 📚 Documentation

Thanks to @WizKnight, the Hindi translation is much better!

  • Improved Hindi Translation in Documentation 📝 by @WizKnight in #2697

💔 Breaking changes

The like endpoint has been removed to prevent misuse. You can still remove existing likes using the unlike endpoint.

  • [HfApi] remove like endpoint by @celinah in #2739
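
As an illustration of removing existing likes, the sketch below clears every model like for the authenticated user. It is an assumption-laden example, not an official recipe: `remove_model_likes` is a hypothetical helper, and it expects an `HfApi`-like object whose `list_liked_repos()` result exposes a `models` list and whose `unlike()` accepts `repo_type`.

```python
from typing import List


def remove_model_likes(api) -> List[str]:
    """Unlike every model the authenticated user has liked.

    Hypothetical helper: `api` is expected to behave like
    `huggingface_hub.HfApi` (with `list_liked_repos` and `unlike`).
    """
    likes = api.list_liked_repos()  # assumed to default to the logged-in user
    removed: List[str] = []
    for repo_id in likes.models:
        api.unlike(repo_id, repo_type="model")
        removed.append(repo_id)
    return removed
```

With a logged-in session, this could be called as `remove_model_likes(HfApi())` after `from huggingface_hub import HfApi`.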

🛠️ Small fixes and maintenance

😌 QoL improvements

  • [InferenceClient] flag chat_completion()'s logit_bias as UNUSED by @celinah in #2724
  • Remove unused parameters from method's docstring by @celinah in #2738
  • Add optional rejection_reason when rejecting a user access token by @Wauplin in #2758
  • Add py.typed to be compliant with PEP-561 again by @celinah in #2752

🐛 Bug and typo fixes

  • Fix super_squash_history revision not urlencoded by @Wauplin in #2795
  • Replace model repo with repo in docstrings by @albertvillanova in #2715
  • [BUG] Fix 404 NOT FOUND issue caused by endpoint tail slash by @Mingqi2 in #2721
  • Fix typing.get_type_hints call on a ModelHubMixin by @aliberts in #2729
  • fix typo by @qwertyforce in #2762
  • rejection reason docstring by @Wauplin in #2764
  • Add timeout to WeakFileLock by @Wauplin in #2751
  • Fix CardData.get() to respect default values when None by @celinah in #2770
  • Fix RepoCard.load when passing a repo_id that is also a dir path by @Wauplin in #2771
  • Fix filename too long when downloading to local folder by @Wauplin in #2789

🏗️ internal

  • Migrate to new Ruff "2025 style guide" formatter by @celinah in #2749
  • remove org tokens tests by @celinah in #2759
  • Fix RepoCard test on Windows by @celinah in #2774
  • [Bot] Update inference types by @HuggingFaceInfra in #2712
