Inference Providers documentation

Hyperbolic: The On-Demand AI Cloud


Join 165,000+ developers building with on-demand GPUs and running inference on the latest models — at 75% less than legacy clouds.

Hyperbolic is the infrastructure powering the world’s leading AI projects. Trusted by Hugging Face, Vercel, Google, Quora, Chatbot Arena, Open Router, Black Forest Labs, Reve.art, Stanford, UC Berkeley and more.


Products and Services

GPU Marketplace

Hyperbolic provides a global network of compute to unlock on-demand GPU rentals at the lowest prices. Start in seconds, and keep running.

Bulk Rentals

Reserve dedicated GPUs with guaranteed uptime and discounted prepaid pricing — perfect for 24/7 inference, LLM tooling, training, and scaling production workloads without peak-time shortages.

Serverless Inference

Run the latest models through an API that is fully compatible with the OpenAI SDK and many other ecosystems.
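Because the serverless endpoint follows the OpenAI chat-completions request format, it can also be called with plain HTTP. The sketch below uses only the standard library; the base URL, the `HYPERBOLIC_API_KEY` environment variable, and the helper names are assumptions for illustration, so check Hyperbolic's API docs for the current endpoint and authentication details.

```python
import json
import os
import urllib.request

# Assumed endpoint; verify against Hyperbolic's API documentation.
HYPERBOLIC_BASE_URL = "https://api.hyperbolic.xyz/v1"


def build_chat_payload(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, model: str = "deepseek-ai/DeepSeek-V3-0324") -> str:
    """POST a chat completion and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{HYPERBOLIC_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={
            # Assumes the API key is exported as HYPERBOLIC_API_KEY.
            "Authorization": f"Bearer {os.environ['HYPERBOLIC_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request body works unchanged with the OpenAI Python client by pointing its `base_url` at the compatible endpoint, which is what "API-compatible" means in practice here.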

Dedicated Hosting

Run LLMs, VLMs, or diffusion models on single-tenant GPUs with private endpoints. Bring your own weights or use open models. Full control, hourly pricing. Ideal for 24/7 inference or 100K+ tokens/min workloads.


Pricing

  • Rent GPUs starting at $0.16/gpu/hr
  • Access inference at prices 3–10x lower than competitors

For the latest pricing, visit our pricing page.


Resources

Supported tasks

Chat Completion (LLM)

Find out more about Chat Completion (LLM) here.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)

Chat Completion (VLM)

Find out more about Chat Completion (VLM) here.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
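The nested message structure in the VLM example above is easy to mistype, so it can help to build it with a small helper. The function below is purely illustrative (it is not part of the huggingface_hub or Hyperbolic APIs); it just composes the OpenAI-style multimodal message shown above.

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing a text prompt with an image URL,
    in the OpenAI-style multimodal content format used above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

With this helper, the `messages` argument in the VLM example becomes `messages=[vision_message("Describe this image in one sentence.", url)]`.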