Inference Providers documentation

Hyperbolic: The On-Demand AI Cloud


Join 165,000+ developers building with on-demand GPUs and running inference on the latest models — at 75% less than legacy clouds.

Hyperbolic is the infrastructure powering the world’s leading AI projects. Trusted by Hugging Face, Vercel, Google, Quora, Chatbot Arena, Open Router, Black Forest Labs, Reve.art, Stanford, UC Berkeley and more.


Products and Services

GPU Marketplace

Hyperbolic provides a global network of compute to unlock on-demand GPU rentals at the lowest prices. Start in seconds, and keep running.

Bulk Rentals

Reserve dedicated GPUs with guaranteed uptime and discounted prepaid pricing — perfect for 24/7 inference, LLM tooling, training, and scaling production workloads without peak-time shortages.

Serverless Inference

Run the latest models through an API that is fully compatible with the OpenAI SDK and many other ecosystems.
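Because the serverless endpoint follows the OpenAI chat-completions request format, it can also be called with plain HTTP. The sketch below uses only the standard library; the base URL, the `HYPERBOLIC_API_KEY` environment variable, and the helper names are assumptions for illustration, so check Hyperbolic's API docs for the current endpoint and authentication details.

```python
import json
import os
import urllib.request

# Assumed endpoint; verify against Hyperbolic's API documentation.
HYPERBOLIC_BASE_URL = "https://api.hyperbolic.xyz/v1"


def build_chat_payload(model: str, prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, model: str = "deepseek-ai/DeepSeek-V3-0324") -> str:
    """POST a chat completion and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{HYPERBOLIC_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_payload(model, prompt)).encode("utf-8"),
        headers={
            # Assumes the API key is exported as HYPERBOLIC_API_KEY.
            "Authorization": f"Bearer {os.environ['HYPERBOLIC_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request body works unchanged with the OpenAI Python client by pointing its `base_url` at the compatible endpoint, which is what "API-compatible" means in practice here.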

Dedicated Hosting

Run LLMs, VLMs, or diffusion models on single-tenant GPUs with private endpoints. Bring your own weights or use open models. Full control, hourly pricing. Ideal for 24/7 inference or 100K+ tokens/min workloads.


Pricing

  • Rent GPUs starting at $0.16/gpu/hr
  • Access inference at prices 3–10x lower than competitors

For the latest pricing, visit our pricing page.


Resources

Supported tasks

Chat Completion (LLM)

Find out more about Chat Completion (LLM) here.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)

Chat Completion (VLM)

Find out more about Chat Completion (VLM) here.

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
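The nested message structure in the VLM example above is easy to mistype, so it can help to build it with a small helper. The function below is purely illustrative (it is not part of the huggingface_hub or Hyperbolic APIs); it just composes the OpenAI-style multimodal message shown above.

```python
def vision_message(text: str, image_url: str) -> dict:
    """Build a user message pairing a text prompt with an image URL,
    in the OpenAI-style multimodal content format used above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

With this helper, the `messages` argument in the VLM example becomes `messages=[vision_message("Describe this image in one sentence.", url)]`.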