Inference Providers documentation
Hyperbolic: The On-Demand AI Cloud
Join 165,000+ developers building with on-demand GPUs and running inference on the latest models — at 75% less than legacy clouds.
Hyperbolic is the infrastructure powering the world’s leading AI projects. Trusted by Hugging Face, Vercel, Google, Quora, Chatbot Arena, Open Router, Black Forest Labs, Reve.art, Stanford, UC Berkeley and more.
Products and Services
GPU Marketplace
Hyperbolic provides a global network of compute to unlock on-demand GPU rentals at the lowest prices. Start in seconds, and keep running.
Bulk Rentals
Reserve dedicated GPUs with guaranteed uptime and discounted prepaid pricing — perfect for 24/7 inference, LLM tooling, training, and scaling production workloads without peak-time shortages.
Serverless Inference
Run the latest models while staying fully API-compatible with OpenAI and many other ecosystems.
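Because the serverless endpoint follows the OpenAI chat-completions request shape, any OpenAI-compatible tooling can talk to it by swapping the base URL. The sketch below builds such a request body using only the standard library; the endpoint URL shown is an assumption for illustration — check the current Hyperbolic API documentation before use.

```python
import json

# Assumed endpoint for illustration only; verify against the current docs.
API_URL = "https://api.hyperbolic.xyz/v1/chat/completions"


def build_chat_request(model, user_message, max_tokens=512):
    """Build an OpenAI-style chat completion payload (a dict ready for json.dumps)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("deepseek-ai/DeepSeek-V3-0324", "Hello!")
body = json.dumps(payload)  # serialized request body to POST with your HTTP client
```

Any HTTP client can then POST `body` to the endpoint with an `Authorization: Bearer <API_KEY>` header, exactly as it would against OpenAI's own API.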
Dedicated Hosting
Run LLMs, VLMs, or diffusion models on single-tenant GPUs with private endpoints. Bring your own weights or use open models. Full control, hourly pricing. Ideal for 24/7 inference or 100K+ tokens/min workloads.
Pricing
- Rent GPUs starting at $0.16/gpu/hr
- Access inference at prices 3–10x lower than competitors'
For the latest pricing, visit our pricing page.
Resources
- Launch App: app.hyperbolic.xyz
- Website: hyperbolic.xyz
- X (Twitter): @hyperbolic_labs
- LinkedIn: Hyperbolic Labs
- Discord: Join our community
- YouTube: @hyperbolic-labs
Supported tasks
Chat Completion (LLM)
Find out more about Chat Completion (LLM) here.
```python
from huggingface_hub import InferenceClient

# Route the request through Hyperbolic via the Hugging Face Inference Providers API.
client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
```
Chat Completion (VLM)
Find out more about Chat Completion (VLM) here.
```python
from huggingface_hub import InferenceClient

# Route the request through Hyperbolic via the Hugging Face Inference Providers API.
client = InferenceClient(
    provider="hyperbolic",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

# Vision-language request: the user message mixes a text part and an image URL part.
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512,
)

print(completion.choices[0].message)
```