State-of-the-artai models

Start in minutes with low-latency, pay-as-you-go serverless inference.

Why Run Inference on Hyperbolic?

Serverless API

Run models via REST API with Python, TypeScript, and cURL support, no infrastructure setup required, and already integrated with OpenAI? Keep your code flow and swap in your base URL and API key to start calling Hyperbolic's model catalog.

Privacy-first design

Zero data retention means requests are processed in real time and don't get logged, stored, or shared. Your prompts and responses disappear when the request finishes.

Scalable inference capacity

Built for high-demand AI workloads with flexible GPU availability and the ability to handle production traffic without the "we hit quota" wall.

Affordable pricing

Pay-as-you-go pricing with clear, displayed rates and no hidden fees or long-term commitments. If you need guaranteed capacity, you can also move to dedicated hosting with hourly pricing.

Low-latency global infrastructure

Optimized for fast response times across regions, so your users aren't waiting on a faraway cluster.

On-demand model hosting

Run open-source models in seconds, no setup, no DevOps. Hyperbolic's on-demand hosting gives you high-performance GPUs and private API access, great for rapid prototyping, internal tools, and early-stage launches. You'll find popular open models across modalities, including Llama 3.1, Qwen 2.5, DeepSeek V2.5, SDXL, and Flux, with new models rolling in regularly.

Text-TextText-Text

Text-ImageText-Image

VLMsVLMs

Text-AudioText-Audio

text-to-text

Pricing

Our transparent, usage-based pricing model keeps things predictable. Rates are displayed upfront, billing is straightforward, and you can choose the setup that matches how you run inference today.

Text-to-Text

Process and generate human-like text for NLP tasks, chat, code, summarization, and more.

Modelprice

Qwen2-VL-72B-Instruct

$0.40 / M tokens

Qwen2.5-Coder-32B

$0.20 / M tokens

Llama-3.2-3B

$0.10 / M tokens

Qwen2.5-72B

$0.40 / M tokens

DeepSeek-V2.5

$2.00 / M tokens

Llama-3-70B

$0.40 / M tokens

Hermes-3-70B

$0.40 / M tokens

Llama-3.1-405B

$4.00 / M tokens

Llama-3.1-70B

$0.40 / M tokens

Llama-3.1-8B

$0.10 / M tokens

Llama 3.1 8B (BF16) - Base

$0.10 / M tokens

Start BuildingStart Building

Availability

Where Inference Happens

Explore Latest ModelsExplore Latest Models

200,000+ Engineers

leveraging Hyperbolic’s AI ecosystem

25+ Open-source Models

available via API and sandbox

3-10x Less Expensive

than competitors

Custom Model Hosting

Dedicated Model Hosting for AI Teams

For teams that need guaranteed availability, custom configurations, or always-on capacity, dedicated hosting provides single-tenant GPU instances with private endpoints. Bring your own weights, dial in serving parameters, and monitor usage in one place. If you're doing higher-throughput inference or running a production system with strict requirements, this is the "no surprises" option.

Schedule a CallSchedule a Call

Dedicated, single-tenant GPU instances with private endpoints

Supports VLMs, LLMs, image/audio/video generation, quantization, batching, and speculative decoding

Bring your own weights, tune settings, and monitor usage

Pay hourly with unlimited requests, scale up or down anytime

Priority support with direct access to the team when it matters

Pricing

Inference at a Fraction of the Cost

Access powerful inference engines without torching your budget. Hyperbolic's optimized model serving and efficient infrastructure translate into real savings, commonly three to ten times lower cost compared to traditional providers, while still delivering the performance you need.

	Basic Tier	Pro Tier	Enterprise Tier
Rate Limit	60RPM	600RPM	Unlimited
ip address limit	100/min	100/min	Unlimited
access to ai models	Full precision (BF16) SOTA open-source models	Full precision (BF16) SOTA open-source models	Full precision (BF16) SOTA open-source models + custom models
pricing model	Pay-as-you-go	Pay-as-you-go	Custom Hourly pricing billed by GPU Type
dedicated support			Available Upon Request
dedicated instances
fine tuning services
full control over data
	Get StartedGet Started	Upgrade NowUpgrade Now	Contact UsContact Us

Use Cases

Made for Making

Dedicated Hosting
Accelerating Developer Access to Open-Source AI

“

Finding a host for the particular model we've been looking to use wasn't easy — Hyperbolic was the only platform that had it ready to go. Not only has the performance been outstanding, but their pricing absolutely crushes the major competitors. On top of that, the Hyperbolic founders provide the best customer support we’ve experienced, always going above and beyond to solve our needs. Partnering with them has been a huge win for us.

Taesung Park

Co-Founder of Reve AI

What makes Hyperbolic's AI cloud different from other providers?

Hyperbolic combines transparent pricing, instant deployment, and privacy-first design with OpenAI-compatible APIs. Our customers achieve three to ten times cost savings while accessing the latest open-source models through a developer-friendly platform that deploys in under a minute.

How does the OpenAI-compatible API work?

Simply replace your base URL and API key in existing applications to access Hyperbolic's model catalog. The API maintains full compatibility with OpenAI's interface while providing access to diverse open-source models at significantly lower costs.

What models are available on the platform?

We offer popular models including Llama 3.1, Qwen 2.5, DeepSeek V2.5, SDXL, and Flux, with new models added regularly. Our catalog spans text generation, image creation, vision-language tasks, and audio synthesis capabilities.

How does zero data retention work?

All inference requests are processed in real-time without logging, storing, or sharing your data. Our stateless architecture ensures prompts and responses never persist beyond the processing pipeline, maintaining complete privacy and compliance.

What are the pricing options?

We offer pay-as-you-go pricing with no hidden fees or commitments, plus dedicated hosting with hourly rates. All pricing is transparent and displayed upfront, with significant cost savings compared to traditional cloud providers.

How quickly can I get started?

Deploy clusters in under a minute with no sales calls or complex setup procedures. Our platform provides instant access to GPUs and models through a clean dashboard and standardized API endpoints.

What support is available?

We provide comprehensive documentation, API guides, and responsive customer support. Dedicated hosting customers receive priority support with faster response times and direct access to our engineering team.

Can I bring my own models?

Yes, dedicated hosting supports custom model weights and configurations. You can fine-tune parameters, implement custom batching strategies, and monitor usage through our management interface.

State-of-the-artai models

Pricing

Dedicated Model Hosting for AI Teams

Inference at a Fraction of the Cost

Made for Making

Inference at a Fraction of the Cost