Vision Language Models
Sunset Notice: All vision language models on this page are being discontinued. These models will be removed in a future update. Please plan your migration accordingly.
Analyze images and documents with vision language models (VLMs). These multimodal models can describe image content, extract text, answer questions about visuals, and reason over what they see, all through the same chat completions API used by text models.
Overview
Vision language models combine image understanding with natural language processing to enable:
Image Understanding: Describe and analyze image content
Document Analysis: Extract text from documents, receipts, and forms (OCR)
Visual Q&A: Answer questions about images
Image-based Reasoning: Perform complex analysis and comparisons
Endpoint
VLMs use the same chat completions endpoint as text models:
POST https://api.hyperbolic.xyz/v1/chat/completions
Basic Example
import base64
from io import BytesIO
import requests
from PIL import Image
def encode_image(image_path):
    """Encode an image file as a base64 PNG string."""
    with Image.open(image_path) as img:
        buffered = BytesIO()
        # Re-encode as PNG so the bytes match the image/png data URL below
        img.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")
# Encode your image
base64_image = encode_image("path/to/your/image.jpg")
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",
}
data = {
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
    "max_tokens": 512,
    "temperature": 0.1,
}
response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Raise on HTTP errors instead of failing on a missing "choices" key
print(response.json()["choices"][0]["message"]["content"])
# First, base64-encode your image (macOS syntax shown; on Linux use: base64 -w 0 image.jpg > image_base64.txt)
# base64 -i image.jpg -o image_base64.txt
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "Qwen/Qwen2.5-VL-72B-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,YOUR_BASE64_STRING"}
}
]
}
],
"max_tokens": 512,
"temperature": 0.1
}'
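The basic example indexes straight into the response body, which fails with an unhelpful KeyError if the request did not succeed. A small, defensive extractor gives clearer errors. This is a sketch: `extract_reply` is a hypothetical helper, and the check for an `"error"` key assumes an OpenAI-style error body, which this page does not document.

```python
def extract_reply(response_json: dict) -> str:
    """Return the assistant's text from a chat completions response body."""
    if "error" in response_json:
        # Assumption: failures come back as an OpenAI-style {"error": ...} body.
        raise RuntimeError(f"API returned an error: {response_json['error']}")
    try:
        return response_json["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise RuntimeError(f"Unexpected response shape: {response_json!r}") from exc
```

In the Python example, call `response.raise_for_status()` first so HTTP-level failures surface as `requests.HTTPError`, then pass `response.json()` to `extract_reply`.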
Encoding Images
Images must be base64-encoded before sending to the API. This helper also downscales any image larger than the maximum supported resolution (2048x2048) before encoding:
import base64
from io import BytesIO
from PIL import Image
def encode_image(image_path):
    """Encode an image file as a base64 PNG string."""
    with Image.open(image_path) as img:
        # Downscale in place (keeping aspect ratio) if larger than the max resolution
        max_size = (2048, 2048)
        img.thumbnail(max_size, Image.Resampling.LANCZOS)
        buffered = BytesIO()
        img.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")
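The helper above always re-encodes to PNG. If a file is already a JPG or PNG within the resolution limit, a lighter option is to base64-encode the raw bytes and take the MIME type from the file extension. This is a minimal standard-library sketch; `to_data_url` is a hypothetical helper name, and it assumes the extension matches the file's real format.

```python
import base64
import mimetypes
from pathlib import Path

# Formats listed under Limitations on this page.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def to_data_url(image_path: str) -> str:
    """Build a data URL from a JPG/PNG file without re-encoding the image."""
    mime_type, _ = mimetypes.guess_type(image_path)
    # Reject anything outside the supported formats before reading the file.
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported image type for {image_path!r}: {mime_type}")
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Because the MIME type comes from the file, a JPG is sent as `data:image/jpeg;base64,...` rather than being labeled as PNG.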
When a message includes an image, its content field becomes an array of content objects, one for the text prompt and one for the image:
{
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/png;base64,{base64_string}"}
                }
            ]
        }
    ]
}
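Building this structure by hand in every request is easy to get subtly wrong. A pair of small builders keeps the shape consistent. This is a sketch; `build_image_message` and `build_payload` are illustrative names, not part of any Hyperbolic SDK.

```python
from typing import Optional


def build_image_message(prompt: str, data_url: str) -> dict:
    """One user message pairing a text prompt with one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


def build_payload(model: str, prompt: str, data_url: str,
                  max_tokens: int = 512, temperature: Optional[float] = None) -> dict:
    """Full chat completions request body for a single-image question."""
    payload = {
        "model": model,
        "messages": [build_image_message(prompt, data_url)],
        "max_tokens": max_tokens,
    }
    # Only include temperature when the caller sets one; otherwise leave it unset.
    if temperature is not None:
        payload["temperature"] = temperature
    return payload
```

For example, `build_payload("Qwen/Qwen2.5-VL-72B-Instruct", "What is in this image?", f"data:image/png;base64,{base64_image}", temperature=0.1)` produces the same body as the basic example.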
Limitations
Supported formats: JPG, PNG
Maximum resolution: 2048x2048 pixels
Images per request: 1
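These limits can be checked client-side before a request is sent. The sketch below computes the downscaled size for an oversized image (approximately what `Image.thumbnail` does; Pillow's own rounding may differ by a pixel) and counts images in a message list. The function names are hypothetical, and counting images across the whole conversation history is an assumption: this page does not say whether the one-image limit applies to earlier turns.

```python
MAX_SIDE = 2048  # Maximum resolution from the Limitations list
MAX_IMAGES_PER_REQUEST = 1  # Images per request from the Limitations list


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale dimensions down (never up) to fit max_side x max_side, keeping aspect ratio."""
    if width <= max_side and height <= max_side:
        return width, height
    if width >= height:
        return max_side, max(1, round(height * max_side / width))
    return max(1, round(width * max_side / height)), max_side


def count_images(messages: list[dict]) -> int:
    """Count image_url parts across all messages, including earlier turns."""
    total = 0
    for message in messages:
        content = message.get("content")
        # Plain-string content (text-only turns) carries no images.
        if isinstance(content, list):
            total += sum(1 for part in content if part.get("type") == "image_url")
    return total


def check_request(messages: list[dict]) -> None:
    """Raise ValueError if the request would exceed the per-request image limit."""
    n = count_images(messages)
    if n > MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"{n} images in request; the limit is {MAX_IMAGES_PER_REQUEST}")
```

For a 4000x3000 photo, `fit_within(4000, 3000)` returns `(2048, 1536)`: the long side is capped at 2048 and the short side is scaled by the same factor of 2048/4000.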
Multi-turn Conversations
You can ask follow-up questions about an image by maintaining conversation history:
import base64
from io import BytesIO
import requests
from PIL import Image
def encode_image(image_path):
    """Encode an image file as a base64 PNG string."""
    with Image.open(image_path) as img:
        buffered = BytesIO()
        img.save(buffered, format="PNG")
    return base64.b64encode(buffered.getvalue()).decode("utf-8")
base64_image = encode_image("receipt.jpg")
url = "https://api.hyperbolic.xyz/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_API_KEY",
}
# First turn: send the image with the question
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What items are on this receipt?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            },
        ],
    }
]
response = requests.post(url, headers=headers, json={
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": messages,
    "max_tokens": 512,
})
response.raise_for_status()
assistant_response = response.json()["choices"][0]["message"]["content"]
print("First response:", assistant_response)
# Second turn: ask a follow-up. The image stays in the message history and is
# sent again automatically, so you don't need to attach it a second time.
messages.append({"role": "assistant", "content": assistant_response})
messages.append({"role": "user", "content": "What is the total amount?"})
response = requests.post(url, headers=headers, json={
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": messages,
    "max_tokens": 256,
})
response.raise_for_status()
print("Follow-up response:", response.json()["choices"][0]["message"]["content"])
# Multi-turn conversation with follow-up question
curl -X POST "https://api.hyperbolic.xyz/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "Qwen/Qwen2.5-VL-72B-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What items are on this receipt?"},
{
"type": "image_url",
"image_url": {"url": "data:image/png;base64,YOUR_BASE64_STRING"}
}
]
},
{
"role": "assistant",
"content": "The receipt shows: 1. Coffee - $4.50, 2. Sandwich - $8.99..."
},
{
"role": "user",
"content": "What is the total amount?"
}
],
"max_tokens": 256
}'
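The multi-turn pattern (append the user turn, send the full history, store the reply) can be wrapped in a small class. This is an illustrative sketch; `VisionChat` and the `send` callable are hypothetical names, and `send` is injected so the history logic can be exercised without network access.

```python
from typing import Callable, Optional

# Hypothetical "sender": takes a request body dict and returns the parsed JSON response.
Sender = Callable[[dict], dict]


class VisionChat:
    """Keeps the message history for follow-up questions about one image."""

    def __init__(self, model: str, send: Sender):
        self.model = model
        self.send = send
        self.messages: list[dict] = []

    def ask(self, text: str, data_url: Optional[str] = None, max_tokens: int = 512) -> str:
        """Send one user turn and record the assistant's reply in the history."""
        if data_url is None:
            content = text  # Follow-up turns can be plain text
        else:
            content = [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": data_url}},
            ]
        self.messages.append({"role": "user", "content": content})
        # The full history, including the original image turn, goes out with every request.
        response = self.send({
            "model": self.model,
            "messages": list(self.messages),
            "max_tokens": max_tokens,
        })
        reply = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

To call the API, `send` could wrap the `requests.post(url, headers=headers, json=payload)` call from the Python example and return `response.json()`. Since the whole history is resent each turn, the image's tokens are sent with every follow-up as well.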
Available Models
| Model | Model ID | Best For | Price |
| --- | --- | --- | --- |
| NVIDIA Nemotron Nano 12B v2 VL ⚠️ Sunset | nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 | Document intelligence | $0.20/M tokens |
| Pixtral 12B ⚠️ Sunset | mistralai/Pixtral-12B-2409 | Budget-friendly, general use | $0.10/M tokens |
| Qwen2.5-VL-7B-Instruct ⚠️ Sunset | Qwen/Qwen2.5-VL-7B-Instruct | Balanced cost/performance | $0.20/M tokens |
| Qwen2.5-VL-72B-Instruct ⚠️ Sunset | Qwen/Qwen2.5-VL-72B-Instruct | Best quality, complex analysis | $0.60/M tokens |
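Prices are quoted per million tokens, so the cost of a request is tokens × price / 1,000,000. For example, 1,500 tokens on Qwen2.5-VL-72B-Instruct costs about 1,500 × $0.60 / 1,000,000 = $0.0009. The sketch below assumes a single flat rate covers both input (including image) and output tokens, as the single price column suggests but does not state. The token count might come from a `usage.total_tokens` field if the response includes one, which is an assumption based on the OpenAI-compatible format.

```python
from decimal import Decimal

# Prices from the table above, in USD per million tokens.
PRICE_PER_MILLION = {
    "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16": Decimal("0.20"),
    "mistralai/Pixtral-12B-2409": Decimal("0.10"),
    "Qwen/Qwen2.5-VL-7B-Instruct": Decimal("0.20"),
    "Qwen/Qwen2.5-VL-72B-Instruct": Decimal("0.60"),
}


def estimate_cost(model: str, total_tokens: int) -> Decimal:
    """Estimated USD cost, assuming one flat rate for input and output tokens."""
    # Decimal avoids float rounding noise in currency arithmetic.
    return PRICE_PER_MILLION[model] * total_tokens / Decimal(1_000_000)
```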
Model Recommendations
Choosing the right model:
Best quality: Qwen2.5-VL-72B-Instruct for complex analysis and detailed understanding
Best value: Pixtral 12B at $0.10/M tokens for general image tasks
Document analysis: NVIDIA Nemotron Nano for OCR, forms, and document intelligence
Balanced: Qwen2.5-VL-7B-Instruct for good performance at moderate cost
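Because every model on this page is being sunset, it can help to keep the model choice in one place so a migration is a single edit. A minimal sketch of the recommendations above as a lookup table; the task labels and `pick_model` are hypothetical names chosen for illustration.

```python
# Recommendations above, keyed by task labels chosen for this example.
RECOMMENDED_MODELS = {
    "quality": "Qwen/Qwen2.5-VL-72B-Instruct",
    "value": "mistralai/Pixtral-12B-2409",
    "documents": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16",
    "balanced": "Qwen/Qwen2.5-VL-7B-Instruct",
}


def pick_model(task: str) -> str:
    """Return the recommended model ID for a task label."""
    try:
        return RECOMMENDED_MODELS[task]
    except KeyError:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(RECOMMENDED_MODELS)}"
        ) from None
```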
Use Cases
Document Analysis
Extract text and structured data from documents, receipts, and forms:
# base64_image is the string returned by encode_image() above
data = {
    "model": "nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this receipt and format it as a list with item names and prices."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
        ],
    }],
    "max_tokens": 1024,
}
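If the reply lists items in the "Coffee - $4.50" style shown in the sample multi-turn cURL conversation, a small parser can turn it into structured data. This is a hypothetical sketch: model output is free text and is not guaranteed to follow any format, so the regex is an assumption to adapt, and lines it fails to match deserve a manual look.

```python
import re
from decimal import Decimal

# Matches lines such as "Coffee - $4.50" or "1. Coffee - $4.50" (one item per line).
ITEM_LINE = re.compile(
    r"^\s*(?:\d+\.\s*)?(?P<name>.+?)\s+-\s+\$(?P<price>\d+(?:\.\d{1,2})?)\s*$",
    re.MULTILINE,
)


def parse_items(text: str) -> list[tuple[str, Decimal]]:
    """Pull (item name, price) pairs out of the model's text reply."""
    return [(m["name"], Decimal(m["price"])) for m in ITEM_LINE.finditer(text)]


def items_total(items: list[tuple[str, Decimal]]) -> Decimal:
    """Sum prices with Decimal to avoid float rounding in currency math."""
    return sum((price for _, price in items), Decimal("0"))
```

Comparing `items_total(...)` with the total printed on the receipt is a cheap sanity check on the extraction.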
Image Captioning
Generate detailed descriptions of images:
data = {
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail, including colors, objects, and any text visible."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
        ],
    }],
    "max_tokens": 512,
}
Visual Q&A
Ask specific questions about image content:
data = {
    "model": "mistralai/Pixtral-12B-2409",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many people are in this photo? What are they doing?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
        ],
    }],
    "max_tokens": 256,
}
Next Steps
Text APIs: Text generation with large language models
Image APIs: Generate images from text prompts
Audio APIs: Text-to-speech and audio generation