Kimi K2.5 API
OpenAI-compatible inference API powered by Moonshot AI's Kimi K2.5, a 1.03-trillion-parameter Mixture-of-Experts model running on eight dedicated NVIDIA H100 GPUs.
Parameters: 1.03T
Throughput: 110 tokens / sec
GPU cluster: 8× H100
Input price: $0.45 per 1M input tokens
Quick Start
from openai import OpenAI

# Point the standard OpenAI SDK at the Kimi K2.5 endpoint.
client = OpenAI(
    base_url="https://api.kimi.villamarket.ai/v1",
    api_key="your-api-key",
)

# Send a single-turn chat request and print the reply text.
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

Endpoints
POST /v1/chat/completions - Chat completions (streaming supported)
GET /v1/models - List available models
GET /health - Server health check
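
Since the chat endpoint supports streaming, a minimal sketch of consuming a streamed reply with the OpenAI Python SDK might look like the following. The helper names (make_client, stream_reply, list_model_ids) are hypothetical, and the sketch assumes the server emits standard OpenAI-style stream chunks and model listings, which this page implies but does not spell out.

```python
def make_client(api_key: str):
    """Build an OpenAI SDK client pointed at the base URL from the Quick Start."""
    # Imported lazily so the helpers below can be exercised without the SDK installed.
    from openai import OpenAI
    return OpenAI(base_url="https://api.kimi.villamarket.ai/v1", api_key=api_key)


def stream_reply(client, prompt: str, max_tokens: int = 1024) -> str:
    """Stream a chat completion, print text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model="kimi-k2.5",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,  # ask for incremental chunks instead of one final response
    )
    parts = []
    for chunk in stream:
        if not chunk.choices:  # some chunks carry no choices (assumed possible here)
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip role-only or empty deltas
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)


def list_model_ids(client) -> list:
    """Return the model IDs reported by GET /v1/models."""
    return [model.id for model in client.models.list()]
```

Typical use would be `client = make_client("your-api-key")` followed by `stream_reply(client, "Hello!")`. Passing the client in as an argument keeps the helpers easy to test against a stub.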
Pricing
Input
$0.45 / 1M tokens
- Prompt & system messages
- No minimum commitment
- Pay per token used
Output
$2.50 / 1M tokens
- Generated responses
- Includes reasoning tokens
- Real-time spend tracking
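
As a rough illustration of the per-token pricing above, the cost of a request can be estimated from its token counts. This is a sketch, not a billing reference: the function name is hypothetical, and it assumes the server reports usage in the OpenAI-style `prompt_tokens` and `completion_tokens` fields, with reasoning tokens counted inside the completion total.

```python
# Prices taken from the Pricing section, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.45
OUTPUT_USD_PER_MILLION = 2.50


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    input_cost = prompt_tokens * INPUT_USD_PER_MILLION
    output_cost = completion_tokens * OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Example: a 2,000-token prompt that produces a 500-token reply.
print(f"${estimate_cost_usd(2_000, 500):.5f}")  # prints $0.00215
```

With a real response from the SDK, the counts would come from `response.usage.prompt_tokens` and `response.usage.completion_tokens`, assuming the server populates them.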
Model Details
Architecture: Mixture-of-Experts (384 experts, 8 active per token)
Active params: ~32B per forward pass
Context window: 4,096 tokens by default (configurable up to 262K tokens)
Quantization: Native INT4 (W4A16 QAT)
Features: Reasoning, tool/function calling, multi-turn chat
Compatibility: OpenAI API drop-in replacement
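
Because the API is described as an OpenAI drop-in with tool/function calling, a tool-calling round trip could plausibly be sketched as below. The `get_weather` tool, its schema, and the `run_tool_calls` helper are hypothetical illustrations, and the sketch assumes the server returns tool calls in the standard OpenAI message format.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns canned data as a JSON string."""
    return json.dumps({"city": city, "forecast": "sunny"})


# Tool schema in the OpenAI "tools" format, passed as tools=TOOLS in the request.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather forecast for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message) -> list:
    """Execute each tool call on an assistant message and build the tool-role replies."""
    replies = []
    for call in message.tool_calls or []:
        func = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        replies.append(
            {"role": "tool", "tool_call_id": call.id, "content": func(**args)}
        )
    return replies
```

In use, the first request would include `tools=TOOLS`; if the returned assistant message contains tool calls, append that message and the output of `run_tool_calls` to `messages`, then call the chat endpoint again to get the final answer.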