API Documentation

Base URL: https://api.kimi.villamarket.ai

Authentication

All API requests require a Bearer token in the Authorization header.

Authorization: Bearer your-api-key
POST /v1/chat/completions

Create a chat completion. Supports streaming via "stream": true. Kimi K2.5 is a reasoning model; responses may include internal reasoning tokens before the final answer.

Request Body

Parameter     Type     Description
model         string   Model ID. Use "kimi-k2.5"
messages      array    Array of message objects with role and content
max_tokens    integer  Maximum number of tokens to generate (default: 4096)
stream        boolean  Enable server-sent events streaming (default: false)
temperature   number   Sampling temperature, 0–2 (default: 0.6)
tools         array    List of tool/function definitions for function calling
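The endpoint is consumed through the OpenAI SDK in the Python example below, so the sketch here assumes the tools parameter follows the OpenAI-compatible function-calling schema. The get_weather function is hypothetical, used only to illustrate the shape of a request body.

```python
# Sketch of a function-calling request body, assuming the
# OpenAI-compatible "tools" schema. "get_weather" is hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

request_body = {
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "What's the weather in Bangkok?"}],
    "tools": tools,
}
```

The same dictionary can be passed as the tools argument of client.chat.completions.create in the SDK example below.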

curl Example

curl https://api.kimi.villamarket.ai/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2.5",
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms"}],
    "max_tokens": 1024
  }'

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.kimi.villamarket.ai/v1",
    api_key="your-api-key"
)

# Non-streaming
response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=1024
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Write a haiku about AI"}],
    stream=True
)
for chunk in stream:
    # Some chunks may arrive with no choices or an empty delta; guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "kimi-k2.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 128,
    "total_tokens": 143
  }
}
GET /v1/models

List available models.

curl https://api.kimi.villamarket.ai/v1/models \
  -H "Authorization: Bearer your-api-key"

Response

{
  "object": "list",
  "data": [
    {
      "id": "kimi-k2.5",
      "object": "model",
      "owned_by": "moonshot-ai"
    }
  ]
}
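The model IDs can be pulled straight out of the list response. A sketch using the sample payload above:

```python
# Extract model IDs from a /v1/models response payload
# (using the sample response shown above).
models_response = {
    "object": "list",
    "data": [
        {"id": "kimi-k2.5", "object": "model", "owned_by": "moonshot-ai"}
    ],
}

model_ids = [m["id"] for m in models_response["data"]]
print(model_ids)  # ['kimi-k2.5']
```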
GET /health

Check server health. No authentication required.

curl https://api.kimi.villamarket.ai/health

Response

"healthy"

Notes

Reasoning tokens: Kimi K2.5 is a reasoning model. Responses may include internal chain-of-thought tokens that count toward output token usage. The final answer follows the reasoning.
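Because reasoning tokens are billed as part of completion_tokens, the usage block is the place to track consumption. A sketch using the sample usage figures from the completion response above:

```python
# Token accounting from the "usage" block of a completion response.
# completion_tokens includes any internal reasoning tokens.
usage = {"prompt_tokens": 15, "completion_tokens": 128, "total_tokens": 143}

assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(f"prompt={usage['prompt_tokens']} "
      f"completion={usage['completion_tokens']} (includes reasoning tokens)")
```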

Rate limits: Usage is metered per API key. Default budget: $10 per key. Contact us for higher limits.

Streaming: Use "stream": true for real-time token-by-token output via Server-Sent Events (SSE).
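Clients that do not use the SDK must parse the SSE stream themselves. A minimal parser sketch, assuming the OpenAI-compatible chunk format (lines of the form data: {...}, terminated by data: [DONE]):

```python
import json

def parse_sse_line(line: str):
    """Return the content delta from one SSE data line, or None."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices", [])
    if choices:
        return choices[0].get("delta", {}).get("content")
    return None

# Example: two content chunks followed by the stream terminator.
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c for c in map(parse_sse_line, lines) if c)
print(text)  # Hello
```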