API Reference
Base URL: https://gw.kamiwazaai.org. All endpoints require an Authorization: Bearer <api-key> header.
POST /v1/route
The main routing endpoint. It is OpenAI-compatible and accepts the same request body as /v1/chat/completions. Kamiwaza makes the routing decision; the response from the upstream endpoint is passed through unchanged.
Request body
{
"model": "auto",
"messages": [
{"role": "user", "content": "string"}
],
"stream": false,
"max_tokens": 1024,
"temperature": 0.7
}
"model": "auto" instructs Kamiwaza to apply your routing policy. You may also pass a specific model ID (e.g., "model": "claude-3-5-haiku") to pin a target, which Kamiwaza validates against your tenant's allowlist.
Routing headers
| Header | Type | Description |
|---|---|---|
| X-Tenant | string | Tenant ID for per-tenant policy evaluation |
| X-Data-Class | string | Data classification label (e.g., pii-restricted, general) |
| X-Latency-Budget-Ms | integer | Override latency budget for this request (ms) |
| X-Cost-Budget-Usd | number | Max cost ceiling for this request (USD) |
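As a minimal sketch of how the request body and routing headers fit together, the snippet below builds (but does not send) a routed request using only the Python standard library. The helper name `build_route_request` and its parameters are hypothetical; the URL, body fields, and header names are the ones listed in this reference.

```python
import json
import urllib.request

BASE_URL = "https://gw.kamiwazaai.org"  # base URL from this reference


def build_route_request(api_key, prompt, model="auto", tenant=None,
                        data_class=None, latency_budget_ms=None,
                        cost_budget_usd=None):
    """Build a POST /v1/route request without sending it (hypothetical helper)."""
    body = {
        "model": model,  # "auto" applies the routing policy; a model ID pins a target
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": 1024,
        "temperature": 0.7,
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Optional routing headers are added only when a value is supplied.
    optional = {
        "X-Tenant": tenant,
        "X-Data-Class": data_class,
        "X-Latency-Budget-Ms": latency_budget_ms,
        "X-Cost-Budget-Usd": cost_budget_usd,
    }
    for name, value in optional.items():
        if value is not None:
            headers[name] = str(value)
    return urllib.request.Request(
        f"{BASE_URL}/v1/route",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_route_request(
    "sk-kmw-XXXXXXXXXXXX", "Hello",
    tenant="acme-corp", data_class="pii-restricted", latency_budget_ms=500,
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access or a real key.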
Response
OpenAI-compatible chat completion response, with an added x-kmw-routing header reporting the endpoint used, the index of the matched rule, and the request latency in milliseconds.
x-kmw-routing: endpoint=private-gpu; rule=0; latency_ms=241
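A client that wants to log routing decisions could split this header into fields. The sketch below assumes the value is always a list of semicolon-separated `key=value` pairs, which is inferred from the single example above rather than from a documented grammar; `parse_routing_header` is a hypothetical helper name.

```python
def parse_routing_header(value):
    """Parse 'key=value; key=value' pairs into a dict (format assumed from the example)."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, _, raw = part.partition("=")
        raw = raw.strip()
        # Purely numeric values such as rule and latency_ms become integers.
        fields[key.strip()] = int(raw) if raw.isdigit() else raw
    return fields


info = parse_routing_header("endpoint=private-gpu; rule=0; latency_ms=241")
print(info)  # {'endpoint': 'private-gpu', 'rule': 0, 'latency_ms': 241}
```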
Endpoints API
Manage the inference endpoints registered in your Kamiwaza deployment.
GET /v1/endpoints
List all registered endpoints and their current health status.
POST /v1/endpoints
Register a new endpoint. Request body:
{
"id": "private-gpu",
"type": "vllm",
"url": "https://gpu.internal.acme.com/v1",
"transport": "privatelink",
"models": ["llama-3.1-70b-instruct"],
"health_check_interval_ms": 5000
}
Endpoint types
| type | Description |
|---|---|
| anthropic | Anthropic API (claude-3-5-sonnet, haiku, opus) |
| openai | OpenAI API (gpt-4o, gpt-4o-mini, etc.) |
| bedrock | AWS Bedrock managed endpoint |
| vllm | Self-hosted vLLM server |
| ollama | Self-hosted Ollama server |
| azure_openai | Azure OpenAI Service deployment |
| custom | Any OpenAI-compatible endpoint via URL |
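One way to catch a mistyped `type` before the request leaves the client is to check it against the values in the table above. This is a sketch under assumptions: `endpoint_payload` is a hypothetical helper, and which body fields the server treats as required is not stated in this reference, so `transport` and `health_check_interval_ms` are simply omitted when not given.

```python
import json

# Endpoint types listed in the table above.
ENDPOINT_TYPES = {
    "anthropic", "openai", "bedrock", "vllm", "ollama", "azure_openai", "custom",
}


def endpoint_payload(endpoint_id, endpoint_type, url, models,
                     transport=None, health_check_interval_ms=None):
    """Build an endpoint registration body, rejecting unknown types client-side."""
    if endpoint_type not in ENDPOINT_TYPES:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    payload = {
        "id": endpoint_id,
        "type": endpoint_type,
        "url": url,
        "models": list(models),
    }
    # Assumption: these two fields are optional, so include them only when set.
    if transport is not None:
        payload["transport"] = transport
    if health_check_interval_ms is not None:
        payload["health_check_interval_ms"] = health_check_interval_ms
    return payload


payload = endpoint_payload(
    "private-gpu", "vllm", "https://gpu.internal.acme.com/v1",
    ["llama-3.1-70b-instruct"],
    transport="privatelink", health_check_interval_ms=5000,
)
print(json.dumps(payload, indent=2))
```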
Policies API
GET /v1/policies
Return the currently deployed routing policy as a JSON object.
POST /v1/policies
Deploy a new routing policy. The body must be a valid policy YAML string under the yaml key, or a JSON-equivalent policy object.
{
"yaml": "version: v1\nrules:\n  - default:\n      route_to: anthropic-haiku"
}
Policy changes take effect within 5 seconds of a successful POST. No restart required.
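Because the policy travels as a YAML string inside a JSON body, newlines must be JSON-escaped. Letting a JSON encoder do the escaping is less error-prone than writing `\n` by hand, as in this sketch. `policy_body` is a hypothetical helper, and the indentation of the sample policy (with `route_to` nested under `default`) is an assumption about the policy schema.

```python
import json


def policy_body(policy_yaml):
    """Wrap policy YAML text in a JSON body under the "yaml" key."""
    return json.dumps({"yaml": policy_yaml})


# Sample policy; the nesting of route_to under default is assumed.
policy_yaml = (
    "version: v1\n"
    "rules:\n"
    "  - default:\n"
    "      route_to: anthropic-haiku\n"
)

body = policy_body(policy_yaml)
print(body)
```

The printed body is a single line in which each newline of the YAML appears as an escaped `\n`, so the policy can be kept in a regular multi-line file and loaded as-is.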
Audit API
GET /v1/audit
Query audit records. Supports filtering by tenant, data class, endpoint, and time range.
Query parameters
| Parameter | Type | Description |
|---|---|---|
| tenant | string | Filter by tenant ID |
| data_class | string | Filter by data class label |
| endpoint | string | Filter by endpoint ID |
| from | ISO 8601 | Start of time window |
| to | ISO 8601 | End of time window |
| limit | integer | Max records to return (default 100, max 1000) |
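The filters above can be assembled into a query string as sketched below. `audit_url` is a hypothetical helper; it assumes timestamps are sent in UTC with a trailing `Z` (the form used in the audit record sample) and rejects a `limit` above the documented maximum before the request is made.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://gw.kamiwazaai.org"  # base URL from this reference
MAX_LIMIT = 1000  # documented maximum for limit


def _iso_utc(dt):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a 'Z' suffix."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def audit_url(tenant=None, data_class=None, endpoint=None,
              start=None, end=None, limit=100):
    """Build the /v1/audit URL, leaving out filters that were not supplied."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {
        "tenant": tenant,
        "data_class": data_class,
        "endpoint": endpoint,
        "from": _iso_utc(start) if start else None,
        "to": _iso_utc(end) if end else None,
    }
    params = {k: v for k, v in params.items() if v is not None}
    params["limit"] = limit
    return f"{BASE_URL}/v1/audit?{urlencode(params)}"


url = audit_url(
    tenant="acme-corp",
    start=datetime(2025, 5, 28, 14, 0, tzinfo=timezone.utc),
    limit=500,
)
print(url)
```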
Audit record schema
{
"ts": "2025-05-28T14:22:01Z",
"tenant": "acme-corp",
"data_class": "pii-restricted",
"rule_index": 0,
"rule_name": "pii-to-private-gpu",
"endpoint_id": "private-gpu",
"model": "llama-3.1-70b-instruct",
"latency_ms": 234,
"tokens_in": 512,
"tokens_out": 128,
"status": 200
}
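For client-side analysis, a record with this shape can be loaded into a typed structure. The sketch below is one possible mapping, not a published client type: `AuditRecord` and `parse_audit_record` are hypothetical names, the field list is copied from the sample above, and `ts` is assumed to always be UTC with a `Z` suffix.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    ts: datetime
    tenant: str
    data_class: str
    rule_index: int
    rule_name: str
    endpoint_id: str
    model: str
    latency_ms: int
    tokens_in: int
    tokens_out: int
    status: int


def parse_audit_record(raw):
    """Convert one audit record dict into an AuditRecord, ignoring unknown keys."""
    known = {f.name for f in fields(AuditRecord)}
    data = {k: v for k, v in raw.items() if k in known}
    # Assumption: timestamps are UTC and end in 'Z', as in the sample record.
    data["ts"] = datetime.strptime(raw["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return AuditRecord(**data)


record = parse_audit_record({
    "ts": "2025-05-28T14:22:01Z",
    "tenant": "acme-corp",
    "data_class": "pii-restricted",
    "rule_index": 0,
    "rule_name": "pii-to-private-gpu",
    "endpoint_id": "private-gpu",
    "model": "llama-3.1-70b-instruct",
    "latency_ms": 234,
    "tokens_in": 512,
    "tokens_out": 128,
    "status": 200,
})
print(record.ts.isoformat(), record.tokens_in + record.tokens_out)
# prints: 2025-05-28T14:22:01+00:00 640
```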
Tenants API
GET /v1/tenants
List all configured tenants and their current config (model allowlists, rate limit, audit bucket).
/v1/tenants
Create or update a tenant configuration:
{
"id": "enterprise-acme",
"model_allowlist": ["llama-3.1-70b", "claude-3-5-haiku"],
"rate_limit_rpm": 5000,
"audit_bucket": "s3://acme-audit-logs"
}
Authentication & common headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer sk-kmw-XXXXXXXXXXXX |
| Content-Type | Yes (POST) | application/json |
| X-Tenant | No | Tenant ID for policy evaluation |
| X-Data-Class | No | Data classification label |
| X-Request-Id | No | Idempotency key; echoed in audit log |