API Reference

Base URL: https://gw.kamiwazaai.org. All endpoints require an Authorization: Bearer <api-key> header.

POST /v1/route

The main routing endpoint. It is OpenAI-compatible: it accepts the same request body as /v1/chat/completions. Kamiwaza makes the routing decision; the response from the upstream endpoint is passed through unchanged.

Request body

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "string"}
  ],
  "stream": false,
  "max_tokens": 1024,
  "temperature": 0.7
}

"model": "auto" instructs Kamiwaza to apply your routing policy. You may also pass a specific model ID (e.g., "model": "claude-3-5-haiku") to pin a target, which Kamiwaza validates against your tenant's allowlist.

Routing headers

Header	Type	Description
`X-Tenant`	string	Tenant ID for per-tenant policy evaluation
`X-Data-Class`	string	Data classification label (e.g., `pii-restricted`, `general`)
`X-Latency-Budget-Ms`	integer	Override latency budget for this request (ms)
`X-Cost-Budget-Usd`	number	Cost ceiling for this request, in US dollars
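
The request body and routing headers above can be combined as in the following minimal sketch. It only builds the request with the Python standard library and does not send it. The helper name `build_route_request` and its parameters are hypothetical, not part of the Kamiwaza API; the URL, body fields, and header names are taken from this page.

```python
import json
import urllib.request

BASE_URL = "https://gw.kamiwazaai.org"  # base URL stated at the top of this page


def build_route_request(api_key, prompt, model="auto", tenant=None,
                        data_class=None, latency_budget_ms=None,
                        cost_budget_usd=None):
    """Build (but do not send) a POST /v1/route request. Hypothetical helper."""
    body = {
        "model": model,  # "auto" applies the routing policy; a model ID pins a target
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Routing headers are only attached when a value was supplied.
    optional = {
        "X-Tenant": tenant,
        "X-Data-Class": data_class,
        "X-Latency-Budget-Ms": latency_budget_ms,
        "X-Cost-Budget-Usd": cost_budget_usd,
    }
    for name, value in optional.items():
        if value is not None:
            headers[name] = str(value)
    return urllib.request.Request(
        f"{BASE_URL}/v1/route",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_route_request(
    "sk-kmw-XXXXXXXXXXXX",  # placeholder key in the format shown on this page
    "Hello",
    tenant="acme-corp",
    data_class="pii-restricted",
    latency_budget_ms=500,
)
# urllib stores header names capitalized (e.g. "X-tenant");
# HTTP header names are case-insensitive, so this does not change the request.
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```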

Response

An OpenAI-compatible chat completion response, with an added x-kmw-routing header that reports the endpoint used, the matched rule, and the latency in milliseconds.

x-kmw-routing: endpoint=private-gpu; rule=0; latency_ms=241
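
A client that wants to log routing decisions could split this header into fields. The sketch below assumes the value is always a semicolon-separated list of `key=value` pairs, as in the example above; that format is inferred from the single example and is not otherwise specified here. The function name `parse_routing_header` is hypothetical.

```python
def parse_routing_header(value):
    """Split 'k=v; k=v' pairs into a dict of strings.

    Assumes the format seen in the example header; parts without
    an '=' are skipped rather than treated as errors.
    """
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, val = part.partition("=")
        if sep:
            fields[key.strip()] = val.strip()
    return fields


info = parse_routing_header("endpoint=private-gpu; rule=0; latency_ms=241")
# Values stay strings; convert where a number is needed.
latency_ms = int(info["latency_ms"])
```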

Endpoints API

Manage the inference endpoints registered in your Kamiwaza deployment.

GET /v1/endpoints

List all registered endpoints and their current health status.

POST /v1/endpoints

Register an endpoint:

{
  "id": "private-gpu",
  "type": "vllm",
  "url": "https://gpu.internal.acme.com/v1",
  "transport": "privatelink",
  "models": ["llama-3.1-70b-instruct"],
  "health_check_interval_ms": 5000
}

Endpoint types

type	Description
`anthropic`	Anthropic API (claude-3-5-sonnet, haiku, opus)
`openai`	OpenAI API (gpt-4o, gpt-4o-mini, etc.)
`bedrock`	AWS Bedrock managed endpoint
`vllm`	Self-hosted vLLM server
`ollama`	Self-hosted Ollama server
`azure_openai`	Azure OpenAI Service deployment
`custom`	Any OpenAI-compatible endpoint via URL
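
As an illustration, a client could check the `type` value against the table above before posting a registration. This is a sketch only: the helper `endpoint_body` is hypothetical, and treating `id`, `type`, `url`, and `models` as the required fields is an assumption based on the example body, not a documented rule.

```python
import json

# Endpoint types listed in the table above.
ENDPOINT_TYPES = {
    "anthropic", "openai", "bedrock", "vllm",
    "ollama", "azure_openai", "custom",
}


def endpoint_body(endpoint_id, endpoint_type, url, models, **extra):
    """Return a JSON request body for POST /v1/endpoints. Hypothetical helper.

    Raises ValueError when the type is not one of the documented values.
    """
    if endpoint_type not in ENDPOINT_TYPES:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    payload = {
        "id": endpoint_id,
        "type": endpoint_type,
        "url": url,
        "models": list(models),
    }
    # Other fields from the example body, such as transport or
    # health_check_interval_ms, are passed through unchanged.
    payload.update(extra)
    return json.dumps(payload)


body = endpoint_body(
    "private-gpu",
    "vllm",
    "https://gpu.internal.acme.com/v1",
    ["llama-3.1-70b-instruct"],
    transport="privatelink",
    health_check_interval_ms=5000,
)
```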

Policies API

GET /v1/policies

Return the currently deployed routing policy as a JSON object.

POST /v1/policies

Deploy a new routing policy. The body must contain either a valid policy YAML string under the yaml key or the equivalent policy expressed as a JSON object.

{
  "yaml": "version: v1\nrules:\n  - default:\n    route_to: anthropic-haiku"
}

Policy changes take effect within 5 seconds of a successful POST. No restart required.
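
Because the YAML travels inside a JSON string, its line breaks must be escaped as `\n`. One way to avoid escaping by hand is to let a JSON encoder do it, as in this small sketch. The policy text is the one from the example above; the variable names are illustrative.

```python
import json

# The same policy as in the example body, written as ordinary multi-line text.
policy_yaml = (
    "version: v1\n"
    "rules:\n"
    "  - default:\n"
    "    route_to: anthropic-haiku"
)

# json.dumps escapes the newlines, producing a body suitable for POST /v1/policies.
body = json.dumps({"yaml": policy_yaml})
```

Decoding `body` again with `json.loads` returns the original YAML text, so indentation survives the round trip.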

Audit API

GET /v1/audit

Query audit records. Supports filtering by tenant, data class, endpoint, and time range.

Query parameters

Parameter	Type	Description
`tenant`	string	Filter by tenant ID
`data_class`	string	Filter by data class label
`endpoint`	string	Filter by endpoint ID
`from`	ISO 8601	Start of time window
`to`	ISO 8601	End of time window
`limit`	integer	Max records to return (default 100, max 1000)
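
The query parameters above can be assembled into a URL as sketched below. The helper `audit_url` is hypothetical. Formatting timestamps as UTC with a trailing `Z` mirrors the `ts` value in the audit record example and is an assumption about what the API accepts; the `limit` check reflects the documented maximum of 1000.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode, urlsplit, parse_qs


def audit_url(base_url, tenant=None, data_class=None, endpoint=None,
              start=None, end=None, limit=100):
    """Build a GET /v1/audit URL. Hypothetical helper.

    start and end must be timezone-aware datetimes.
    """
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")

    def iso(moment):
        # Convert to UTC and format like the ts field, e.g. 2025-05-28T14:22:01Z.
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    params = {}
    if tenant is not None:
        params["tenant"] = tenant
    if data_class is not None:
        params["data_class"] = data_class
    if endpoint is not None:
        params["endpoint"] = endpoint
    if start is not None:
        params["from"] = iso(start)
    if end is not None:
        params["to"] = iso(end)
    params["limit"] = limit
    return f"{base_url}/v1/audit?{urlencode(params)}"


url = audit_url(
    "https://gw.kamiwazaai.org",
    tenant="acme-corp",
    start=datetime(2025, 5, 28, 14, 0, tzinfo=timezone.utc),
    limit=500,
)
```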

Audit record schema

{
  "ts": "2025-05-28T14:22:01Z",
  "tenant": "acme-corp",
  "data_class": "pii-restricted",
  "rule_index": 0,
  "rule_name": "pii-to-private-gpu",
  "endpoint_id": "private-gpu",
  "model": "llama-3.1-70b-instruct",
  "latency_ms": 234,
  "tokens_in": 512,
  "tokens_out": 128,
  "status": 200
}
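
Records with this schema are straightforward to aggregate on the client side. The sketch below sums token usage per endpoint from a list of already-decoded records. How GET /v1/audit wraps its records in the response is not described on this page, so the function takes a plain list of dicts; the function name is hypothetical.

```python
from collections import defaultdict


def tokens_by_endpoint(records):
    """Sum tokens_in + tokens_out for each endpoint_id."""
    totals = defaultdict(int)
    for record in records:
        totals[record["endpoint_id"]] += record["tokens_in"] + record["tokens_out"]
    return dict(totals)


# The sample record from the schema above.
sample = {
    "ts": "2025-05-28T14:22:01Z",
    "tenant": "acme-corp",
    "data_class": "pii-restricted",
    "rule_index": 0,
    "rule_name": "pii-to-private-gpu",
    "endpoint_id": "private-gpu",
    "model": "llama-3.1-70b-instruct",
    "latency_ms": 234,
    "tokens_in": 512,
    "tokens_out": 128,
    "status": 200,
}

usage = tokens_by_endpoint([sample])
```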

Tenants API

GET /v1/tenants

List all configured tenants and their current configuration (model allowlist, rate limit, audit bucket).

POST /v1/tenants

Create or update a tenant configuration:

{
  "id": "enterprise-acme",
  "model_allowlist": ["llama-3.1-70b", "claude-3-5-haiku"],
  "rate_limit_rpm": 5000,
  "audit_bucket": "s3://acme-audit-logs"
}

Authentication & common headers

Header	Required	Description
`Authorization`	Yes	`Bearer sk-kmw-XXXXXXXXXXXX`
`Content-Type`	Yes (POST)	`application/json`
`X-Tenant`	No	Tenant ID for policy evaluation
`X-Data-Class`	No	Data classification label
`X-Request-Id`	No	Idempotency key; echoed in audit log

Rate limits: The API enforces per-key rate limits. Enterprise customers can configure per-tenant rate limits via the Tenants API. Contact Kamiwaza for limit increases.
