API Reference

Base URL: https://gw.kamiwazaai.org. All endpoints require an Authorization: Bearer <api-key> header.

POST /v1/route

The main routing endpoint. It is OpenAI-compatible and accepts the same request body as /v1/chat/completions. Kamiwaza makes the routing decision; the response body is passed through from the upstream endpoint unchanged.

Request body

json
{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "string"}
  ],
  "stream": false,
  "max_tokens": 1024,
  "temperature": 0.7
}
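
As a minimal sketch, the request body above can be wrapped in a POST /v1/route call using only the Python standard library. The helper name build_route_request and the placeholder API key are assumptions made for illustration; only the base URL, the Authorization header, and the body fields come from this reference. The request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "https://gw.kamiwazaai.org"  # base URL stated in this reference


def build_route_request(api_key, messages, model="auto", **options):
    """Build (without sending) a POST /v1/route request.

    build_route_request is a hypothetical helper, not part of any SDK.
    """
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/route",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_route_request(
    "sk-kmw-XXXXXXXXXXXX",  # placeholder key, replace with a real one
    [{"role": "user", "content": "Hello"}],
    max_tokens=1024,
    temperature=0.7,
)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

Leaving model at its default of "auto" keeps the routing decision with the policy; passing a specific model ID pins the target instead.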

"model": "auto" instructs Kamiwaza to apply your routing policy. You may also pass a specific model ID (e.g., "model": "claude-3-5-haiku") to pin a target, which Kamiwaza validates against your tenant's allowlist.

Routing headers

| Header | Type | Description |
| --- | --- | --- |
| X-Tenant | string | Tenant ID for per-tenant policy evaluation |
| X-Data-Class | string | Data classification label (e.g., pii-restricted, general) |
| X-Latency-Budget-Ms | integer | Override latency budget for this request (ms) |
| X-Cost-Budget-Usd | number | Max cost ceiling for this request ($) |
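
The routing headers can be assembled per request. The sketch below assumes all four headers are optional and that numeric values are sent as plain decimal strings; routing_headers is a hypothetical helper name, not an SDK function.

```python
def routing_headers(tenant=None, data_class=None,
                    latency_budget_ms=None, cost_budget_usd=None):
    """Return only the routing headers that were explicitly set."""
    candidates = {
        "X-Tenant": tenant,
        "X-Data-Class": data_class,
        "X-Latency-Budget-Ms": latency_budget_ms,
        "X-Cost-Budget-Usd": cost_budget_usd,
    }
    # HTTP header values are strings, so numbers are converted here.
    return {name: str(value)
            for name, value in candidates.items()
            if value is not None}


headers = routing_headers(
    tenant="acme-corp",
    data_class="pii-restricted",
    latency_budget_ms=500,
)
```

Omitting a header rather than sending an empty value leaves the corresponding setting to the deployed policy.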

Response

OpenAI-compatible chat completion response, with an added x-kmw-routing header reporting the endpoint used and the matched rule.

http
x-kmw-routing: endpoint=private-gpu; rule=0; latency_ms=241
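
A client that wants to log routing decisions can split this header into fields. The sketch below assumes the value always follows the semicolon-separated key=value layout of the sample above; parse_routing_header is a hypothetical helper name.

```python
def parse_routing_header(value):
    """Split 'key=value; key=value' pairs into a dict of strings."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.strip().partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key] = val
    return fields


info = parse_routing_header("endpoint=private-gpu; rule=0; latency_ms=241")
matched_rule = int(info["rule"])  # values arrive as strings
```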

Endpoints API

Manage the inference endpoints registered in your Kamiwaza deployment.

GET /v1/endpoints

List all registered endpoints and their current health status.

POST /v1/endpoints

Register a new endpoint. Request body:

json
{
  "id": "private-gpu",
  "type": "vllm",
  "url": "https://gpu.internal.acme.com/v1",
  "transport": "privatelink",
  "models": ["llama-3.1-70b-instruct"],
  "health_check_interval_ms": 5000
}

Endpoint types

| type | Description |
| --- | --- |
| anthropic | Anthropic API (claude-3-5-sonnet, haiku, opus) |
| openai | OpenAI API (gpt-4o, gpt-4o-mini, etc.) |
| bedrock | AWS Bedrock managed endpoint |
| vllm | Self-hosted vLLM server |
| ollama | Self-hosted Ollama server |
| azure_openai | Azure OpenAI Service deployment |
| custom | Any OpenAI-compatible endpoint via URL |
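
Checking the type value client-side before registering an endpoint catches typos early. The sketch below builds a POST /v1/endpoints body and rejects any type not listed above; endpoint_payload is a hypothetical helper, and treating id, type, url, and models as the core fields is an assumption drawn from the example body.

```python
import json

# Type values listed in this reference.
ENDPOINT_TYPES = {
    "anthropic", "openai", "bedrock", "vllm",
    "ollama", "azure_openai", "custom",
}


def endpoint_payload(endpoint_id, endpoint_type, url, models, **extra):
    """Return a JSON body for POST /v1/endpoints, validating the type."""
    if endpoint_type not in ENDPOINT_TYPES:
        raise ValueError(f"unknown endpoint type: {endpoint_type!r}")
    return json.dumps({
        "id": endpoint_id,
        "type": endpoint_type,
        "url": url,
        "models": list(models),
        **extra,  # e.g. transport, health_check_interval_ms
    })


body = endpoint_payload(
    "private-gpu", "vllm", "https://gpu.internal.acme.com/v1",
    ["llama-3.1-70b-instruct"],
    transport="privatelink",
    health_check_interval_ms=5000,
)
```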

Policies API

GET /v1/policies

Return the currently deployed routing policy as a JSON object.

POST /v1/policies

Deploy a new routing policy. The body must carry the policy either as a YAML string under the yaml key or as an equivalent JSON policy object.

json
{
  "yaml": "version: v1\nrules:\n  - default:\n    route_to: anthropic-haiku"
}

Policy changes take effect within 5 seconds of a successful POST. No restart required.
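
Hand-escaping newlines in the yaml string is error-prone, so it can help to let a JSON encoder do it. The sketch below wraps the same policy text shown above; the variable names are illustrative only.

```python
import json

# Policy text copied from the example above, one YAML line per list entry.
policy_yaml = "\n".join([
    "version: v1",
    "rules:",
    "  - default:",
    "    route_to: anthropic-haiku",
])

# json.dumps escapes the newlines, giving a valid POST /v1/policies body.
body = json.dumps({"yaml": policy_yaml})
```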

Audit API

GET /v1/audit

Query audit records. Supports filtering by tenant, data class, endpoint, and time range.

Query parameters

| Parameter | Type | Description |
| --- | --- | --- |
| tenant | string | Filter by tenant ID |
| data_class | string | Filter by data class label |
| endpoint | string | Filter by endpoint ID |
| from | ISO 8601 | Start of time window |
| to | ISO 8601 | End of time window |
| limit | integer | Max records to return (default 100, max 1000) |
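
A query URL for this endpoint can be built with the standard library. In the sketch below, audit_url is a hypothetical helper; clamping limit to the documented range on the client is an assumption, since this reference does not say how the server treats out-of-range values.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://gw.kamiwazaai.org"
MAX_LIMIT = 1000  # documented maximum for `limit`


def audit_url(tenant=None, data_class=None, endpoint=None,
              start=None, end=None, limit=100):
    """Build a GET /v1/audit URL; start/end are timezone-aware datetimes."""
    params = {
        "tenant": tenant,
        "data_class": data_class,
        "endpoint": endpoint,
        # `from` is a Python keyword, hence the start/end argument names.
        "from": start.isoformat() if start else None,
        "to": end.isoformat() if end else None,
        "limit": max(1, min(limit, MAX_LIMIT)),
    }
    # Drop unset filters, then percent-encode the rest.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/v1/audit?{query}"


url = audit_url(
    tenant="acme-corp",
    start=datetime(2025, 5, 28, tzinfo=timezone.utc),
    limit=5000,  # clamped to 1000 by this helper
)
```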

Audit record schema

json
{
  "ts": "2025-05-28T14:22:01Z",
  "tenant": "acme-corp",
  "data_class": "pii-restricted",
  "rule_index": 0,
  "rule_name": "pii-to-private-gpu",
  "endpoint_id": "private-gpu",
  "model": "llama-3.1-70b-instruct",
  "latency_ms": 234,
  "tokens_in": 512,
  "tokens_out": 128,
  "status": 200
}
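
For client-side analysis, a record with this schema maps naturally onto a small typed structure. The sketch below assumes every field shown above is always present; AuditRecord is a hypothetical class, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    ts: str
    tenant: str
    data_class: str
    rule_index: int
    rule_name: str
    endpoint_id: str
    model: str
    latency_ms: int
    tokens_in: int
    tokens_out: int
    status: int

    @property
    def total_tokens(self):
        """Input plus output tokens for this request."""
        return self.tokens_in + self.tokens_out


# The sample record from this reference, as a decoded JSON object.
raw = {
    "ts": "2025-05-28T14:22:01Z", "tenant": "acme-corp",
    "data_class": "pii-restricted", "rule_index": 0,
    "rule_name": "pii-to-private-gpu", "endpoint_id": "private-gpu",
    "model": "llama-3.1-70b-instruct", "latency_ms": 234,
    "tokens_in": 512, "tokens_out": 128, "status": 200,
}
record = AuditRecord(**raw)  # raises TypeError on missing or extra keys
```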

Tenants API

GET /v1/tenants

List all configured tenants and their current config (model allowlists, rate limit, audit bucket).

POST /v1/tenants

Create or update a tenant configuration:

json
{
  "id": "enterprise-acme",
  "model_allowlist": ["llama-3.1-70b", "claude-3-5-haiku"],
  "rate_limit_rpm": 5000,
  "audit_bucket": "s3://acme-audit-logs"
}

Authentication & common headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer sk-kmw-XXXXXXXXXXXX |
| Content-Type | Yes (POST) | application/json |
| X-Tenant | No | Tenant ID for policy evaluation |
| X-Data-Class | No | Data classification label |
| X-Request-Id | No | Idempotency key; echoed in audit log |

Rate limits: The API enforces per-key rate limits. Enterprise customers can configure per-tenant rate limits via the Tenants API. Contact [email protected] for limit increases.