Routing Policies
Routing policies are YAML files that declare how Kamiwaza routes requests. Rules are evaluated top-to-bottom; the first matching rule wins.
Policy structure
Every policy file has three top-level keys: version, endpoints, and rules.
```yaml
version: v1
endpoints:
  - id: string                 # unique endpoint identifier
    type: string               # anthropic | openai | bedrock | vllm | ollama | azure_openai | custom
    url: string                # required for vllm / custom types
    models: [string]           # model IDs served by this endpoint
    transport: string          # optional: privatelink | vpc-peering (for private endpoints)
    health_check_interval_ms: integer
rules:
  - match:                     # omit for catch-all default rule
      <condition>: <value>
    route_to: endpoint-id
    on_endpoint_unavailable: string   # optional: reject | fallback | next-rule
```
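The structural constraints listed above can be checked before a policy is deployed. The following is a minimal sketch, not Kamiwaza's own validator: it assumes the policy has already been parsed into a Python dict, and the names `validate_policy` and `route_targets` are hypothetical helpers introduced for illustration. It also assumes `route_to` sits at the rule level, as in the structure listing.

```python
VALID_TYPES = {"anthropic", "openai", "bedrock", "vllm", "ollama", "azure_openai", "custom"}
URL_REQUIRED = {"vllm", "custom"}          # per the comment on the url field
FAIL_MODES = {"reject", "fallback", "next-rule"}


def route_targets(route_to):
    """Return the endpoint IDs named by route_to (single ID or weighted list)."""
    if isinstance(route_to, str):
        return [route_to]
    return [target["endpoint"] for target in route_to]


def validate_policy(policy):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing top-level key: {key}"
                for key in ("version", "endpoints", "rules") if key not in policy]
    if problems:
        return problems

    known_ids = set()
    for ep in policy["endpoints"]:
        ep_id = ep.get("id")
        if ep_id in known_ids:
            problems.append(f"duplicate endpoint id: {ep_id}")
        known_ids.add(ep_id)
        if ep.get("type") not in VALID_TYPES:
            problems.append(f"{ep_id}: unknown type {ep.get('type')!r}")
        elif ep["type"] in URL_REQUIRED and not ep.get("url"):
            problems.append(f"{ep_id}: url is required for type {ep['type']}")

    for index, rule in enumerate(policy["rules"]):
        if "route_to" not in rule:
            problems.append(f"rule {index}: missing route_to")
        else:
            for target in route_targets(rule["route_to"]):
                if target not in known_ids:
                    problems.append(
                        f"rule {index}: route_to references unknown endpoint {target!r}")
        mode = rule.get("on_endpoint_unavailable")
        if mode is not None and mode not in FAIL_MODES:
            problems.append(f"rule {index}: invalid on_endpoint_unavailable {mode!r}")
    return problems
```

Running a check like this in CI catches typos in endpoint IDs before a policy reaches production.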
Match conditions
Rules without a match key are catch-all defaults. Rules with match support the following conditions (all conditions in a rule must hold — AND semantics):
| Condition | Type | Description |
|---|---|---|
| data_class | string | Matches the X-Data-Class header exactly |
| tenant | string | Matches the X-Tenant header exactly |
| tenant_in | list | Matches if tenant is any of the listed values |
| data_class_in | list | Matches if data_class is any of the listed values |
| endpoint_p95_ms | map | Endpoint latency condition, written as endpoint-id: "<N" |
| model_in | list | Matches if the requested model ID is in the list |
| cost_per_token_usd | string | Cost constraint, for example "<0.0002" (per output token) |
Multi-condition example:
```yaml
rules:
  - match:
      tenant: enterprise-acme
      data_class: pii-restricted
    route_to: private-gpu
```
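The two evaluation rules described so far (first match wins, and every condition in a rule must hold) can be sketched as follows. This is an illustration of the semantics, not Kamiwaza's implementation. The `request` dict with `tenant`, `data_class`, and `model` keys is a hypothetical stand-in for the values taken from the X-Tenant and X-Data-Class headers and the requested model ID. The metric-based conditions (`endpoint_p95_ms`, `cost_per_token_usd`) are left out because they depend on runtime data.

```python
def rule_matches(match, request):
    """True when every condition in `match` holds for the request (AND semantics)."""
    if not match:
        return True  # no match key: the rule is a catch-all default
    checks = {
        "data_class": lambda v: request.get("data_class") == v,
        "tenant": lambda v: request.get("tenant") == v,
        "data_class_in": lambda v: request.get("data_class") in v,
        "tenant_in": lambda v: request.get("tenant") in v,
        "model_in": lambda v: request.get("model") in v,
    }
    for name, value in match.items():
        if name not in checks:
            raise ValueError(f"condition not covered by this sketch: {name}")
        if not checks[name](value):
            return False
    return True


def first_match(rules, request):
    """Scan rules top to bottom and return the first one that matches, or None."""
    for rule in rules:
        if rule_matches(rule.get("match"), request):
            return rule
    return None
```

With the multi-condition rule above followed by a catch-all, a request from `enterprise-acme` that is not tagged `pii-restricted` fails the first rule and falls through to the catch-all.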
Route targets
route_to accepts a single endpoint ID, or a list for weighted distribution:
```yaml
# Single target
route_to: anthropic-haiku

# Weighted split: useful for A/B testing or gradual rollout
route_to:
  - endpoint: anthropic-haiku
    weight: 70
  - endpoint: openai-4o-mini
    weight: 30
```
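To make the weighted split concrete, here is a small sketch of how a 70/30 distribution might be sampled. It assumes weights are relative (each target is chosen in proportion to its weight), which this page does not state explicitly, and `pick_endpoint` is a hypothetical helper rather than part of Kamiwaza.

```python
import random


def pick_endpoint(route_to, rng=random):
    """Resolve a route_to value to one endpoint ID.

    A string is returned as-is; a weighted list is sampled in proportion
    to each entry's weight (assumed semantics).
    """
    if isinstance(route_to, str):
        return route_to
    endpoints = [target["endpoint"] for target in route_to]
    weights = [target["weight"] for target in route_to]
    return rng.choices(endpoints, weights=weights, k=1)[0]


split = [
    {"endpoint": "anthropic-haiku", "weight": 70},
    {"endpoint": "openai-4o-mini", "weight": 30},
]
rng = random.Random(0)  # seeded so repeated runs give the same sequence
sample = [pick_endpoint(split, rng) for _ in range(10_000)]
haiku_share = sample.count("anthropic-haiku") / len(sample)
print(f"anthropic-haiku share: {haiku_share:.2%}")
```

Over many requests the observed share converges on the configured weights. Individual requests are still routed independently, so short windows can deviate noticeably.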
Fail-safe options
Control what happens when the target endpoint is unavailable:
| Value | Behavior |
|---|---|
| reject | Return HTTP 503 immediately. Use for PII data that must never fall back to cloud. |
| fallback | Try the next rule. If no rule matches, return 503. |
| next-rule | Alias for fallback. |
Fail-closed for PII:
```yaml
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject
```
Default behavior: If on_endpoint_unavailable is omitted, Kamiwaza returns a 503 for the affected request. It does not silently fall back to an unintended endpoint.
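The fail-safe behavior described in this section can be sketched as a small resolution loop. This is an illustration under stated assumptions, not Kamiwaza's code: `resolve`, `matches`, and `is_available` are hypothetical names, only single-target `route_to` values are handled, and an omitted `on_endpoint_unavailable` is treated the same as `reject` because both result in a 503.

```python
def resolve(rules, matches, is_available):
    """Return the endpoint ID to route to, or None to signal an HTTP 503.

    matches(rule) -> bool       whether the request satisfies the rule
    is_available(endpoint_id)   whether the endpoint is currently healthy
    """
    for rule in rules:
        if not matches(rule):
            continue
        target = rule["route_to"]
        if is_available(target):
            return target
        mode = rule.get("on_endpoint_unavailable", "reject")
        if mode == "reject":
            return None  # fail closed: do not consider later rules
        # "fallback" and its alias "next-rule": keep scanning later rules
    return None  # no usable rule left: 503
```

The early return on `reject` is what keeps restricted traffic from ever reaching a later catch-all rule.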
Alerts and webhooks
The alert key fires a webhook when the associated rule is used. Useful for latency-budget monitoring:
```yaml
rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded
```
Register webhook URLs with `kmw webhooks add --event latency-budget-exceeded --url https://hooks.example.com/kmw`.
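The `endpoint_p95_ms` condition in the policy above compares an endpoint's observed p95 latency against a threshold string such as "<800". A rough sketch of that comparison follows. Only the `<` operator appears on this page, so that is the only one handled. The helper names and the `observed_p95_ms` dict are hypothetical, and treating a missing measurement as "condition does not hold" is an assumption.

```python
def parse_threshold(expr):
    """Parse a threshold string such as "<800" or "<0.0002" into a float limit."""
    expr = expr.strip()
    if not expr.startswith("<"):
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    return float(expr[1:])  # raises ValueError if the remainder is not a number


def latency_condition_holds(condition, observed_p95_ms):
    """Check an endpoint_p95_ms map against observed latencies in milliseconds.

    condition:        e.g. {"private-gpu": "<800"}
    observed_p95_ms:  e.g. {"private-gpu": 640.0}
    """
    for endpoint_id, expr in condition.items():
        observed = observed_p95_ms.get(endpoint_id)
        if observed is None or not observed < parse_threshold(expr):
            return False  # missing data is treated as a failed condition (assumption)
    return True
```

When the first rule's condition stops holding, evaluation reaches the second rule, which routes to `bedrock-burst` and fires the `latency-budget-exceeded` alert.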
Example policies
Data class routing
```yaml
version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    transport: privatelink
    models: [llama-3.1-70b-instruct]
  - id: anthropic-haiku
    type: anthropic
    models: [claude-3-5-haiku-20241022]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject
  - default:
    route_to: anthropic-haiku
```
Per-tenant routing
```yaml
rules:
  - match:
      tenant: enterprise-acme
    route_to: private-gpu
  - match:
      tenant: startup-beta
    route_to: anthropic-haiku
  - default:
    route_to: anthropic-haiku
```
Latency-budget failover
```yaml
rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded
```