Routing Policies

Routing policies are YAML files that declare how Kamiwaza routes requests. Rules are evaluated top-to-bottom; the first matching rule wins.

Figure: Kamiwaza routing policy evaluation. A request is matched against the rules top-to-bottom until the first match.

Policy structure

Every policy file has three top-level keys: version, endpoints, and rules.

version: v1

endpoints:
  - id: string              # unique endpoint identifier
    type: string            # anthropic | openai | bedrock | vllm | ollama | azure_openai | custom
    url: string             # required for vllm / custom types
    models: [string]        # model IDs served by this endpoint
    transport: string       # optional: privatelink | vpc-peering (for private endpoints)
    health_check_interval_ms: integer

rules:
  - match:                  # omit for catch-all default rule
      <condition>: <value>
    route_to: endpoint-id
    on_endpoint_unavailable: string  # optional: reject | fallback | next-rule

Match conditions

A rule without a match key is a catch-all default; the examples on this page mark such rules with a bare default: key. A rule with match supports the following conditions, and every condition listed in a rule must hold (AND semantics):

Condition	Type	Description
`data_class`	string	Matches the `X-Data-Class` header exactly
`tenant`	string	Matches the `X-Tenant` header exactly
`tenant_in`	list	Matches if tenant is any of the listed values
`data_class_in`	list	Matches if data_class is any of the listed values
`endpoint_p95_ms`	map	Matches when the named endpoint's p95 latency is below N milliseconds, written as `endpoint-id: "<N"`
`model_in`	list	Matches if the requested model ID is in the list
`cost_per_token_usd`	string	Matches when the cost per output token, in USD, satisfies the constraint, e.g. `"<0.0002"`

rules:
  - match:
      tenant: enterprise-acme
      data_class: pii-restricted
    route_to: private-gpu
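
Because the first matching rule wins, narrower rules belong above broader ones, and the list-valued conditions combine under the same AND semantics. A minimal sketch, reusing tenant and endpoint IDs from the examples on this page; the `internal` data class is a hypothetical value added for illustration:

```yaml
rules:
  # Narrow rule first: both conditions must hold (AND semantics).
  - match:
      tenant: enterprise-acme
      data_class_in: [pii-restricted, internal]   # "internal" is a hypothetical data class
    route_to: private-gpu
  # Broader rule second: reached only when the rule above did not match.
  - match:
      tenant_in: [enterprise-acme, startup-beta]
    route_to: anthropic-haiku
```

With this ordering, an enterprise-acme request carrying `X-Data-Class: pii-restricted` is routed to private-gpu, while other enterprise-acme requests fall through to the second rule. If the two rules were swapped, the broader `tenant_in` rule would match every enterprise-acme request first and the narrower rule would never be reached.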

Route targets

route_to accepts a single endpoint ID, or a list for weighted distribution:

# Single target
route_to: anthropic-haiku

# Weighted split — useful for A/B testing or gradual rollout
route_to:
  - endpoint: anthropic-haiku
    weight: 70
  - endpoint: openai-4o-mini
    weight: 30
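
A weighted list sits in the same position as a single target inside a rule. The following is a sketch of a gradual rollout for one tenant, reusing endpoint IDs from this page. The source does not state whether weights must sum to 100 or how they are normalized, so the sketch keeps them summing to 100, as in the example above:

```yaml
rules:
  - match:
      tenant: startup-beta
    route_to:
      - endpoint: anthropic-haiku   # current target keeps most of the traffic
        weight: 90
      - endpoint: openai-4o-mini    # candidate target receives the remainder
        weight: 10
  - default:
    route_to: anthropic-haiku
```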

Fail-safe options

Control what happens when the target endpoint is unavailable:

Value	Behavior
`reject`	Return HTTP 503 immediately. Use for PII data that must never fall back to cloud.
`fallback`	Try the next rule. If no rule matches, return 503.
`next-rule`	Alias for `fallback`.

rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject

Default behavior: If on_endpoint_unavailable is omitted, Kamiwaza returns a 503 for the affected request. It does not silently fall back to an unintended endpoint.
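
The three behaviors can be combined in one policy. A minimal sketch, assuming the endpoint IDs used elsewhere on this page are declared under endpoints:

```yaml
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject     # 503 immediately; later rules are not tried
  - match:
      tenant: startup-beta
    route_to: anthropic-haiku
    on_endpoint_unavailable: fallback   # try the rules below if anthropic-haiku is down
  - default:
    route_to: bedrock-burst             # option omitted: 503 if bedrock-burst is down
```

Under the semantics described above, a pii-restricted request receives a 503 when private-gpu is unavailable, even if it comes from startup-beta, because the first rule matches and rejects. Other startup-beta requests continue to the default rule when anthropic-haiku is unavailable.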

Alerts and webhooks

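The alert key fires a webhook each time the associated rule is used. This is useful for latency-budget monitoring: in the following policy, the alert fires whenever a request falls through to the default rule because the p95 latency of private-gpu is not below 800 ms.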

rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded

Example policies

Data class routing

version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    transport: privatelink
    models: [llama-3.1-70b-instruct]
  - id: anthropic-haiku
    type: anthropic
    models: [claude-3-5-haiku-20241022]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject
  - default:
    route_to: anthropic-haiku

Per-tenant routing

rules:
  - match:
      tenant: enterprise-acme
    route_to: private-gpu
  - match:
      tenant: startup-beta
    route_to: anthropic-haiku
  - default:
    route_to: anthropic-haiku

Latency-budget failover

rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded
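
The rules above reference endpoints that must be declared under endpoints. A sketch of a complete policy file for this case follows. The private-gpu declaration is copied from the data class example; the bedrock-burst model ID and the health-check interval are hypothetical placeholders, not values from the Kamiwaza documentation:

```yaml
version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    transport: privatelink
    models: [llama-3.1-70b-instruct]
    health_check_interval_ms: 5000      # hypothetical value
  - id: bedrock-burst
    type: bedrock
    models: [example-model-id]          # hypothetical placeholder, not a real model ID
rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded
```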
