Routing Policies

Routing policies are YAML files that declare how Kamiwaza routes requests. Rules are evaluated top-to-bottom; the first matching rule wins.

[Figure: Kamiwaza routing policy evaluation. A request is matched against the rules top-to-bottom until the first match.]

Policy structure

Every policy file has three top-level keys: version, endpoints, and rules.

yaml
version: v1

endpoints:
  - id: string              # unique endpoint identifier
    type: string            # anthropic | openai | bedrock | vllm | ollama | azure_openai | custom
    url: string             # required for vllm / custom types
    models: [string]        # model IDs served by this endpoint
    transport: string       # optional: privatelink | vpc-peering (for private endpoints)
    health_check_interval_ms: integer

rules:
  - match:                  # omit for catch-all default rule
      <condition>: <value>
    route_to: endpoint-id
    on_endpoint_unavailable: string  # optional: reject | fallback | next-rule
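
The structure above can be checked mechanically before a policy is deployed. The following is a minimal sketch, not part of Kamiwaza: validate_policy is a hypothetical helper that takes a policy already parsed into a Python dict and checks only what this page states (the three top-level keys, unique endpoint ids, url for vllm and custom endpoints, and route_to targets that name a declared endpoint).

```python
def validate_policy(policy):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []

    # The three top-level keys named in the policy structure.
    for key in ("version", "endpoints", "rules"):
        if key not in policy:
            problems.append(f"missing top-level key: {key}")

    endpoints = policy.get("endpoints") or []

    # Endpoint ids are documented as unique.
    seen = set()
    for ep in endpoints:
        ep_id = ep.get("id")
        if ep_id in seen:
            problems.append(f"duplicate endpoint id: {ep_id}")
        seen.add(ep_id)

    # url is documented as required for vllm and custom endpoints.
    for ep in endpoints:
        if ep.get("type") in ("vllm", "custom") and not ep.get("url"):
            problems.append(
                f"endpoint {ep.get('id')}: url is required for type {ep.get('type')}"
            )

    # Every route_to target (single id or weighted list) should be declared.
    for index, rule in enumerate(policy.get("rules") or []):
        route_to = rule.get("route_to")
        if isinstance(route_to, str):
            targets = [route_to]
        else:
            targets = [t.get("endpoint") for t in (route_to or [])]
        for target in targets:
            if target not in seen:
                problems.append(f"rule {index}: unknown endpoint {target}")
    return problems


policy = {
    "version": "v1",
    "endpoints": [
        {"id": "private-gpu", "type": "vllm",
         "url": "https://gpu.internal.acme.com/v1",
         "models": ["llama-3.1-70b-instruct"]},
        {"id": "anthropic-haiku", "type": "anthropic",
         "models": ["claude-3-5-haiku-20241022"]},
    ],
    "rules": [
        {"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu",
         "on_endpoint_unavailable": "reject"},
        {"route_to": "anthropic-haiku"},  # no match key: catch-all default
    ],
}
print(validate_policy(policy))  # prints []
```

A check like this could run in CI so that a typo in an endpoint id is caught before the policy reaches the router.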

Match conditions

A rule without a match key is a catch-all default. A rule with match supports the following conditions; all conditions in a rule must hold (AND semantics):

| Condition | Type | Description |
| --- | --- | --- |
| data_class | string | Matches the X-Data-Class header exactly |
| tenant | string | Matches the X-Tenant header exactly |
| tenant_in | list | Matches if tenant is any of the listed values |
| data_class_in | list | Matches if data_class is any of the listed values |
| endpoint_p95_ms | map | Endpoint latency condition: endpoint-id: "<N" |
| model_in | list | Matches if the requested model ID is in the list |
| cost_per_token_usd | string | Cost constraint: "<0.0002" (per output token) |

Multi-condition example:

yaml
rules:
  - match:
      tenant: enterprise-acme
      data_class: pii-restricted
    route_to: private-gpu
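
The AND semantics and the first-match-wins order can be sketched in a few lines. This is an illustration of the documented behavior, not Kamiwaza's implementation: rule_matches and first_match are hypothetical names, and the request is assumed to be a dict whose tenant, data_class, and model values were already read from the X-Tenant header, the X-Data-Class header, and the requested model ID. The endpoint_p95_ms and cost_per_token_usd conditions are left out of this sketch.

```python
def rule_matches(match, request):
    """True if every condition in the match block holds (AND semantics)."""
    if not match:
        return True  # no match key: catch-all default
    for cond, expected in match.items():
        if cond in ("tenant", "data_class"):
            ok = request.get(cond) == expected          # exact match
        elif cond in ("tenant_in", "data_class_in", "model_in"):
            ok = request.get(cond[:-3]) in expected     # strip the "_in" suffix
        else:
            raise ValueError(f"condition not handled in this sketch: {cond}")
        if not ok:
            return False
    return True


def first_match(rules, request):
    """Evaluate rules top-to-bottom and return the first one that matches."""
    for rule in rules:
        if rule_matches(rule.get("match"), request):
            return rule
    return None


rules = [
    {"match": {"tenant": "enterprise-acme", "data_class": "pii-restricted"},
     "route_to": "private-gpu"},
    {"route_to": "anthropic-haiku"},  # catch-all default
]

pii = {"tenant": "enterprise-acme", "data_class": "pii-restricted"}
public = {"tenant": "enterprise-acme", "data_class": "public"}
print(first_match(rules, pii)["route_to"])     # prints private-gpu
print(first_match(rules, public)["route_to"])  # prints anthropic-haiku
```

The second request shares the tenant but not the data class, so the first rule does not match and evaluation falls through to the catch-all.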

Route targets

route_to accepts a single endpoint ID, or a list for weighted distribution:

yaml
# Single target
route_to: anthropic-haiku

# Weighted split — useful for A/B testing or gradual rollout
route_to:
  - endpoint: anthropic-haiku
    weight: 70
  - endpoint: openai-4o-mini
    weight: 30
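
To make the weighted form concrete, here is a minimal sketch of one way such a split could be resolved. It rests on two assumptions this page does not confirm: that weights are relative proportions, and that a target is drawn at random for each request. pick_target is a hypothetical helper name.

```python
import random


def pick_target(route_to, rng=random):
    """Resolve route_to to one endpoint id (single id or weighted list)."""
    if isinstance(route_to, str):
        return route_to
    endpoints = [t["endpoint"] for t in route_to]
    weights = [t["weight"] for t in route_to]
    # random.choices draws one item with probability proportional to its weight.
    return rng.choices(endpoints, weights=weights, k=1)[0]


split = [
    {"endpoint": "anthropic-haiku", "weight": 70},
    {"endpoint": "openai-4o-mini", "weight": 30},
]

rng = random.Random(0)  # seeded only to make the demonstration repeatable
picks = [pick_target(split, rng) for _ in range(10_000)]
share = picks.count("anthropic-haiku") / len(picks)
print(f"anthropic-haiku share: {share:.2f}")
```

Over many requests the observed share should approach 70/30; for a small number of requests it can deviate noticeably, which matters when reading early A/B results.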

Fail-safe options

Control what happens when the target endpoint is unavailable:

| Value | Behavior |
| --- | --- |
| reject | Return HTTP 503 immediately. Use for PII data that must never fall back to cloud. |
| fallback | Try the next rule. If no rule matches, return 503. |
| next-rule | Alias for fallback. |

Fail-closed example for PII:

yaml
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject

Default behavior: if on_endpoint_unavailable is omitted, Kamiwaza returns a 503 for the affected request. It does not silently fall back to an unintended endpoint.
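
The three values and the default can be summarized as a small decision loop. This is a sketch of the documented behavior under simplifying assumptions, not Kamiwaza's code: route and matches are hypothetical names, route_to is a single endpoint id, matching is exact equality only, and a return value of None stands for an HTTP 503 response.

```python
def matches(rule, request):
    """Exact-match conditions only; a rule without match is a catch-all."""
    return all(request.get(k) == v for k, v in rule.get("match", {}).items())


def route(rules, request, is_available):
    """Return the chosen endpoint id, or None where a 503 would be returned."""
    for rule in rules:
        if not matches(rule, request):
            continue
        target = rule["route_to"]
        if is_available(target):
            return target
        # Omitting the key behaves like reject: the request gets a 503.
        action = rule.get("on_endpoint_unavailable", "reject")
        if action in ("fallback", "next-rule"):
            continue  # try the next rule
        return None   # reject: fail closed
    return None       # no rule matched


down = {"private-gpu"}            # pretend the private endpoint is unavailable
is_available = lambda endpoint: endpoint not in down

fail_closed = [
    {"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu",
     "on_endpoint_unavailable": "reject"},
    {"route_to": "anthropic-haiku"},
]
print(route(fail_closed, {"data_class": "pii-restricted"}, is_available))  # prints None
print(route(fail_closed, {"data_class": "public"}, is_available))  # prints anthropic-haiku
```

With reject, the PII request never reaches the cloud endpoint even though a catch-all rule follows; changing the value to fallback would let it continue to that rule.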

Alerts and webhooks

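The alert key fires a webhook whenever the rule it is attached to is used. This is useful for latency-budget monitoring: in the policy below, the default rule is reached only when the p95 latency of private-gpu is not under 800 ms, so every request it handles raises latency-budget-exceeded.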

yaml
rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded

Register webhook URLs with kmw webhooks add --event latency-budget-exceeded --url https://hooks.example.com/kmw.

Example policies

Data class routing

version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    transport: privatelink
    models: [llama-3.1-70b-instruct]
  - id: anthropic-haiku
    type: anthropic
    models: [claude-3-5-haiku-20241022]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
    on_endpoint_unavailable: reject
  - default:
    route_to: anthropic-haiku

Per-tenant routing

rules:
  - match:
      tenant: enterprise-acme
    route_to: private-gpu
  - match:
      tenant: startup-beta
    route_to: anthropic-haiku
  - default:
    route_to: anthropic-haiku

Latency-budget failover

rules:
  - match:
      endpoint_p95_ms:
        private-gpu: "<800"
    route_to: private-gpu
  - default:
    route_to: bedrock-burst
    alert: latency-budget-exceeded
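
The threshold strings used by endpoint_p95_ms and cost_per_token_usd can be read as simple comparisons. The sketch below assumes that only the "<N" form shown on this page is valid and that an endpoint with no latency measurement fails the condition; both are assumptions, and below_threshold and latency_condition_holds are hypothetical helper names.

```python
def below_threshold(expr, observed):
    """Evaluate a "<N" expression against an observed value."""
    expr = expr.strip()
    if not expr.startswith("<") or expr.startswith("<="):
        raise ValueError(f"only the '<N' form is handled in this sketch: {expr!r}")
    return observed < float(expr[1:])


def latency_condition_holds(condition, p95_by_endpoint):
    """True if every listed endpoint's p95 (ms) is under its threshold."""
    return all(
        # Assumption: a missing measurement counts as over budget.
        below_threshold(expr, p95_by_endpoint.get(endpoint, float("inf")))
        for endpoint, expr in condition.items()
    )


condition = {"private-gpu": "<800"}
print(latency_condition_holds(condition, {"private-gpu": 640.0}))  # prints True
print(latency_condition_holds(condition, {"private-gpu": 950.0}))  # prints False
print(below_threshold("<0.0002", 0.00015))                         # prints True
```

In the failover policy above, the second case is the one where the first rule stops matching and requests reach the default rule that carries the alert.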