Route any prompt.
Any model.
Any team.
Kamiwaza is the model gateway for enterprise platform teams: policy-driven routing across private GPUs, Anthropic, Bedrock, and fine-tuned vLLM endpoints — by tenant, by data class, by latency budget.
Kamiwaza is not a model provider. We route your prompts to OpenAI, Anthropic, Bedrock, or your private vLLM endpoint using your existing keys — your data, your contracts.
Your models are scattered. Your policies aren't.
Private Llama fine-tune on-prem, OpenAI for general tasks, Bedrock for regulated data — three APIs, three auth flows, three billing lines.
No tenant-level routing: all users hit the same model regardless of data sensitivity.
Latency SLAs break when load spikes shift traffic between providers.
No audit trail: which prompt went to which model, when, under what policy.
One gateway. All your models.
Connect your endpoints
Register private GPU nodes (VPC-peered or PrivateLink), managed APIs (Anthropic, OpenAI, Bedrock, Vertex AI), and local fine-tunes in one YAML config.
Define routing policies
Write policies in YAML: route PII-flagged prompts to on-prem only, route latency-sensitive requests to managed endpoints, apply per-tenant model allowlists.
Send one request
Your app calls a single OpenAI-compatible API. Kamiwaza evaluates policy, selects the endpoint, and streams the response, with a full audit log.
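From the client side, the single-request step can be sketched as follows. This is a minimal illustration that assumes the URL and headers from the curl example on this page (`/v1/route`, `X-Tenant`, `X-Data-Class`); the helper name `build_route_request` is hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumed from the curl example on this page; adjust to your deployment.
GATEWAY_URL = "https://api.kamiwazaai.org/v1/route"

def build_route_request(api_key, tenant, data_class, prompt):
    """Build (but do not send) a chat request for the gateway."""
    body = {
        "model": "auto",  # leave endpoint selection to the routing policy
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Tenant": tenant,
            "X-Data-Class": data_class,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_route_request("test-key", "acme-corp", "pii-restricted", "Summarize Q3 report")
```

Passing `req` to `urllib.request.urlopen` would send it; that call is left out here so the sketch runs without network access or a real key.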
Built for enterprise platform teams
Policy-driven routing
YAML routing rules evaluated per-request: tenant ID, data class, user role, latency budget, cost cap.
Data class enforcement
Tag prompts as PII, HIPAA, or public. The routing engine guarantees that PII-tagged prompts never leave your VPC.
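One way to double-check this in CI is to lint the policy before it ships. The sketch below is an assumption about how such a check could look, not Kamiwaza's enforcement logic: it treats a hypothetical set of endpoint IDs as private and flags any rule that sends a restricted data class elsewhere. The rule shape mirrors the routing.yaml example on this page.

```python
# Hypothetical inputs: which endpoint IDs live inside your VPC and which
# data classes must stay there. Neither set is defined by Kamiwaza.
PRIVATE_ENDPOINTS = {"private-gpu"}
RESTRICTED_CLASSES = {"pii-restricted"}

def find_violations(rules):
    """Return rules that route a restricted data class off private endpoints."""
    violations = []
    for rule in rules:
        data_class = rule.get("match", {}).get("data_class")
        if data_class in RESTRICTED_CLASSES and rule.get("route_to") not in PRIVATE_ENDPOINTS:
            violations.append(rule)
    return violations

good = [{"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu"}]
bad = [{"match": {"data_class": "pii-restricted"}, "route_to": "bedrock"}]
```

An empty result from `find_violations` means no rule in the list sends restricted data to a non-private endpoint.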
Tenant isolation
Each tenant gets its own model allowlist, audit bucket, and rate-limit policy — in the same gateway.
Cost and latency observability
Per-request cost attribution, p99 latency by endpoint, cost-per-token comparison across providers, and model usage breakdown by tenant.
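As an illustration of these metrics, the sketch below aggregates a few request records into cost per tenant and p99 latency per endpoint. The record fields (`tenant`, `endpoint`, `latency_ms`, `cost_usd`) and the sample values are invented for the example. The p99 uses the nearest-rank method, which is one of several common percentile definitions.

```python
from collections import defaultdict

def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = -(-99 * len(ordered) // 100)  # ceil(0.99 * n) using integer math
    return ordered[rank - 1]

def summarize(records):
    """Return (cost per tenant, p99 latency per endpoint)."""
    cost_by_tenant = defaultdict(float)
    latencies = defaultdict(list)
    for r in records:
        cost_by_tenant[r["tenant"]] += r["cost_usd"]
        latencies[r["endpoint"]].append(r["latency_ms"])
    p99_by_endpoint = {ep: p99(vals) for ep, vals in latencies.items()}
    return dict(cost_by_tenant), p99_by_endpoint

# Invented sample data, for illustration only.
records = [
    {"tenant": "acme-corp", "endpoint": "private-gpu", "latency_ms": 312, "cost_usd": 0.25},
    {"tenant": "acme-corp", "endpoint": "anthropic", "latency_ms": 180, "cost_usd": 0.5},
    {"tenant": "globex", "endpoint": "private-gpu", "latency_ms": 450, "cost_usd": 0.125},
]
costs, p99s = summarize(records)
```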
OpenAI-compatible API
Drop-in replacement for existing OpenAI integrations. No SDK changes required — update one base URL.
Air-gap + PrivateLink ready
Deploy the gateway in your own VPC. Traffic to private GPU nodes never traverses the public internet.
Declare your routing policy in YAML
Version-controlled, diff-able, reviewable. No UI wizards — just config files that your infra team already knows how to manage.
# routing.yaml — Kamiwaza routing policy
version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    models: [llama-3.1-70b]
  - id: anthropic
    type: anthropic
    models: [claude-3-5-haiku]
  - id: bedrock
    type: bedrock
    models: [meta.llama3-instruct]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
  - match:
      latency_budget_ms: <500
    route_to: anthropic
  - default:
      route_to: bedrock
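To make the evaluation order concrete, here is a minimal sketch of first-match evaluation over the rules in the policy above. The page does not specify these semantics, so treat the following as assumptions: rules are checked top to bottom, `<500` means the request's latency budget is under 500 ms, `route_to` is nested under `default`, and the `default` rule applies when nothing else matches. The function names are hypothetical.

```python
def matches(condition, request):
    """Check one rule's match block against request attributes."""
    for key, expected in condition.items():
        actual = request.get(key)
        if actual is None:
            return False
        if isinstance(expected, str) and expected.startswith("<"):
            # Assumed meaning: numeric upper bound, e.g. "<500".
            if not actual < float(expected[1:]):
                return False
        elif actual != expected:
            return False
    return True

def select_endpoint(rules, request):
    """Return route_to of the first matching rule, else the default."""
    default = None
    for rule in rules:
        if "default" in rule:
            default = rule["default"]["route_to"]
        elif matches(rule["match"], request):
            return rule["route_to"]
    return default

# The rules from the policy above, as parsed Python data.
RULES = [
    {"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu"},
    {"match": {"latency_budget_ms": "<500"}, "route_to": "anthropic"},
    {"default": {"route_to": "bedrock"}},
]
```

Under these assumptions, a PII-restricted request goes to `private-gpu` even when it also carries a tight latency budget, because the data-class rule is listed first.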
# Single OpenAI-compatible request — Kamiwaza handles the routing
$ curl -X POST https://api.kamiwazaai.org/v1/route \
    -H "Authorization: Bearer $KAMIWAZA_KEY" \
    -H "X-Tenant: acme-corp" \
    -H "X-Data-Class: pii-restricted" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "auto",
      "messages": [{"role": "user", "content": "Summarize Q3 report"}]
    }'
# Response includes routing metadata
{
  "routed_to": "private-gpu",
  "policy_matched": "pii-restricted",
  "latency_ms": 312,
  "audit_id": "req_8xKp2mNvLq"
}
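On the client, the routing metadata can be pulled into a typed record for logging or for reconciling against the audit trail. The sketch below assumes the four fields appear at the top level of the JSON body exactly as shown above; the page shows only this metadata, so the full response shape is not known. `RoutingMetadata` and `parse_routing_metadata` are hypothetical names.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class RoutingMetadata:
    routed_to: str
    policy_matched: str
    latency_ms: int
    audit_id: str

def parse_routing_metadata(raw):
    """Extract the routing fields from a JSON response body."""
    data = json.loads(raw)
    return RoutingMetadata(
        routed_to=data["routed_to"],
        policy_matched=data["policy_matched"],
        latency_ms=int(data["latency_ms"]),
        audit_id=data["audit_id"],
    )

sample = (
    '{"routed_to": "private-gpu", "policy_matched": "pii-restricted", '
    '"latency_ms": 312, "audit_id": "req_8xKp2mNvLq"}'
)
meta = parse_routing_metadata(sample)
```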
Common deployment patterns
Private LLM deployment
Route regulated data to Llama on-prem. Keep sensitive workloads air-gapped.
Multi-tenant SaaS
Give each customer their own model policy without running a separate gateway.
Federated inference
Blend private GPU capacity with Bedrock burst — automatic failover by latency budget.
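The failover idea can be sketched as a preference list checked against a latency budget. This is an illustrative assumption, not Kamiwaza's algorithm: the function name, the `observed_p99_ms` input, and the rule of falling back to the last endpoint when nothing fits are all hypothetical.

```python
def pick_endpoint(observed_p99_ms, latency_budget_ms,
                  preference=("private-gpu", "bedrock")):
    """Return the first preferred endpoint whose observed p99 fits the budget."""
    for endpoint in preference:
        latency = observed_p99_ms.get(endpoint)
        if latency is not None and latency <= latency_budget_ms:
            return endpoint
    # Nothing fits the budget: use the last (burst) endpoint rather than fail.
    return preference[-1]
```

For example, with an observed p99 of 900 ms on `private-gpu` and 400 ms on `bedrock`, a 500 ms budget selects `bedrock`; once the private GPUs recover to under 500 ms, the same call returns `private-gpu` again.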
Designed with SOC 2 Type II controls in mind. Data is never written to disk unencrypted. TLS 1.3 between all hops.