Enterprise AI Orchestration

Route any prompt.
Any model.
Any team.

Kamiwaza is the model gateway for enterprise platform teams: policy-driven routing across private GPUs, Anthropic, Bedrock, and fine-tuned vLLM endpoints — by tenant, by data class, by latency budget.

Kamiwaza is not a model provider. We route your prompts to OpenAI, Anthropic, Bedrock, or your private vLLM endpoint using your existing keys — your data, your contracts.

Get Early Access | Read the Docs

terminal

$ curl -X POST https://api.kamiwazaai.org/v1/route \
  -H 'X-Tenant: acme-corp' \
  -H 'X-Data-Class: pii-restricted' \
  -d '{"model":"auto","messages":[{"role":"user","content":"Summarize Q3"}]}'

In use by platform teams at:

Vortex Analytics, Cascadedata, Nimblex, Stelliform Labs

The problem

Your models are scattered. Your policies aren't.

Private Llama fine-tune on-prem, OpenAI for general tasks, Bedrock for regulated data — three APIs, three auth flows, three billing lines.

No tenant-level routing: all users hit the same model regardless of data sensitivity.

Latency SLAs break when load spikes shift traffic between providers.

No audit trail: which prompt went to which model, when, under what policy.

How it works

One gateway. All your models.

Connect your endpoints

Register private GPU nodes (VPC-peered or PrivateLink), managed APIs (Anthropic, OpenAI, Bedrock, Vertex AI), and local fine-tunes in one YAML config.

Define routing policies

Write policies in YAML: route PII-flagged prompts to on-prem only, route latency-sensitive requests to managed endpoints, apply per-tenant model allowlists.

Send one request

Your app calls a single OpenAI-compatible API. Kamiwaza evaluates policy, selects the endpoint, streams the response — with full audit log.

Platform capabilities

Built for enterprise platform teams

Policy-driven routing

YAML routing rules evaluated per-request: tenant ID, data class, user role, latency budget, cost cap.

Data class enforcement

Tag prompts as PII, HIPAA, or public. The routing engine guarantees that PII-tagged prompts never leave your VPC.
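
To make the idea concrete, here is a minimal Python sketch of a data-class guard. It is an illustration under stated assumptions, not Kamiwaza's implementation: the `Endpoint` record, the `in_vpc` flag, `RESTRICTED_CLASSES`, and `allowed_endpoints` are all hypothetical names, and which data classes count as restricted is assumed.

```python
from dataclasses import dataclass

# Hypothetical: data classes that must stay inside the VPC.
RESTRICTED_CLASSES = {"pii-restricted"}


@dataclass(frozen=True)
class Endpoint:
    id: str
    in_vpc: bool  # assumption: each endpoint is flagged as private or public


def allowed_endpoints(data_class: str, endpoints: list[Endpoint]) -> list[Endpoint]:
    """Return the endpoints a prompt with this data class may be sent to."""
    if data_class in RESTRICTED_CLASSES:
        # Restricted prompts may only reach endpoints inside the VPC.
        return [e for e in endpoints if e.in_vpc]
    return list(endpoints)


# Endpoint ids borrowed from the routing.yaml example on this page.
ENDPOINTS = [
    Endpoint("private-gpu", in_vpc=True),
    Endpoint("anthropic", in_vpc=False),
    Endpoint("bedrock", in_vpc=False),
]
```

Filtering the candidate set before any latency or cost rule runs means a later rule cannot accidentally pick a public endpoint for a restricted prompt.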

Tenant isolation

Each tenant gets its own model allowlist, audit bucket, and rate-limit policy — in the same gateway.
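
A rough Python sketch of what a per-tenant check could look like. The `TenantPolicy` fields, the `TENANTS` table, the bucket name, and the rate limit value are hypothetical and chosen only for illustration; the real gateway's data model may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPolicy:
    allowed_models: frozenset[str]   # per-tenant model allowlist
    audit_bucket: str                # where this tenant's audit records go
    max_requests_per_minute: int     # per-tenant rate limit


# Hypothetical tenant table; values are illustrative only.
TENANTS = {
    "acme-corp": TenantPolicy(
        allowed_models=frozenset({"llama-3.1-70b", "claude-3-5-haiku"}),
        audit_bucket="audit-acme-corp",
        max_requests_per_minute=600,
    ),
}


def check_request(tenant_id: str, model: str, requests_this_minute: int) -> TenantPolicy:
    """Validate a request against its tenant's policy and return that policy."""
    policy = TENANTS.get(tenant_id)
    if policy is None:
        raise PermissionError(f"unknown tenant: {tenant_id}")
    if model not in policy.allowed_models:
        raise PermissionError(f"model {model!r} is not allowed for {tenant_id}")
    if requests_this_minute >= policy.max_requests_per_minute:
        raise RuntimeError(f"rate limit exceeded for {tenant_id}")
    return policy
```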

Cost and latency observability

Per-request cost attribution, p99 latency by endpoint, cost-per-token comparison across providers, and model usage breakdown by tenant.
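
As an illustration of these metrics, the Python sketch below computes p99 latency per endpoint and cost per tenant from request records. The record layout, the sample values, and the per-token prices are made up for the example; they are not real measurements or provider prices.

```python
import math
from collections import defaultdict

# Made-up sample records; field names are assumptions, not the gateway's log schema.
RECORDS = [
    {"tenant": "acme-corp", "endpoint": "private-gpu", "latency_ms": 312,
     "tokens": 1000, "usd_per_1k_tokens": 0.5},
    {"tenant": "acme-corp", "endpoint": "anthropic", "latency_ms": 180,
     "tokens": 2000, "usd_per_1k_tokens": 0.25},
    {"tenant": "acme-corp", "endpoint": "anthropic", "latency_ms": 420,
     "tokens": 4000, "usd_per_1k_tokens": 0.25},
]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p99_by_endpoint(records):
    """Group latencies by endpoint and take the 99th percentile of each group."""
    latencies = defaultdict(list)
    for r in records:
        latencies[r["endpoint"]].append(r["latency_ms"])
    return {endpoint: percentile(v, 99) for endpoint, v in latencies.items()}


def cost_by_tenant(records):
    """Attribute each request's cost to its tenant and sum per tenant."""
    totals = defaultdict(float)
    for r in records:
        totals[r["tenant"]] += r["tokens"] / 1000 * r["usd_per_1k_tokens"]
    return dict(totals)
```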

OpenAI-compatible API

Drop-in replacement for existing OpenAI integrations. No SDK changes required — update one base URL.

Air-gap + PrivateLink ready

Deploy the gateway in your own VPC. Traffic to private GPU nodes never traverses public internet.

Routing config as code

Declare your routing policy in YAML

Version-controlled, diff-able, reviewable. No UI wizards — just config files that your infra team already knows how to manage.

# routing.yaml — Kamiwaza routing policy
version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    models: [llama-3.1-70b]
  - id: anthropic
    type: anthropic
    models: [claude-3-5-haiku]
  - id: bedrock
    type: bedrock
    models: [meta.llama3-instruct]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
  - match:
      latency_budget_ms: <500
    route_to: anthropic
  - default:
    route_to: bedrock
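
One way to read the rules above is first-match-wins, top to bottom. The Python sketch below assumes that semantics, which this page does not confirm. The rules are written as Python data rather than parsed from the file, and `condition_holds` and `select_endpoint` are hypothetical helpers.

```python
# Rules mirror routing.yaml above, written as Python data to stay dependency-free.
RULES = [
    {"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu"},
    {"match": {"latency_budget_ms": "<500"}, "route_to": "anthropic"},
    {"default": None, "route_to": "bedrock"},
]


def condition_holds(expected, actual):
    """Compare one request attribute against one rule condition."""
    if actual is None:
        return False  # the request does not carry this attribute
    if isinstance(expected, str) and expected.startswith("<"):
        # Assumed meaning of "<500": the numeric attribute must be below 500.
        return float(actual) < float(expected[1:])
    return actual == expected


def select_endpoint(request, rules=RULES):
    """Return route_to of the first rule whose conditions all hold (assumed semantics)."""
    for rule in rules:
        if "default" in rule:
            return rule["route_to"]
        if all(condition_holds(v, request.get(k)) for k, v in rule["match"].items()):
            return rule["route_to"]
    raise LookupError("no rule matched and no default rule is defined")
```

Under this reading, a request tagged `pii-restricted` goes to `private-gpu` even when it also carries a tight latency budget, because the data-class rule is listed first.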

# Single OpenAI-compatible request — Kamiwaza handles the routing
$ curl -X POST https://api.kamiwazaai.org/v1/route \
  -H "Authorization: Bearer $KAMIWAZA_KEY" \
  -H "X-Tenant: acme-corp" \
  -H "X-Data-Class: pii-restricted" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize Q3 report"}]
  }'
# Response includes routing metadata
{
  "routed_to": "private-gpu",
  "policy_matched": "pii-restricted",
  "latency_ms": 312,
  "audit_id": "req_8xKp2mNvLq"
}
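
The same request can be sketched in Python with only the standard library. The URL, headers, and body come from the curl example above; the helper names `build_request`, `routing_metadata`, and `send` are hypothetical, and the sketch assumes the routing fields appear at the top level of the JSON response as in the sample.

```python
import json
import urllib.request

GATEWAY_URL = "https://api.kamiwazaai.org/v1/route"  # endpoint from the curl example


def build_request(prompt, tenant, data_class, api_key):
    """Build the POST request shown in the curl example, without sending it."""
    body = {"model": "auto", "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Tenant": tenant,
            "X-Data-Class": data_class,
            "Content-Type": "application/json",
        },
    )


def routing_metadata(response_body):
    """Pick out the routing fields shown in the sample response."""
    payload = json.loads(response_body)
    keys = ("routed_to", "policy_matched", "latency_ms", "audit_id")
    return {key: payload.get(key) for key in keys}


def send(request, timeout=30):
    """Send the request and return its routing metadata (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return routing_metadata(response.read())
```

Calling `send(build_request("Summarize Q3 report", "acme-corp", "pii-restricted", api_key))` would perform the request; logging the returned `audit_id` alongside your own request ID is one way to link application logs to the gateway's audit trail.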

Deployment patterns

< 4ms median routing overhead

99.95% gateway uptime (simulated SLA)

endpoint types supported

100% prompts with full audit trail

Designed with SOC 2 Type II controls in mind. Data is never written to disk unencrypted. TLS 1.3 is used between all hops.

Start routing in under 10 minutes.

Get Early Access | View Quickstart

Common deployment patterns

Private LLM deployment

Multi-tenant SaaS

Federated inference