Enterprise AI Orchestration

Route any prompt.
Any model.
Any team.

Kamiwaza is the model gateway for enterprise platform teams: policy-driven routing across private GPUs, Anthropic, Bedrock, and fine-tuned vLLM endpoints — by tenant, by data class, by latency budget.

Kamiwaza is not a model provider. We route your prompts to OpenAI, Anthropic, Bedrock, or your private vLLM endpoint using your existing keys — your data, your contracts.

[Diagram: Your App sends a prompt to the Kamiwaza gateway (Policy Engine), which routes it to Private GPU (Llama 3.1), Anthropic (Claude), or AWS Bedrock.]
terminal
$ curl -X POST https://api.kamiwazaai.org/v1/route \
  -H 'X-Tenant: acme-corp' \
  -H 'X-Data-Class: pii-restricted' \
  -d '{"model":"auto","messages":[{"role":"user","content":"Summarize Q3"}]}'
In use by platform teams at:
Vortex Analytics Cascadedata Nimblex Stelliform Labs
The problem

Your models are scattered. Your policies aren't.

Private Llama fine-tune on-prem, OpenAI for general tasks, Bedrock for regulated data — three APIs, three auth flows, three billing lines.

No tenant-level routing: all users hit the same model regardless of data sensitivity.

Latency SLAs break when load spikes shift traffic between providers.

No audit trail: which prompt went to which model, when, under what policy.

[Diagram: the App Layer connects separately to the OpenAI API (auth-key-1), Bedrock (auth-key-2), and a self-hosted Private GPU.]
How it works

One gateway. All your models.

01

Connect your endpoints

Register private GPU nodes (VPC-peered or PrivateLink), managed APIs (Anthropic, OpenAI, Bedrock, Vertex AI), and local fine-tunes in one YAML config.

02

Define routing policies

Write policies in YAML: route PII-flagged prompts to on-prem only, route latency-sensitive requests to managed endpoints, apply per-tenant model allowlists.

03

Send one request

Your app calls a single OpenAI-compatible API. Kamiwaza evaluates policy, selects the endpoint, streams the response — with full audit log.
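As a rough illustration of that single request, the sketch below builds (but does not send) the call from Python's standard library. The URL, headers, and body mirror the curl example on this page; `build_route_request` is a hypothetical helper name, not part of any Kamiwaza SDK.

```python
import json
import urllib.request

# URL and header names are copied from the curl example on this page.
GATEWAY_URL = "https://api.kamiwazaai.org/v1/route"


def build_route_request(api_key, tenant, data_class, prompt):
    """Build (but do not send) a routing request. Hypothetical helper."""
    body = {
        "model": "auto",  # leave endpoint selection to the gateway policy
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Tenant": tenant,
            "X-Data-Class": data_class,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_route_request(
    "test-key", "acme-corp", "pii-restricted", "Summarize Q3 report"
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req).
```

Keeping construction separate from sending makes the tenant and data-class headers easy to check in unit tests before any traffic reaches the gateway.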

Platform capabilities

Built for enterprise platform teams

Policy-driven routing

YAML routing rules evaluated per-request: tenant ID, data class, user role, latency budget, cost cap.
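To make per-request evaluation concrete, here is a minimal sketch of how rules like those in the routing.yaml on this page could be applied. It assumes first-match-wins ordering and reads a value such as `<500` as "strictly less than 500"; both are assumptions for illustration, and `select_endpoint` and `condition_holds` are hypothetical names rather than the gateway's actual engine.

```python
# Rules mirror the routing.yaml sample on this page, already parsed to dicts.
RULES = [
    {"match": {"data_class": "pii-restricted"}, "route_to": "private-gpu"},
    {"match": {"latency_budget_ms": "<500"}, "route_to": "anthropic"},
    {"default": None, "route_to": "bedrock"},
]


def condition_holds(expected, actual):
    """Check one request attribute against one rule condition."""
    if actual is None:
        return False  # attribute missing from the request
    if isinstance(expected, str) and expected.startswith("<"):
        # Assumed semantics: "<500" means the value must be below 500.
        return float(actual) < float(expected[1:])
    return actual == expected


def select_endpoint(request_attrs, rules=RULES):
    """Return route_to of the first rule whose conditions all hold."""
    for rule in rules:
        if "default" in rule:
            return rule["route_to"]
        conditions = rule["match"].items()
        if all(condition_holds(v, request_attrs.get(k)) for k, v in conditions):
            return rule["route_to"]
    raise LookupError("no rule matched and no default rule is defined")
```

Under these assumptions, a request tagged `pii-restricted` goes to `private-gpu` even when it also carries a tight latency budget, because the PII rule is listed first.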

Data class enforcement

Tag prompts as PII, HIPAA, or public. The routing engine guarantees that PII-tagged prompts never leave your VPC.
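One way to picture this enforcement is as a filter applied before any routing rule runs. The sketch below is illustrative only: the `in_vpc` flag, the `VPC_ONLY_CLASSES` set, and `allowed_endpoints` are hypothetical names, and the endpoint IDs are borrowed from the sample config on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    id: str
    in_vpc: bool  # hypothetical flag: True if the endpoint runs in your VPC


ENDPOINTS = [
    Endpoint("private-gpu", in_vpc=True),
    Endpoint("anthropic", in_vpc=False),
    Endpoint("bedrock", in_vpc=False),
]

# Data classes that must stay inside the VPC (assumed for this sketch).
VPC_ONLY_CLASSES = {"PII"}


def allowed_endpoints(data_class, endpoints=ENDPOINTS):
    """Return the endpoint IDs a prompt with this data class may reach."""
    if data_class in VPC_ONLY_CLASSES:
        return [e.id for e in endpoints if e.in_vpc]
    return [e.id for e in endpoints]
```

Filtering candidates first means a misordered routing rule cannot send a PII-tagged prompt outside the VPC, since external endpoints are never in the candidate set.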

Tenant isolation

Each tenant gets its own model allowlist, audit bucket, and rate-limit policy — in the same gateway.
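A per-tenant policy of this kind might be modeled roughly as follows. This is a sketch under stated assumptions: the `TenantPolicy` class, its field names, and the fixed 60-second rate-limit window are all hypothetical and chosen for brevity, not taken from the product.

```python
from dataclasses import dataclass


@dataclass
class TenantPolicy:
    allowed_models: set
    max_requests_per_minute: int
    audit_bucket: str  # where this tenant's audit records would be written
    _window_start: float = 0.0
    _count: int = 0

    def admit(self, model, now):
        """Return True if the tenant may call `model` at time `now` (seconds)."""
        if model not in self.allowed_models:
            return False  # model is not on this tenant's allowlist
        if now - self._window_start >= 60:
            # Start a new fixed window and reset the counter.
            self._window_start, self._count = now, 0
        if self._count >= self.max_requests_per_minute:
            return False  # rate limit reached for the current window
        self._count += 1
        return True


acme = TenantPolicy({"llama-3.1-70b"}, max_requests_per_minute=2,
                    audit_bucket="audit-acme-corp")
```

Because each tenant holds its own counter and allowlist, one tenant exhausting its limit has no effect on another tenant's requests through the same gateway.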

Cost and latency observability

Per-request cost attribution, p99 latency by endpoint, cost-per-token comparison across providers, and model usage breakdown by tenant.
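For a sense of how these metrics could be derived from per-request records, the sketch below aggregates cost by tenant and a nearest-rank p99 latency by endpoint. The record fields (`tokens`, `usd_per_1k_tokens`, and so on) and the helper names are hypothetical, not an actual Kamiwaza log schema.

```python
from collections import defaultdict


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]


def summarize(records):
    """Return (cost in USD by tenant, p99 latency in ms by endpoint)."""
    cost_by_tenant = defaultdict(float)
    latencies = defaultdict(list)
    for r in records:
        cost = r["tokens"] / 1000 * r["usd_per_1k_tokens"]
        cost_by_tenant[r["tenant"]] += cost
        latencies[r["endpoint"]].append(r["latency_ms"])
    return dict(cost_by_tenant), {ep: p99(v) for ep, v in latencies.items()}
```

Attributing cost at the request level is what allows the same records to be regrouped later by tenant, endpoint, or model without re-instrumenting the gateway.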

OpenAI-compatible API

Drop-in replacement for existing OpenAI integrations. No SDK changes required — update one base URL.

Air-gap + PrivateLink ready

Deploy the gateway in your own VPC. Traffic to private GPU nodes never traverses public internet.

Routing config as code

Declare your routing policy in YAML

Version-controlled, diff-able, reviewable. No UI wizards — just config files that your infra team already knows how to manage.

# routing.yaml — Kamiwaza routing policy
version: v1
endpoints:
  - id: private-gpu
    type: vllm
    url: https://gpu.internal.acme.com/v1
    models: [llama-3.1-70b]
  - id: anthropic
    type: anthropic
    models: [claude-3-5-haiku]
  - id: bedrock
    type: bedrock
    models: [meta.llama3-instruct]
rules:
  - match:
      data_class: pii-restricted
    route_to: private-gpu
  - match:
      latency_budget_ms: <500
    route_to: anthropic
  - default:
    route_to: bedrock

# Single OpenAI-compatible request — Kamiwaza handles the routing
$ curl -X POST https://api.kamiwazaai.org/v1/route \
  -H "Authorization: Bearer $KAMIWAZA_KEY" \
  -H "X-Tenant: acme-corp" \
  -H "X-Data-Class: pii-restricted" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize Q3 report"}]
  }'

# Response includes routing metadata
{
  "routed_to": "private-gpu",
  "policy_matched": "pii-restricted",
  "latency_ms": 312,
  "audit_id": "req_8xKp2mNvLq"
}
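
A client can use this metadata to confirm that a request landed where policy says it should. The sketch below parses the sample response above; `check_routing` is a hypothetical helper, and the assumption that every response carries these four fields comes only from this one sample.

```python
import json

# The sample response shown on this page.
SAMPLE_RESPONSE = """{
  "routed_to": "private-gpu",
  "policy_matched": "pii-restricted",
  "latency_ms": 312,
  "audit_id": "req_8xKp2mNvLq"
}"""


def check_routing(raw, expected_endpoint, latency_budget_ms):
    """Return (audit_id, problems) for one routing response."""
    meta = json.loads(raw)
    problems = []
    if meta["routed_to"] != expected_endpoint:
        problems.append(f"routed to {meta['routed_to']}, expected {expected_endpoint}")
    if meta["latency_ms"] > latency_budget_ms:
        problems.append(f"latency {meta['latency_ms']} ms exceeds {latency_budget_ms} ms")
    return meta["audit_id"], problems


audit_id, problems = check_routing(SAMPLE_RESPONSE, "private-gpu", 500)
```

Returning the `audit_id` alongside any problems gives an operator a handle for looking up the matching entry in the audit log.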

< 4 ms: median routing overhead
99.95%: gateway uptime (simulated SLA)
6+: endpoint types supported
100%: prompts with full audit trail

Designed with SOC 2 Type II controls in mind. Data is never written to disk unencrypted. TLS 1.3 is used between all hops.

Start routing in under 10 minutes.