Tenant Isolation
One Kamiwaza gateway, isolated per-tenant policies. Each tenant gets their own model allowlist, rate limit, and audit bucket — invisible to other tenants.
How tenant isolation works
When a request arrives with an X-Tenant header, Kamiwaza:
- Looks up the tenant's configuration (model allowlist, rate limit, audit bucket).
- Validates the requested model against the tenant's model_allowlist. Requests for disallowed models are rejected with HTTP 403.
- Checks the tenant's rate limit. Requests over quota return HTTP 429.
- Evaluates the tenant's routing rules (inherited from the global policy, with tenant-specific overrides).
- Routes the request to the matched endpoint.
- Writes an audit record to the tenant's dedicated bucket.
Tenant configs are isolated from each other — tenant A cannot see tenant B's allowlist, rate limit, or audit records, even in a shared gateway deployment.
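The steps above can be sketched as a small in-memory pipeline. This is an illustrative model only, not Kamiwaza's implementation: the TenantConfig and Gateway classes, the simple per-minute counter, and the choice to write audit records only for accepted requests are all assumptions made for the example. Routing is left out, and unlisted tenants are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class TenantConfig:
    # Hypothetical container mirroring the per-tenant settings described above.
    id: str
    model_allowlist: list
    rate_limit_rpm: int
    audit_bucket: str


@dataclass
class Gateway:
    """Toy gateway: all state is keyed by tenant, so tenants never share it."""
    tenants: dict
    used_this_minute: dict = field(default_factory=dict)
    audit: dict = field(default_factory=dict)

    def handle(self, tenant_id: str, model: str) -> int:
        cfg = self.tenants[tenant_id]  # look up the tenant's configuration
        allow = cfg.model_allowlist
        if allow and "*" not in allow and model not in allow:
            return 403  # model is not in the tenant's allowlist
        used = self.used_this_minute.get(tenant_id, 0)
        if used >= cfg.rate_limit_rpm:
            return 429  # tenant is over its own quota
        self.used_this_minute[tenant_id] = used + 1
        # Routing rules and endpoint selection are out of scope for this sketch.
        record = {"tenant": tenant_id, "model": model}
        # Assumption: only accepted requests are audited here.
        self.audit.setdefault(cfg.audit_bucket, []).append(record)
        return 200


gateway = Gateway(tenants={
    "enterprise-acme": TenantConfig(
        "enterprise-acme", ["llama-3.1-70b", "claude-3-5-haiku"],
        5000, "s3://acme-audit-logs"),
    "startup-beta": TenantConfig(
        "startup-beta", ["claude-3-5-haiku"],
        500, "s3://startup-beta-audit"),
})
```

Because the counters and audit lists are keyed by tenant ID or bucket, a request from startup-beta never touches enterprise-acme's state, which is the isolation property described above.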
Configure a tenant
Define tenants in the tenants block of your policy YAML, or via the API:
version: v1
tenants:
  - id: enterprise-acme
    model_allowlist: [llama-3.1-70b, claude-3-5-haiku]
    rate_limit_rpm: 5000
    audit_bucket: s3://acme-audit-logs
  - id: startup-beta
    model_allowlist: [claude-3-5-haiku]
    rate_limit_rpm: 500
    audit_bucket: s3://startup-beta-audit
Tenants not listed in the tenants block fall through to the global policy with no model restriction and the default rate limit.
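The fall-through behavior can be illustrated with a short lookup function. This is a sketch under stated assumptions: resolve_tenant is a hypothetical helper, the policy is represented as a plain dict shaped like the YAML above, and DEFAULT_RPM is a made-up value because the actual default rate limit is not stated here.

```python
DEFAULT_RPM = 1000  # hypothetical global default; the real value is not documented here


def resolve_tenant(policy: dict, tenant_id: str, default_rpm: int = DEFAULT_RPM) -> dict:
    """Return the tenant's config, or a permissive global fallback if unlisted."""
    for tenant in policy.get("tenants", []):
        if tenant["id"] == tenant_id:
            return tenant
    # Unlisted tenant: no model restriction, default rate limit.
    # Audit bucket handling for unlisted tenants is not modeled here.
    return {"id": tenant_id, "model_allowlist": ["*"], "rate_limit_rpm": default_rpm}


policy = {
    "version": "v1",
    "tenants": [
        {"id": "enterprise-acme",
         "model_allowlist": ["llama-3.1-70b", "claude-3-5-haiku"],
         "rate_limit_rpm": 5000,
         "audit_bucket": "s3://acme-audit-logs"},
        {"id": "startup-beta",
         "model_allowlist": ["claude-3-5-haiku"],
         "rate_limit_rpm": 500,
         "audit_bucket": "s3://startup-beta-audit"},
    ],
}

listed = resolve_tenant(policy, "startup-beta")
unlisted = resolve_tenant(policy, "some-other-team")
```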
Model allowlists
The model_allowlist field accepts model IDs as they appear in your endpoint configurations. If a request specifies "model": "gpt-4o" and gpt-4o is not in the tenant's allowlist, Kamiwaza returns:
{
  "error": {
    "type": "model_not_allowed",
    "message": "Model 'gpt-4o' is not in the allowlist for tenant 'startup-beta'",
    "code": 403
  }
}
An empty model_allowlist or the value "*" allows all models.
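As a rough illustration of these rules, the check below returns None when a model is allowed and an error body shaped like the response above when it is not. The function name check_model is hypothetical, and treating both the bare string "*" and a list containing "*" as wildcards is an assumption about how the value may be written.

```python
def check_model(tenant_id: str, allowlist, model: str):
    """Return None if the model is allowed, else a 403-style error body."""
    # An empty allowlist, or the wildcard "*", allows every model.
    if not allowlist or "*" in allowlist:
        return None
    if model in allowlist:
        return None
    return {
        "error": {
            "type": "model_not_allowed",
            "message": (
                f"Model '{model}' is not in the allowlist "
                f"for tenant '{tenant_id}'"
            ),
            "code": 403,
        }
    }


rejected = check_model("startup-beta", ["claude-3-5-haiku"], "gpt-4o")
allowed = check_model("startup-beta", ["claude-3-5-haiku"], "claude-3-5-haiku")
```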
Rate limits
rate_limit_rpm sets the maximum number of requests per minute for the tenant. Counters are kept per tenant, so one tenant's traffic spike cannot use up another tenant's quota.
| Config key | Unit | Description |
|---|---|---|
| rate_limit_rpm | requests / min | Max requests per minute across all models |
| rate_limit_tpm | tokens / min | Max input+output tokens per minute |
| rate_limit_concurrent | count | Max concurrent in-flight requests |
Use rate_limit_rpm and rate_limit_tpm together for tighter cost control: RPM limits request frequency, and TPM limits token spend.
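To show how RPM and TPM limits can interact, here is a minimal sliding-window limiter for a single tenant. It is a sketch, not Kamiwaza's algorithm: the 60-second sliding window, the TenantRateLimiter class, and the idea that the caller supplies a token count up front are all assumptions. rate_limit_concurrent is not modeled.

```python
import time
from collections import deque


class TenantRateLimiter:
    """One instance per tenant; enforces RPM and TPM over a 60-second window."""

    def __init__(self, rpm=None, tpm=None, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock          # injectable so the window can be tested
        self.events = deque()       # (timestamp, tokens) for accepted requests
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def allow(self, tokens=0):
        now = self.clock()
        self._evict(now)
        if self.rpm is not None and len(self.events) >= self.rpm:
            return False  # request frequency exceeded
        if self.tpm is not None and self.tokens_in_window + tokens > self.tpm:
            return False  # token budget exceeded
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True


# Separate limiter objects per tenant keep the counters independent.
limiters = {
    "enterprise-acme": TenantRateLimiter(rpm=5000),
    "startup-beta": TenantRateLimiter(rpm=500),
}
```

A request is accepted only if it passes both checks, so a tenant sending a few very large prompts can hit the TPM limit well before the RPM limit.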
Audit buckets
Each tenant can have a dedicated audit bucket. Kamiwaza writes one audit record per request in NDJSON format.
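NDJSON means one JSON object per line, which makes audit files easy to append to and to read line by line. The sketch below shows the format only; the field names (ts, tenant, model, status) are hypothetical and may not match the records Kamiwaza actually writes.

```python
import json
from datetime import datetime, timezone


def audit_line(tenant_id: str, model: str, status: int) -> str:
    """Serialize one audit record as a single NDJSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # hypothetical field names
        "tenant": tenant_id,
        "model": model,
        "status": status,
    }
    # Compact separators keep each record on exactly one line.
    return json.dumps(record, separators=(",", ":")) + "\n"


def read_ndjson(text: str) -> list:
    """Parse NDJSON text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


batch = (
    audit_line("enterprise-acme", "llama-3.1-70b", 200)
    + audit_line("enterprise-acme", "gpt-4o", 403)
)
records = read_ndjson(batch)
```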
Supported bucket schemes:
- s3://bucket-name for AWS S3. Kamiwaza needs s3:PutObject on the bucket.
- gs://bucket-name for Google Cloud Storage.
- az://container-name for Azure Blob Storage.
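If you generate tenant configs programmatically, it can help to validate audit_bucket values before deploying a policy. The following is a minimal sketch using the standard library; parse_audit_bucket and is_supported_bucket are hypothetical helpers, and the validation rules are assumptions rather than Kamiwaza's own.

```python
from urllib.parse import urlsplit

# Schemes listed in the documentation above.
PROVIDERS = {
    "s3": "AWS S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


def parse_audit_bucket(uri: str):
    """Split an audit bucket URI into (scheme, bucket_or_container, prefix)."""
    parts = urlsplit(uri)
    if parts.scheme not in PROVIDERS:
        raise ValueError(f"unsupported bucket scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


def is_supported_bucket(uri: str) -> bool:
    """Return True if the URI uses one of the supported schemes."""
    try:
        parse_audit_bucket(uri)
    except ValueError:
        return False
    return True
```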
To grant Kamiwaza access to an S3 bucket, add this bucket policy:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:role/KamiwazaAuditRole"},
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::acme-audit-logs/kamiwaza/*"
  }]
}
Contact Kamiwaza for the IAM role ARN for your deployment region.
X-Tenant header
Pass the X-Tenant header in every request your application sends to the Kamiwaza gateway:
import openai

client = openai.OpenAI(
    api_key="sk-kmw-XXXXXXXXXXXX",
    base_url="https://gw.kamiwazaai.org/v1",
    default_headers={"X-Tenant": "enterprise-acme"},
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize the report"}],
)
The OpenAI Python SDK's default_headers parameter injects the tenant header on every request. All routing, rate limiting, and audit logging happen server-side — no changes to your prompt or response handling.
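Clients that do not use the OpenAI SDK can set the header themselves. The sketch below builds an equivalent request with the standard library and does not send it. It assumes the gateway exposes an OpenAI-compatible /chat/completions path under the base URL and accepts the API key as a Bearer token, as the SDK would send it; build_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumed path: base URL from the example above plus the OpenAI-style route.
GATEWAY_URL = "https://gw.kamiwazaai.org/v1/chat/completions"


def build_request(tenant_id: str, api_key: str, prompt: str, model: str = "auto"):
    """Build (but do not send) a chat request tagged with the tenant header."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Tenant": tenant_id,  # identifies the tenant to the gateway
        },
        method="POST",
    )


req = build_request("enterprise-acme", "sk-kmw-XXXXXXXXXXXX", "Summarize the report")
# urllib.request.urlopen(req) would send the request; it is omitted here.
```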