Tenant Isolation

One Kamiwaza gateway, isolated per-tenant policies. Each tenant gets its own model allowlist, rate limit, and audit bucket, none of which is visible to other tenants.

How tenant isolation works

When a request arrives with an X-Tenant header, Kamiwaza:

  1. Looks up the tenant's configuration (model allowlist, rate limit, audit bucket).
  2. Validates the requested model against the tenant's model_allowlist. Requests for disallowed models are rejected with HTTP 403.
  3. Checks the tenant's rate limit. Requests over quota return HTTP 429.
  4. Evaluates the tenant's routing rules (inherited from the global policy, with tenant-specific overrides).
  5. Routes the request to the matched endpoint.
  6. Writes an audit record to the tenant's dedicated bucket.

Tenant configs are isolated from each other — tenant A cannot see tenant B's allowlist, rate limit, or audit records, even in a shared gateway deployment.
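The order of the checks above can be sketched in a few lines of Python. This is a minimal illustration, not Kamiwaza's implementation: the TenantConfig and Gateway classes, the in-memory counter, and the audit record fields are all hypothetical. Steps 4 and 5 (rule evaluation and routing) are left out, and the sketch only audits requests that pass both checks, since the source does not say whether rejected requests are recorded.

```python
from dataclasses import dataclass, field


@dataclass
class TenantConfig:
    # Field names mirror the policy YAML keys; the class itself is hypothetical.
    id: str
    model_allowlist: list
    rate_limit_rpm: int
    audit_bucket: str


@dataclass
class Gateway:
    tenants: dict                                              # tenant id -> TenantConfig
    requests_this_minute: dict = field(default_factory=dict)   # tenant id -> count
    audit_log: dict = field(default_factory=dict)              # bucket -> list of records

    def handle(self, tenant_id: str, model: str) -> int:
        """Return the HTTP status this sketch would send for the request."""
        cfg = self.tenants[tenant_id]                               # step 1: look up config
        allow_all = not cfg.model_allowlist or "*" in cfg.model_allowlist
        if not allow_all and model not in cfg.model_allowlist:      # step 2: allowlist
            return 403
        used = self.requests_this_minute.get(tenant_id, 0)
        if used >= cfg.rate_limit_rpm:                              # step 3: rate limit
            return 429
        self.requests_this_minute[tenant_id] = used + 1
        # Steps 4-5 (routing rules and endpoint selection) are omitted here.
        self.audit_log.setdefault(cfg.audit_bucket, []).append(     # step 6: audit
            {"tenant": tenant_id, "model": model, "status": 200}
        )
        return 200
```

Because every piece of state is keyed by tenant ID or by the tenant's own bucket, a request for one tenant never reads or writes another tenant's counter or audit records.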

Configure a tenant

Define tenants in the tenants block of your policy YAML, or via the API:

yaml
# policy.yaml
version: v1

tenants:
  - id: enterprise-acme
    model_allowlist: [llama-3.1-70b, claude-3-5-haiku]
    rate_limit_rpm: 5000
    audit_bucket: s3://acme-audit-logs
  - id: startup-beta
    model_allowlist: [claude-3-5-haiku]
    rate_limit_rpm: 500
    audit_bucket: s3://startup-beta-audit

Tenants not listed in the tenants block fall through to the global policy with no model restriction and the default rate limit.
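The fall-through behavior can be illustrated with a small lookup function. This is a sketch under stated assumptions: the tenants list repeats the YAML above as a Python literal, resolve_tenant is a hypothetical helper, DEFAULT_RATE_LIMIT_RPM is a made-up value (the source does not give the real default), and the None audit bucket for unlisted tenants is a placeholder because the source does not say where their records go.

```python
# Hypothetical global default; the source does not state the actual value.
DEFAULT_RATE_LIMIT_RPM = 1000

# Same data as the tenants block in policy.yaml, written as a Python literal.
TENANTS = [
    {
        "id": "enterprise-acme",
        "model_allowlist": ["llama-3.1-70b", "claude-3-5-haiku"],
        "rate_limit_rpm": 5000,
        "audit_bucket": "s3://acme-audit-logs",
    },
    {
        "id": "startup-beta",
        "model_allowlist": ["claude-3-5-haiku"],
        "rate_limit_rpm": 500,
        "audit_bucket": "s3://startup-beta-audit",
    },
]


def resolve_tenant(tenant_id: str, tenants: list = TENANTS) -> dict:
    """Return the tenant's config, or global defaults if the tenant is not listed."""
    for tenant in tenants:
        if tenant["id"] == tenant_id:
            return tenant
    # Unlisted tenant: no model restriction and the default rate limit.
    return {
        "id": tenant_id,
        "model_allowlist": ["*"],
        "rate_limit_rpm": DEFAULT_RATE_LIMIT_RPM,
        "audit_bucket": None,  # placeholder; not specified in the source
    }
```

If unlisted tenants should not get unrestricted model access, the safer design is to list every tenant explicitly rather than rely on the fall-through.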

Model allowlists

The model_allowlist field accepts model IDs as they appear in your endpoint configurations. If a request specifies "model": "gpt-4o" and gpt-4o is not in the tenant's allowlist, Kamiwaza returns:

json
{
  "error": {
    "type": "model_not_allowed",
    "message": "Model 'gpt-4o' is not in the allowlist for tenant 'startup-beta'",
    "code": 403
  }
}

An empty model_allowlist or the value "*" allows all models.
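On the client side, the error body shown above can be turned into a readable message. The helper below is a hypothetical sketch that assumes only the error.type, error.message, and error.code fields from that example. Other error types may carry different fields.

```python
import json


def describe_gateway_error(body: str) -> str:
    """Turn a gateway error body into a short message for logs or users."""
    error = json.loads(body).get("error", {})
    message = error.get("message", "")
    if error.get("type") == "model_not_allowed":
        # The tenant's allowlist rejected the model; retrying will not help.
        return f"Blocked by tenant allowlist: {message}"
    return f"Gateway error {error.get('code', 'unknown')}: {message}"
```

Treating model_not_allowed as non-retryable keeps a misconfigured client from looping on a request that the allowlist will keep rejecting.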

Rate limits

rate_limit_rpm sets the maximum requests per minute for the tenant. The counter is per-tenant, so one tenant's traffic spike counts only against its own quota and doesn't degrade another tenant's latency.

Config key             Unit            Description
rate_limit_rpm         requests / min  Max requests per minute across all models
rate_limit_tpm         tokens / min    Max input+output tokens per minute
rate_limit_concurrent  count           Max concurrent in-flight requests

Tip: Set rate_limit_rpm and rate_limit_tpm together for tighter cost control. RPM limits request frequency; TPM limits token spend.
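To show how an RPM limit and a TPM limit interact, here is a minimal per-tenant sliding-window limiter. The source does not describe the algorithm Kamiwaza uses, so the sliding window, the TenantRateLimiter class, and its allow method are assumptions made for illustration only.

```python
import time
from collections import defaultdict, deque


class TenantRateLimiter:
    """Sliding-window limiter enforcing both an RPM and a TPM limit per tenant."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        # tenant id -> deque of (timestamp, tokens) for requests inside the window
        self.events = defaultdict(deque)

    def allow(self, tenant_id, tokens, rpm, tpm, now=None):
        """Record the request and return True if it fits under both limits."""
        now = time.monotonic() if now is None else now
        events = self.events[tenant_id]
        # Drop events that have aged out of the window.
        while events and now - events[0][0] >= self.window:
            events.popleft()
        tokens_used = sum(t for _, t in events)
        if len(events) + 1 > rpm or tokens_used + tokens > tpm:
            return False  # a gateway would answer HTTP 429 here
        events.append((now, tokens))
        return True
```

A request is rejected when either limit would be exceeded, so a few very large requests can hit the TPM ceiling long before the RPM ceiling, and many small requests can do the reverse.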

Audit buckets

Each tenant can have a dedicated audit bucket. Kamiwaza writes one audit record per request in NDJSON format.
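NDJSON means one JSON object per line, which makes audit files easy to append to and to stream. The sketch below shows writing and reading that format with the standard library. The record fields in the usage note (tenant, model, status) are hypothetical, since the source does not list the audit record schema.

```python
import json


def to_ndjson(records: list) -> str:
    """Serialize records as NDJSON: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_ndjson(text: str) -> list:
    """Parse NDJSON back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, from_ndjson(to_ndjson([{"tenant": "enterprise-acme", "model": "llama-3.1-70b", "status": 200}])) returns the original list, and each record can be processed independently as its line is read.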

Supported bucket schemes:

  • s3://bucket-name — AWS S3. Kamiwaza needs s3:PutObject on the bucket.
  • gs://bucket-name — Google Cloud Storage.
  • az://container-name — Azure Blob Storage.
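Before deploying a policy, it can help to check each audit_bucket value against the schemes listed above. The helper below is a hypothetical pre-flight check built on the standard library's urlparse. It is not part of Kamiwaza and only validates the URI shape, not access to the bucket.

```python
from urllib.parse import urlparse

# Schemes taken from the list of supported bucket schemes.
SUPPORTED_SCHEMES = {
    "s3": "AWS S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


def parse_audit_bucket(uri: str) -> tuple:
    """Return (provider name, bucket or container name) for an audit bucket URI."""
    parsed = urlparse(uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported audit bucket scheme: {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"Missing bucket or container name: {uri!r}")
    return SUPPORTED_SCHEMES[parsed.scheme], parsed.netloc
```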

To grant Kamiwaza access to an S3 bucket, add this bucket policy:

json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::123456789012:role/KamiwazaAuditRole"},
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::acme-audit-logs/kamiwaza/*"
  }]
}

Contact Kamiwaza support for the Kamiwaza IAM role ARN for your deployment region.
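With many tenants, writing one bucket policy per tenant by hand is error-prone. The sketch below generates a policy of the same shape as the example above. audit_bucket_policy is a hypothetical helper, the kamiwaza/ prefix is taken from the example's Resource ARN, and the role ARN must be the one provided for your deployment.

```python
import json


def audit_bucket_policy(role_arn: str, bucket: str, prefix: str = "kamiwaza/") -> str:
    """Build an S3 bucket policy that lets one role write under a key prefix."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": "s3:PutObject",
            # Limit writes to objects under the prefix, not the whole bucket.
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        }],
    }
    return json.dumps(policy, indent=2)
```

Scoping the Resource to a prefix keeps the grant write-only and confined to the audit path, so the role cannot overwrite other objects in the bucket.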

X-Tenant header

Pass the X-Tenant header in every request your application sends to the Kamiwaza gateway:

python
import openai

client = openai.OpenAI(
    api_key="sk-kmw-XXXXXXXXXXXX",
    base_url="https://gw.kamiwazaai.org/v1",
    default_headers={"X-Tenant": "enterprise-acme"}
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Summarize the report"}]
)

The OpenAI Python SDK's default_headers parameter injects the tenant header on every request. All routing, rate limiting, and audit logging happen server-side, so your prompt and response handling need no changes.
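For clients that do not use the OpenAI SDK, the same request can be built by hand. The sketch below uses only the standard library and assumes the gateway exposes the OpenAI-compatible /chat/completions path under the base URL above with Bearer authentication, which is what the SDK sends. build_chat_request is a hypothetical helper, and it constructs the request without sending it.

```python
import json
import urllib.request


def build_chat_request(api_key: str, tenant_id: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request carrying the tenant header."""
    body = json.dumps({
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        "https://gw.kamiwazaai.org/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Tenant": tenant_id,  # selects the tenant's policy on the gateway
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate makes it easy to verify that the tenant header is attached before any traffic leaves the application.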