Engineering blog

Engineering notes from Kamiwaza

Architecture analysis covering LLM routing policy design, private GPU infrastructure trade-offs, tenant isolation patterns, and cost-per-token benchmarks across Anthropic, OpenAI, and Bedrock.

latency reliability May 28, 2025 7 min read

Latency-budget routing: how to stop SLA breaches before they happen

Declare a latency budget in your routing policy. Kamiwaza evaluates real-time p95 latency for each endpoint and fails over before your application sees a timeout.

benchmarks latency Apr 17, 2025 12 min read

Anthropic vs. OpenAI vs. Bedrock: latency and cost profiles for enterprise routing

We ran 100,000 requests across Claude Haiku, GPT-4o-mini, and Bedrock Llama 3 Instruct. The latency distributions at p50, p95, and p99 tell a different story than the marketing pages.

multi-tenant architecture Mar 5, 2025 11 min read

Tenant isolation patterns for multi-model SaaS platforms

Three patterns for giving each customer their own model policy in a shared inference gateway: per-tenant model allowlists, audit bucket isolation, and per-tenant rate limits.

security data-governance Jan 22, 2025 9 min read

Enforcing LLM safety policies at the routing layer — not the application layer

Pushing safety controls into each application creates fragmentation and audit gaps. Here's why the model gateway is the right enforcement point for data class restrictions and redaction policies.

architecture private-gpu Dec 10, 2024 10 min read

Private GPU vs. managed endpoints: a decision framework for platform teams

Not every workload belongs on-prem. Not every workload is safe on a managed API. We walk through the four questions platform teams should answer before deciding.

infrastructure cost-analysis Nov 14, 2024 8 min read

The economics of model routing: when private GPU beats managed API

A cost-per-token analysis across private GPU (Llama 3.1 70B on A100), Anthropic Claude Haiku, and AWS Bedrock Llama 3. The break-even volume is lower than most teams expect.
