
Enforcing LLM safety policies at the routing layer — not the application layer

Luke Norris | January 22, 2025 | 9 min read

[Figure: Gateway enforcement layer diagram showing data class routing decisions before requests reach model endpoints]

The standard pattern for LLM safety enforcement in enterprise applications looks like this: each team building an AI-integrated feature writes their own content moderation logic, data class checks, and output filters. The reasoning is intuitive — they know their feature's requirements, so they implement the checks closest to their code. The result, at scale, is a fragmentation problem that creates exactly the audit gaps and policy inconsistencies that safety controls are meant to prevent.

This post makes the case for a different enforcement point: the model gateway. Not as a replacement for application-level safety thinking, but as the layer where policies that apply across all applications and all tenants are consistently enforced — with an audit trail that actually covers everything.

Why application-layer enforcement fragments at scale

Consider a regional health-tech platform processing 12M monthly inferences across a portfolio of 15 different AI-integrated features: clinical documentation assistance, patient FAQ chat, internal analytics summarization, billing code suggestion, and more. Each feature team built their own safety stack when they shipped. Three months later, an audit reveals:

The documentation assistant strips PHI from prompts before sending to the LLM. The patient FAQ chat does not — it relies on Anthropic's default handling.
Four features have content moderation filters. Eleven don't, or have them disabled in staging environments that occasionally receive production traffic during incident recovery.
Audit logs are per-feature, in different schemas, with different retention periods. Reconstructing a full request history for a specific patient encounter across features requires manual correlation across five separate datastores.

This isn't hypothetical negligence. It's the predictable outcome of pushing policy enforcement into application code across many teams that are shipping fast. Each team implements the controls they were told to implement. The problem is that the policy contract ("PII never leaves our network unredacted," "all requests to external model APIs are logged") was never enforced at a single point. It was enforced in N different places with N different implementations.

The OWASP LLM Top 10 and where gateway enforcement applies

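The OWASP LLM Top 10 (first published 2023, updated 2025) identifies the leading vulnerability classes for LLM-integrated applications. Several of them are structural problems that gateway-layer enforcement directly addresses. The identifiers below follow the 2023 numbering; in the 2025 edition, Sensitive Information Disclosure moves to LLM02 and Overreliance is absorbed into LLM09 (Misinformation):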

LLM01 — Prompt Injection: Attackers craft inputs that override the LLM's system instructions. Application-layer defenses vary in quality across teams. A gateway-layer defense can apply consistent prompt injection detection (request fingerprinting against known injection patterns, input length limits, structural analysis of system/user message boundaries) before the request reaches any model endpoint.
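As a rough illustration of the checks named above (pattern matching, input length limits, and a structural check on system/user message boundaries), here is a minimal Python sketch. The message shape (a list of dicts with `role` and `content`), the `MAX_INPUT_CHARS` limit, and the two patterns are all assumptions made for the example; a real deployment would maintain a much larger, regularly updated pattern set and likely add a classifier.

```python
import re

# Hypothetical limit and patterns, chosen only for illustration.
MAX_INPUT_CHARS = 8000
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


def screen_messages(messages):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        content = msg.get("content", "")
        # Structural check: a system message anywhere but first is suspicious.
        if role == "system" and i != 0:
            findings.append(f"message {i}: system role after first position")
        if role == "user":
            if len(content) > MAX_INPUT_CHARS:
                findings.append(f"message {i}: input exceeds {MAX_INPUT_CHARS} chars")
            if any(p.search(content) for p in INJECTION_PATTERNS):
                findings.append(f"message {i}: matches known injection pattern")
    return findings
```

Whether a finding blocks the request or only annotates the audit log is a policy decision; this sketch reports findings and leaves that choice to the caller.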

LLM06 — Sensitive Information Disclosure: The model generates output that includes sensitive data from its context. Output filters at the gateway layer can scan completions for PII patterns, credential strings, or regulated identifiers before returning the response to the calling application — acting as a last-line redaction layer regardless of which application sent the request.

LLM09 — Overreliance: Applications fail to validate model outputs before acting on them. This is partially an application concern, but structured output enforcement at the gateway (requiring JSON-mode responses and validating schema compliance before returning) removes a class of downstream failures where applications assume correct output format and break on malformed completions.
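A gateway-side format check of this kind might look like the following sketch. It uses only the standard library and a deliberately simple "required field and type" contract instead of a full JSON Schema validator; the function name `validate_completion` and the example field names are hypothetical.

```python
import json


def validate_completion(raw, required):
    """Check that `raw` is a JSON object containing the required fields.

    `required` maps field name -> expected Python type.
    Returns (True, payload) on success or (False, reason) on failure.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(payload, dict):
        return False, "top-level value is not an object"
    for key, expected_type in required.items():
        if key not in payload:
            return False, f"missing field: {key}"
        if not isinstance(payload[key], expected_type):
            return False, f"wrong type for field: {key}"
    return True, payload
```

On failure the gateway can retry the request, return a structured error, or both, so the calling application never receives a completion that silently violates the expected format.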

These concerns line up with the "Govern," "Map," and "Measure" functions of the NIST AI RMF (AI 100-1), in particular its expectation that AI system behavior be auditable and that policy controls be applied consistently across deployment contexts. Enforcement concentrated at the gateway supports those outcomes far more directly than enforcement distributed across application codebases can.

Data class routing as a safety control

The most concrete implementation of gateway-level safety policy is data class-based routing. The principle: tag requests at the point of creation with a data classification, and enforce routing constraints at the gateway that guarantee the tagged data class can only reach endpoints authorized for it.

data_class_policy:
  pii-restricted:
    allowed_endpoints: [private-gpu-primary, private-gpu-failover]
    denied_endpoints: [anthropic-api, openai-api, bedrock-*]
    require_audit_log: true
    output_filter: pii-redaction-v2
  hipaa:
    allowed_endpoints: [private-gpu-hipaa-zone]
    denied_endpoints: ["*"]  # Deny all except explicit allowlist
    require_audit_log: true
    output_filter: phi-scrubber
    require_data_processor_agreement: true
  internal:
    allowed_endpoints: [private-gpu-primary, anthropic-api-dpa, bedrock-us-east-1]
    require_audit_log: true
  public:
    allowed_endpoints: ["*"]
    require_audit_log: false

When a request arrives at the gateway with X-Data-Class: pii-restricted, the policy engine evaluates the allowlist before any model selection logic runs. A PII-flagged request can never reach an external API endpoint regardless of which application sent it, regardless of what routing priority that endpoint has in the cost or latency optimization rules. The constraint is structural, not procedural.
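To make the evaluation order concrete, here is a minimal Python sketch of an allowlist check that runs before model selection. It is not a description of any particular gateway's policy engine. The assumptions are: the config above has been loaded into a dict named `POLICY`, endpoint patterns use shell-style globbing, an exact allowlist entry takes precedence over a wildcard deny (which is how the `hipaa` comment reads), and an unknown data class fails closed.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the data_class_policy config (routing fields only).
POLICY = {
    "pii-restricted": {
        "allowed_endpoints": ["private-gpu-primary", "private-gpu-failover"],
        "denied_endpoints": ["anthropic-api", "openai-api", "bedrock-*"],
    },
    "hipaa": {
        "allowed_endpoints": ["private-gpu-hipaa-zone"],
        "denied_endpoints": ["*"],
    },
    "internal": {
        "allowed_endpoints": ["private-gpu-primary", "anthropic-api-dpa", "bedrock-us-east-1"],
    },
    "public": {"allowed_endpoints": ["*"]},
}


def _matches(endpoint, patterns):
    """True if the endpoint name matches any glob pattern in the list."""
    return any(fnmatchcase(endpoint, p) for p in patterns)


def permitted_endpoints(data_class, candidates):
    """Filter candidate endpoints down to those the data class may reach."""
    rule = POLICY.get(data_class)
    if rule is None:
        return []  # unknown data class: fail closed
    allowed = rule.get("allowed_endpoints", [])
    denied = rule.get("denied_endpoints", [])
    result = []
    for endpoint in candidates:
        if not _matches(endpoint, allowed):
            continue
        # Assumption: an exact allowlist entry overrides a matching deny pattern.
        if _matches(endpoint, denied) and endpoint not in allowed:
            continue
        result.append(endpoint)
    return result
```

Cost and latency optimization would then choose only among the endpoints this function returns, so an external API can never be selected for a `pii-restricted` request, whatever its routing priority.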

We're not saying that application-layer safety code is unnecessary. We're saying that application-layer code is the wrong place to enforce policies that need to hold uniformly across all applications — because that uniformity breaks down the moment one application team ships a feature without reading the policy documentation. Gateway enforcement makes the policy invariant rather than advisory.

Output filtering and the case for gateway-side redaction

Even when a request is correctly routed to an authorized endpoint, the model's response may contain sensitive information extracted from the prompt context — particularly in RAG architectures where the context window includes document chunks that may contain data the model wasn't supposed to surface verbatim.

Gateway-side output filters scan completions before they return to the calling application. Common filter types:

PII detection: Regex + ML-based classifiers for email addresses, phone numbers, SSNs, credit card numbers, and similar structured identifiers. False-positive rate matters here; overly aggressive filters degrade response quality.
Credential and secret detection: Scan for patterns matching API keys, JWT tokens, private key headers. Particularly relevant for code generation and documentation assistance workloads.
Custom entity redaction: Organization-specific sensitive terms (internal project code names, employee IDs, unreleased product names) that don't appear in generic PII classifiers.
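The regex half of such a filter can be sketched in a few lines of Python. The patterns below are intentionally simple and purely illustrative (they will miss many real-world formats and are not the `pii-redaction-v2` or `phi-scrubber` filters named in the policy); the `[REDACTED:<label>]` placeholder format is also an assumption.

```python
import re

# Illustrative patterns only; production filters pair these with ML classifiers.
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def redact(text):
    """Replace each match with a labeled placeholder.

    Returns the redacted text and a dict of match counts per label,
    which can be recorded in the audit log.
    """
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[REDACTED:{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Returning the counts alongside the text lets the gateway log that a redaction happened without logging the sensitive value itself.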

The implementation tradeoff: gateway-side output filtering adds 20-80ms of latency to each response depending on filter complexity and response length. For interactive use cases with tight latency budgets, the filter must be efficient. For batch inference workloads, this overhead is acceptable. A routing policy should specify which filter profile applies per data class and per workload type — not apply the most expensive filter to every response.

Tenant API key scoping and the audit trail requirement

In a multi-tenant platform, safety enforcement needs to be tenant-aware. A per-tenant API key issued by the gateway carries the tenant's data class restrictions and model allowlist as part of its claims, so that policy lives with the gateway-issued credential rather than in the calling application. When the key is presented, the gateway evaluates the tenant's policy before any other routing logic.

This means a misconfigured application cannot bypass tenant-level restrictions by omitting a header or passing the wrong data class tag. The tenant policy is authoritative at the gateway level; the application is a client, not a policy authority.
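One way to get this property is for the gateway to treat the request's data class tag as a hint that can only tighten the tenant's policy and never loosen it. The sketch below assumes a hypothetical `tenant_floor` claim (the least restrictive class the tenant's traffic may be treated as) and an assumed strictness ordering of the four classes; neither comes from a specific product.

```python
# Assumed ordering, from least to most restrictive.
STRICTNESS = ["public", "internal", "pii-restricted", "hipaa"]


def effective_data_class(tenant_floor, header_value):
    """Return the stricter of the tenant's floor and the request's tag.

    A missing or unrecognized header falls back to the tenant floor,
    so omitting the tag cannot downgrade the request.
    """
    if tenant_floor not in STRICTNESS:
        raise ValueError(f"unknown tenant data class: {tenant_floor!r}")
    if header_value not in STRICTNESS:
        return tenant_floor
    return max(tenant_floor, header_value, key=STRICTNESS.index)
```

With this rule, an application serving a `pii-restricted` tenant that sends `X-Data-Class: public`, or no header at all, is still routed as `pii-restricted`.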

The audit log output for each request should include: timestamp, tenant ID, request fingerprint hash, data class tag, endpoint selected, policy rules evaluated (and their outcomes), output filter applied, and any policy violations triggered. This log is the evidence that safety policies are actually running — not just configured. For regulated industries, this audit trail is the difference between a successful compliance audit and a remediation order.
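A record with those fields could be modeled as below. This is a sketch under assumptions: the field names, the use of SHA-256 over the prompt as the request fingerprint, and JSON lines as the log format are illustrative choices, not a prescribed schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditRecord:
    tenant_id: str
    request_fingerprint: str        # hash of the request, not the raw prompt
    data_class: str
    endpoint: str
    rules_evaluated: dict           # rule name -> outcome, e.g. "allow" / "deny"
    output_filter: Optional[str]    # None when no filter applies
    violations: list = field(default_factory=list)
    timestamp: str = field(default_factory=_utc_now)


def fingerprint(prompt):
    """Hash the prompt so the log can correlate requests without storing content."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def to_log_line(record):
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Because every application's traffic passes through the same writer, the schema and retention policy are defined once instead of per feature.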

The EU AI Act and the emerging compliance landscape

The EU AI Act (in force since August 2024, with obligations phasing in through 2027) introduces risk tiering for AI systems and imposes documentation, audit, and transparency requirements on "high-risk" AI applications, including those used in employment, education, health, and law enforcement contexts. Article 9 requires providers of high-risk AI systems to establish a risk management system with "appropriate" measures that runs throughout the system's lifecycle, and Article 12 requires that high-risk systems support automatic logging of events.

While the Act doesn't specify technical implementation, the requirement for comprehensive audit logs, consistent safety control application, and the ability to demonstrate that controls function as intended across all deployments maps directly to what gateway-layer enforcement provides. Application-layer enforcement in 15 different codebases is not auditable in the way Article 9 envisions. A gateway with a centralized audit log and declarative policy engine is.

Platforms targeting EU customers or processing EU resident data should treat their gateway's safety enforcement architecture as compliance infrastructure, not just an ops convenience. Kamiwaza routes with full request-level audit logging by default precisely because this requirement is structural for enterprise AI deployment, not optional.

When gateway enforcement isn't enough

Gateway-layer safety controls handle the infrastructure tier of the safety stack. They don't substitute for:

System prompt hardening: Writing system prompts that are resistant to prompt injection is an application-layer responsibility. The gateway can flag suspicious inputs, but the model's behavior under adversarial prompts depends on how the system prompt was written.
Evals harness validation: Safety properties of model outputs (factual accuracy, harmful content rates, refusal behavior) need to be measured against an evals harness before and after model version updates. The gateway doesn't run evals — it enforces policies on live traffic.
Human review for high-stakes decisions: No automated output filter substitutes for human review when the model's response will trigger a consequential action (a medical recommendation, a loan decision, a moderation action affecting many users). Gateway controls reduce the blast radius of failures; they don't eliminate the need for human oversight on high-stakes outputs.

The right framing is defense in depth: gateway-layer policies as a structural floor that holds regardless of individual application behavior, combined with application-layer hardening for the specific failure modes of each use case. Neither layer alone is sufficient.