Enterprise AI Security Checklist: 12 Controls You Cannot Skip -- KamiwazaAI

Most AI security reviews we've seen are web-app checklists with "LLM" labels pasted on top. That gap is exactly where the breaches happen. The attack surface of an enterprise AI orchestration layer is categorically different from a web app: tools execute real code, models emit text that downstream systems trust implicitly, and the blast radius of a single compromised agent can span every SaaS integration in your estate. These twelve controls are what our team treats as non-negotiable before any agent workflow touches production traffic.

Boundary Controls: Prompt Injection and Input Validation

Prompt injection is the SQL injection of the AI era—and it's significantly harder to detect because there's no grammar to parse, only semantics. Most teams get this wrong by only classifying the raw user prompt. The dangerous injections arrive in RAG-retrieved documents, because the vector store (Pinecone, Weaviate, pgvector) implicitly trusts its own content. We run a fine-tuned 7B classifier at p95 under 40ms that catches exactly those indirect injections before they enter a LangChain or LangGraph pipeline. Pair that with strict schema validation on structured inputs—JSON Schema or Pydantic—so any deviation gets a hard 400 before it reaches model context.

Control 1: Prompt Injection Defense

Run a dedicated injection classifier on all user content and retrieved context, not just the raw prompt.
Treat tool call arguments as untrusted—validate them independently of the instruction that generated them.
Log every flagged payload to an append-only store; attackers iterate fast and you need the corpus to keep up.

Control 2: Input Validation

Enforce token-count ceilings per request (8k context max for most tasks; 32k requires explicit escalation approval).
Reject non-UTF-8, unusual Unicode ranges, and zero-width characters—all common smuggling vectors.
Apply OPA policies at the API gateway to block inputs referencing internal hostname patterns or credential prefixes.

Execution Isolation: Tool Sandboxing and Model Isolation

A platform team we worked with last quarter had an AutoGen agent with a shell-exec tool and an overly broad IAM role. One malformed instruction later, it was making API calls it had no business making. The fix is structural, not prompt-based.

When any AutoGen or CrewAI agent calls a tool—filesystem read, shell exec, HTTP fetch—that tool must run in an isolated execution context. We containerize every tool handler on Kubernetes with a dedicated service account, no ambient cloud credentials, and a strict seccomp profile. Network egress from tool pods is whitelisted at the Istio policy layer; anything not on the allowlist is dropped with a metric emitted to Prometheus. Model inference endpoints get the same treatment: each model family (GPT-4o, Claude, Llama-3) lives in its own namespace with ResourceQuota and NetworkPolicy. Cross-model calls go through a brokered API, never peer-to-peer, giving you a single audit point instead of a mesh of implicit trust.

Control 3: Tool Sandboxing

No tool pod should carry IAM roles beyond least-privilege read for its specific data source.
Enforce 5-second execution timeouts; runaway tool calls are a denial-of-wallet risk at 10k QPS loads.

Control 4: Model Isolation

Separate inference namespaces prevent one model's fine-tune from poisoning another's context cache.
Audit logs from every model invocation ship to your SIEM—not just your observability stack.

Data Controls: Exfiltration Prevention and Secret Scanning of Outputs

An agent that can read your CRM and write to Slack is a data exfiltration pipe by definition. Model alignment is not a substitute for output filtering. You need two layers: pre-emission filtering and egress monitoring.

Before any model output reaches an external channel, run it through a classifier trained on PII patterns, internal code identifiers, and credential regexes. We use a regex engine plus a small BERT-based classifier in combination: the regex catches known-format secrets (AWS keys, JWTs) at sub-millisecond latency, while the classifier handles free-form sensitive content with a 0.5% false-negative target.

# Example output-filter config (simplified)
output_filter:
  rules:
    - name: aws_key
      pattern: "AKIA[0-9A-Z]{16}"
      action: redact
    - name: internal_hostname
      pattern: "\.(corp|internal)\b"
      action: block_and_alert
  classifier:
    model: bert-pii-v3
    threshold: 0.87
    fallback_on_timeout: block

Control 5: Secret Scanning of Outputs

Treat model output as untrusted code—pipe it through the same secret-scanning toolchain you use for git commits.
Vault-issued dynamic credentials have short TTLs; scan specifically for static long-lived tokens, which are the ones that actually hurt when exposed.

Control 6: Data Exfiltration Prevention

Tag all RAG-retrieved documents with a data classification label and block emission of content above the user's clearance tier.
Enforce per-session data-volume caps; a session emitting more than 50KB of retrieved content to an external endpoint is anomalous and should alert immediately.

Operational Controls: Rate Limits, Egress Monitoring, and Dependency SBOM

Rate limiting isn't cost control dressed up as security—it's your earliest signal that a credential is compromised or an agent is stuck in a loop. Set hard limits at both the API gateway (requests per minute per API key) and at the orchestration layer (LlamaIndex retrieval calls per workflow run). Export counters to Prometheus and alert at 80% of ceiling, not 100%. Waiting until you hit the wall is too late.

Every outbound HTTP call from an agent workflow should be proxied through a controlled egress gateway that logs destination, payload size, and response code to Kafka for stream processing. Anomaly detection on that Kafka topic catches novel exfiltration channels faster than any signature-based tool will. On dependencies: LangChain releases frequently and has had several supply-chain incidents. Pin versions. Maintain a Software Bill of Materials for every AI dependency and run automated SBOM diffs against a known-good baseline on every dependency update.

Controls 7–9 at a Glance

Rate Limits: Per-key, per-model, and per-workflow ceilings. Alert at 80% threshold; hard-cut at 100%.
Egress Monitoring: All agent HTTP via a logged proxy; Kafka plus stream processor for real-time anomaly detection.
Dependency SBOM: Pin LangChain, LlamaIndex, Ray, and all inference SDKs. Automated CVE correlation on every PR.

Assurance Controls: Output Filtering, Incident Response, and Red-Team Cadence

Output filtering beyond secrets means hallucination scoring and policy compliance. Every agent response that cites a fact or makes a recommendation should pass a grounding check against its source documents. We target a hallucination rate below 0.5% on factual retrieval tasks, measured weekly against a golden eval set running in Airflow. That number sounds small until you're operating at scale and "small" means thousands of wrong citations per day.

Incident response is the area where we see the most underinvestment. Traditional IR playbooks simply don't cover agent-native scenarios: how do you revoke tool access mid-session? How do you replay a conversation to reconstruct what data was accessed? How do you quarantine a compromised LlamaIndex index without taking down production retrieval? These need documented runbooks, with Temporal workflows handling the automated remediation steps—not just a Confluence page someone wrote once and never tested.

Red-team your AI attack surface on a fixed cadence. Quarterly is the minimum; bi-weekly if you're actively shipping new agent capabilities. The scope must include indirect prompt injection via data sources, tool chain abuse, and multi-turn context manipulation. Single-shot jailbreak testing alone tells you almost nothing about your actual risk posture.

Controls 10–12 at a Glance

Output Filtering: Hallucination scoring plus policy compliance check before every external emission. Target <0.5% false fact rate.
Incident Response Playbook: Agent-specific runbooks covering mid-session revocation, conversation replay, and index quarantine via Temporal automation.
Red-Team Cadence: Quarterly minimum; scope must include multi-turn and indirect injection, not just single-shot attacks.

Takeaways

Prompt injection in RAG-retrieved content is the highest-severity vector most teams under-invest in—classify retrieved chunks, not just user inputs.
Tool sandboxing on Kubernetes with Istio egress policies and least-privilege IAM is non-negotiable before any agent workflow touches production; a single over-privileged tool pod is a full-estate compromise waiting to happen.
Output secret scanning and per-session data-volume caps are your practical exfiltration prevention layer; don't rely on model alignment alone.
Red-team cadence and incident response playbooks must be AI-specific—traditional IR runbooks miss agent-native scenarios like mid-session revocation and index quarantine entirely.

← Back to Blog