April 14, 2026
Eight months running a LangGraph-based pipeline in production has hammered one thing home repeatedly: the coordination topology you pick on day one will either contain your blast radius or dramatically expand it. Most teams reach for multi-agent systems because a single LLM call can't hold enough context or enforce meaningful specialization—which is fine—but they wildly underestimate how much the coordination topology drives latency, cost, and failure modes once real traffic hits.
A 128k-context window sounds like it makes multiple agents unnecessary, but token cost is nonlinear and coherence degrades badly in long contexts. A single GPT-4o call stuffed with 80k tokens costs roughly 8–10× what three specialized 8k-context agents cost for equivalent work. P95 latency on that fat call routinely exceeds 12 seconds; a coordinated multi-agent pipeline doing the same job can come in under 2 seconds end-to-end.
Before you commit to any pattern, sketch your task graph. Which subtasks are sequential? Which can parallelize? What's the acceptable per-request error budget? That graph nearly dictates your topology. Most teams skip this step and then wonder why their architecture fights them six months later.

A single orchestrator agent decomposes a task, dispatches subtasks to specialized workers, and aggregates results. LangGraph models this natively—the supervisor is a stateful graph node with edges to worker subgraphs. CrewAI's Process.hierarchical mode implements the same idea with less explicit wiring, and AutoGen's GroupChat with a GroupChatManager is a looser variant of the same concept.
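As a concrete, if toy, illustration, here is a minimal LangGraph wiring of that supervisor-worker shape. The state schema, node names, and hard-coded decomposition are placeholders for what would be LLM calls in a real pipeline:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ReviewState(TypedDict):
    task: str
    subtasks: list[str]
    results: dict[str, str]

def supervisor(state: ReviewState) -> dict:
    # In production this is an LLM call that decomposes the task;
    # hard-coded here to keep the sketch runnable.
    return {"subtasks": ["extract_clauses", "check_compliance"]}

def worker(state: ReviewState) -> dict:
    # Each worker handles its subtasks and writes results back to state.
    return {"results": {sub: f"handled {sub}" for sub in state["subtasks"]}}

builder = StateGraph(ReviewState)
builder.add_node("supervisor", supervisor)
builder.add_node("worker", worker)
builder.add_edge(START, "supervisor")
builder.add_edge("supervisor", "worker")
builder.add_edge("worker", END)
graph = builder.compile()

print(graph.invoke({"task": "review contract", "subtasks": [], "results": {}}))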
The supervisor becomes a bottleneck at high throughput. At 10k QPS across a multi-tenant deployment, a single supervisor LLM call serializes dispatch and adds 300–500ms per hop. More dangerously, supervisors hallucinate task decompositions at a measurable rate—our baseline was 0.5% malformed dispatch instructions per request, which compounded across five workers yields a ~2.5% effective failure rate before any worker-level errors have even fired. The fix is straightforward but frequently skipped: enforce structured output with a JSON schema and validate with Pydantic before any worker call goes out.
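A minimal sketch of that validation gate, assuming Pydantic v2; the field names are illustrative, not something the supervisor prompt guarantees:

from pydantic import BaseModel, ValidationError

class Subtask(BaseModel):
    worker: str          # must match a registered worker name
    instruction: str
    max_tokens: int = 2048

class DispatchPlan(BaseModel):
    subtasks: list[Subtask]

def validate_dispatch(raw_json: str) -> DispatchPlan | None:
    # Reject malformed supervisor output before any worker call is made.
    try:
        return DispatchPlan.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller re-prompts the supervisor or falls back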
In peer-to-peer topologies, agents communicate directly without a central coordinator—each agent reads outputs from others and decides whether to act. This maps well to adversarial review scenarios: one agent drafts, another critiques, a third resolves conflicts. AutoGen's bidirectional ConversableAgent pairs are the canonical implementation. The resilience story is good; there's no single point of failure. The cost story is ugly. Unconstrained P2P dialogues in production have burned 200k tokens on tasks we'd budgeted at 20k. You need hard conversation-turn limits or you will find out the hard way.
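A sketch of what those limits look like with pyautogen's ConversableAgent, assuming the 0.2-style config_list convention and that your installed version supports max_turns on initiate_chat; the caps are the point, not the specific numbers:

import os
from autogen import ConversableAgent

# Adjust llm_config to your deployment; credentials come from the environment here.
llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]}

drafter = ConversableAgent(
    "drafter",
    llm_config=llm_config,
    max_consecutive_auto_reply=4,  # per-agent cap on automatic replies
)
critic = ConversableAgent(
    "critic",
    llm_config=llm_config,
    max_consecutive_auto_reply=4,
)

# max_turns bounds the whole dialogue regardless of per-agent settings.
result = drafter.initiate_chat(
    critic,
    message="Draft a summary of clause 7 and defend it.",
    max_turns=6,
)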
The blackboard pattern introduces a shared data structure that all agents read from and write to, decoupling them from each other entirely. Think of it as a coordination bus: agents post partial results, subscribe to relevant keys, and act when their preconditions are met. Redis with keyspace notifications is a practical blackboard for LLM agents—agent state transitions trigger pub/sub events, and each agent polls only its relevant namespace. This works well for long-running document analysis pipelines where ten specialized agents (entity extraction, sentiment, compliance flagging) work concurrently on a shared document store backed by Elasticsearch or pgvector.
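Roughly, the subscription side looks like this with redis-py, assuming keyspace notifications are enabled on the server and a doc:{id}:{agent} key scheme, which is our convention rather than anything Redis imposes:

import redis

r = redis.Redis(host="state-cluster", port=6379, db=0)

# Keyspace notifications must be enabled server-side ("K" = keyspace, "A" = all event classes).
r.config_set("notify-keyspace-events", "KA")

def post_partial_result(doc_id: str, agent: str, payload: str) -> None:
    # Each agent writes under its own namespace on the shared blackboard.
    r.set(f"doc:{doc_id}:{agent}", payload)

def watch_namespace(pattern: str) -> None:
    # An agent subscribes only to the keys whose changes satisfy its preconditions.
    p = r.pubsub()
    p.psubscribe(f"__keyspace@0__:{pattern}")
    for event in p.listen():
        if event["type"] == "pmessage":
            key = event["channel"].decode()
            print("precondition update:", key, event["data"].decode())

# e.g. the compliance agent reacts when entity extraction finishes:
# watch_namespace("doc:*:entity_extraction")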
When no single supervisor can hold the full task graph, you nest supervision: a top-level orchestrator delegates to mid-level team leads, each managing its own worker pool. LangGraph supports this through nested subgraph invocation. In our experience, the practical ceiling before this becomes unmanageable is roughly three levels deep. Beyond that, latency compounds multiplicatively and debugging a failed task means tracing through N×M agent interactions—which is as miserable as it sounds.
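The nesting itself is just a compiled subgraph added as a node in the parent graph; a minimal sketch, with the team and state names invented for illustration:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DocState(TypedDict):
    doc: str
    findings: list[str]

# Team-level subgraph: a mid-level lead and its workers would hang off this.
team = StateGraph(DocState)
team.add_node("clause_extractor", lambda s: {"findings": s["findings"] + ["clauses"]})
team.add_edge(START, "clause_extractor")
team.add_edge("clause_extractor", END)

# Top-level orchestrator delegates to the compiled subgraph as a single node.
top = StateGraph(DocState)
top.add_node("extraction_team", team.compile())
top.add_edge(START, "extraction_team")
top.add_edge("extraction_team", END)
app = top.compile()

print(app.invoke({"doc": "contract text", "findings": []}))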
A platform team we worked with last quarter was building a hierarchical legal document review system. Their config looked roughly like this:
# LangGraph hierarchical team config (simplified)
top_orchestrator:
  model: gpt-4o
  max_delegation_depth: 3
  timeout_ms: 8000
teams:
  - name: extraction_team
    lead_model: gpt-4o-mini
    workers: [clause_extractor, date_parser, party_identifier]
    parallelism: 3
  - name: compliance_team
    lead_model: gpt-4o
    workers: [gdpr_checker, jurisdiction_resolver, risk_scorer]
    parallelism: 2
shared_state_backend: redis://state-cluster:6379/0
trace_exporter: otlp://otel-collector:4317
With this topology, p95 latency for a 40-page contract landed at 6.2 seconds. Parallelism across the two teams cut wall-clock time by 38% versus a flat supervisor-worker setup. That 38% is real, but it came with debugging overhead that flat configurations simply don't have.
The choice between shared memory and message passing shapes your observability posture as much as your performance profile, and we think most teams underweight that. Shared memory—a mutable state object passed between graph nodes in LangGraph, or a shared dict in AutoGen—is low-latency and easy to implement, but any agent can corrupt state, and replay or debugging requires snapshotting at every node. Message passing via Kafka topics per agent role, or explicit typed messages between CrewAI agents, adds 5–15ms per hop but gives you a durable audit log and natural backpressure. That tradeoff is almost always worth it once your workflow runs longer than a few seconds.
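For the message-passing side, a sketch of a typed agent message over Kafka, assuming the kafka-python client and a one-topic-per-role naming scheme of our own devising:

import json
from dataclasses import asdict, dataclass
from kafka import KafkaProducer

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    task_id: str
    payload: dict

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def dispatch(msg: AgentMessage) -> None:
    # One topic per recipient role gives a durable, replayable audit log
    # and lets slow consumers apply backpressure naturally.
    producer.send(f"agents.{msg.recipient}", asdict(msg))
    producer.flush()

dispatch(AgentMessage("supervisor", "clause_extractor", "task-42",
                      {"instruction": "extract termination clauses"}))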
Cascading retries are the silent budget killer. When a worker fails and the supervisor retries with a fresh LLM call, token costs spike non-linearly. A 2% per-worker error rate across a five-agent supervisor-worker pipeline translates to a 40–70% cost overrun on bad days without explicit circuit breakers. Implement exponential backoff at the agent level, not just at the HTTP client level, and set hard token budgets enforced outside the LLM call itself—because the LLM will not save you here.
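A sketch of that enforcement layer; agent_call, its return shape, and the budget numbers are all assumptions for illustration:

import time

class AgentBudgetExceeded(RuntimeError):
    pass

def call_with_budget(agent_call, *, max_attempts=3, base_delay=1.0,
                     token_budget=20_000):
    # Enforce the token ceiling outside the LLM call: the model cannot
    # be trusted to police its own spend.
    spent = 0
    for attempt in range(max_attempts):
        result = agent_call()  # expected to return {"ok": bool, "tokens_used": int, ...}
        spent += result["tokens_used"]
        if spent > token_budget:
            raise AgentBudgetExceeded(f"{spent} tokens > budget {token_budget}")
        if result["ok"]:
            return result
        # Exponential backoff at the agent level, not just the HTTP client.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("agent failed after retries without exceeding budget")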
Track per-agent Prometheus metrics—agent_tokens_consumed_total and agent_task_duration_seconds—and alert in Grafana when any single agent exceeds its p95 budget by more than 20%. Then actually test your topology under partial failure: kill one worker class in staging and verify the supervisor degrades gracefully rather than retrying indefinitely into a timeout wall.
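Instrumenting those two metrics with prometheus_client is straightforward; the scrape port and label scheme here are assumptions:

from prometheus_client import Counter, Histogram, start_http_server

TOKENS = Counter(
    "agent_tokens_consumed_total",
    "Tokens consumed per agent",
    ["agent"],
)
DURATION = Histogram(
    "agent_task_duration_seconds",
    "Wall-clock task duration per agent",
    ["agent"],
)

def record(agent: str, tokens: int, seconds: float) -> None:
    TOKENS.labels(agent=agent).inc(tokens)
    DURATION.labels(agent=agent).observe(seconds)

start_http_server(9100)  # scrape endpoint for Prometheus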