April 14, 2026
Eight months running a LangGraph-based pipeline in production has hammered one thing home repeatedly: the coordination topology you pick on day one will either contain your blast radius or dramatically expand it. Most teams reach for multi-agent systems because a single LLM call can't hold enough context or enforce meaningful specialization—which is fine—but they wildly underestimate how much the coordination topology drives latency, cost, and failure modes once real traffic hits.
A 128k-context window sounds like it makes multiple agents unnecessary, but token cost is nonlinear and coherence degrades badly in long contexts. A single GPT-4o call stuffed with 80k tokens costs roughly 8–10× what three specialized 8k-context agents cost for equivalent work. P95 latency on that fat call routinely exceeds 12 seconds; a coordinated multi-agent pipeline doing the same job can come in under 2 seconds end-to-end.
Before you commit to any pattern, sketch your task graph. Which subtasks are sequential? Which can parallelize? What's the acceptable per-request error budget? That graph nearly dictates your topology. Most teams skip this step and then wonder why their architecture fights them six months later.

A single orchestrator agent decomposes a task, dispatches subtasks to specialized workers, and aggregates results. LangGraph models this natively—the supervisor is a stateful graph node with edges to worker subgraphs. CrewAI's Process.hierarchical mode implements the same idea with less explicit wiring, and AutoGen's GroupChat with a GroupChatManager is a looser variant of the same concept.
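As a concrete, if toy, illustration, here is a minimal LangGraph wiring of that supervisor-worker shape. The state schema, node names, and hard-coded decomposition are placeholders for what would be LLM calls in a real pipeline:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ReviewState(TypedDict):
    task: str
    subtasks: list[str]
    results: dict[str, str]

def supervisor(state: ReviewState) -> dict:
    # In production this is an LLM call that decomposes the task;
    # hard-coded here to keep the sketch runnable.
    return {"subtasks": ["extract_clauses", "check_compliance"]}

def worker(state: ReviewState) -> dict:
    # Each worker handles its subtasks and writes results back to state.
    return {"results": {sub: f"handled {sub}" for sub in state["subtasks"]}}

builder = StateGraph(ReviewState)
builder.add_node("supervisor", supervisor)
builder.add_node("worker", worker)
builder.add_edge(START, "supervisor")
builder.add_edge("supervisor", "worker")
builder.add_edge("worker", END)
graph = builder.compile()

print(graph.invoke({"task": "review contract", "subtasks": [], "results": {}}))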
The supervisor becomes a bottleneck at high throughput. At 10k QPS across a multi-tenant deployment, a single supervisor LLM call serializes dispatch and adds 300–500ms per hop. More dangerously, supervisors hallucinate task decompositions at a measurable rate—our baseline was 0.5% malformed dispatch instructions per request, which compounded across five workers yields a ~2.5% effective failure rate before any worker-level errors have even fired. The fix is straightforward but frequently skipped: enforce structured output with a JSON schema and validate with Pydantic before any worker call goes out.
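A minimal sketch of that validation gate, assuming Pydantic v2; the field names are illustrative, not something the supervisor prompt guarantees:

from pydantic import BaseModel, ValidationError

class Subtask(BaseModel):
    worker: str          # must match a registered worker name
    instruction: str
    max_tokens: int = 2048

class DispatchPlan(BaseModel):
    subtasks: list[Subtask]

def validate_dispatch(raw_json: str) -> DispatchPlan | None:
    # Reject malformed supervisor output before any worker call is made.
    try:
        return DispatchPlan.model_validate_json(raw_json)
    except ValidationError:
        return None  # caller re-prompts the supervisor or falls back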
In peer-to-peer topologies, agents communicate directly without a central coordinator—each agent reads outputs from others and decides whether to act. This maps well to adversarial review scenarios: one agent drafts, another critiques, a third resolves conflicts. AutoGen's bidirectional ConversableAgent pairs are the canonical implementation. The resilience story is good; there's no single point of failure. The cost story is ugly. Unconstrained P2P dialogues in production have burned 200k tokens on tasks we'd budgeted at 20k. You need hard conversation-turn limits or you will find out the hard way.
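A sketch of what those limits look like with pyautogen's ConversableAgent, assuming the 0.2-style config_list convention and that your installed version supports max_turns on initiate_chat; the caps are the point, not the specific numbers:

import os
from autogen import ConversableAgent

# Adjust llm_config to your deployment; credentials come from the environment here.
llm_config = {"config_list": [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]}

drafter = ConversableAgent(
    "drafter",
    llm_config=llm_config,
    max_consecutive_auto_reply=4,  # per-agent cap on automatic replies
)
critic = ConversableAgent(
    "critic",
    llm_config=llm_config,
    max_consecutive_auto_reply=4,
)

# max_turns bounds the whole dialogue regardless of per-agent settings.
result = drafter.initiate_chat(
    critic,
    message="Draft a summary of clause 7 and defend it.",
    max_turns=6,
)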
The blackboard pattern introduces a shared data structure that all agents read from and write to, decoupling them from each other entirely. Think of it as a coordination bus: agents post partial results, subscribe to relevant keys, and act when their preconditions are met. Redis with keyspace notifications is a practical blackboard for LLM agents—agent state transitions trigger pub/sub events, and each agent polls only its relevant namespace. This works well for long-running document analysis pipelines where ten specialized agents (entity extraction, sentiment, compliance flagging) work concurrently on a shared document store backed by Elasticsearch or pgvector.
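Roughly, the subscription side looks like this with redis-py, assuming keyspace notifications are enabled on the server and a doc:{id}:{agent} key scheme, which is our convention rather than anything Redis imposes:

import redis

r = redis.Redis(host="state-cluster", port=6379, db=0)

# Keyspace notifications must be enabled server-side ("K" = keyspace, "A" = all event classes).
r.config_set("notify-keyspace-events", "KA")

def post_partial_result(doc_id: str, agent: str, payload: str) -> None:
    # Each agent writes under its own namespace on the shared blackboard.
    r.set(f"doc:{doc_id}:{agent}", payload)

def watch_namespace(pattern: str) -> None:
    # An agent subscribes only to the keys whose changes satisfy its preconditions.
    p = r.pubsub()
    p.psubscribe(f"__keyspace@0__:{pattern}")
    for event in p.listen():
        if event["type"] == "pmessage":
            key = event["channel"].decode()
            print("precondition update:", key, event["data"].decode())

# e.g. the compliance agent reacts when entity extraction finishes:
# watch_namespace("doc:*:entity_extraction")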
When no single supervisor can hold the full task graph, you nest supervision: a top-level orchestrator delegates to mid-level team leads, each managing its own worker pool. LangGraph supports this through nested subgraph invocation. In our experience, the practical ceiling before this becomes unmanageable is roughly three levels deep. Beyond that, latency compounds multiplicatively and debugging a failed task means tracing through N×M agent interactions—which is as miserable as it sounds.
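The nesting itself is just a compiled subgraph added as a node in the parent graph; a minimal sketch, with the team and state names invented for illustration:

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class DocState(TypedDict):
    doc: str
    findings: list[str]

# Team-level subgraph: a mid-level lead and its workers would hang off this.
team = StateGraph(DocState)
team.add_node("clause_extractor", lambda s: {"findings": s["findings"] + ["clauses"]})
team.add_edge(START, "clause_extractor")
team.add_edge("clause_extractor", END)

# Top-level orchestrator delegates to the compiled subgraph as a single node.
top = StateGraph(DocState)
top.add_node("extraction_team", team.compile())
top.add_edge(START, "extraction_team")
top.add_edge("extraction_team", END)
app = top.compile()

print(app.invoke({"doc": "contract text", "findings": []}))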
A platform team we worked with last quarter was building a hierarchical legal document review system. Their config looked roughly like this:
# LangGraph hierarchical team config (simplified)
top_orchestrator:
  model: gpt-4o
  max_delegation_depth: 3
  timeout_ms: 8000
teams:
  - name: extraction_team
    lead_model: gpt-4o-mini
    workers: [clause_extractor, date_parser, party_identifier]
    parallelism: 3
  - name: compliance_team
    lead_model: gpt-4o
    workers: [gdpr_checker, jurisdiction_resolver, risk_scorer]
    parallelism: 2
shared_state_backend: redis://state-cluster:6379/0
trace_exporter: otlp://otel-collector:4317
With this topology, p95 latency for a 40-page contract landed at 6.2 seconds. Parallelism across the two teams cut wall-clock time by 38% versus a flat supervisor-worker setup. That 38% is real, but it came with debugging overhead that flat configurations simply don't have.
The choice between shared memory and message passing shapes your observability posture as much as your performance profile, and we think most teams underweight that. Shared memory—a mutable state object passed between graph nodes in LangGraph, or a shared dict in AutoGen—is low-latency and easy to implement, but any agent can corrupt state, and replay or debugging requires snapshotting at every node. Message passing via Kafka topics per agent role, or explicit typed messages between CrewAI agents, adds 5–15ms per hop but gives you a durable audit log and natural backpressure. That tradeoff is almost always worth it once your workflow runs longer than a few seconds.
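For the message-passing side, a sketch of a typed agent message over Kafka, assuming the kafka-python client and a one-topic-per-role naming scheme of our own devising:

import json
from dataclasses import asdict, dataclass
from kafka import KafkaProducer

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    task_id: str
    payload: dict

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda m: json.dumps(m).encode("utf-8"),
)

def dispatch(msg: AgentMessage) -> None:
    # One topic per recipient role gives a durable, replayable audit log
    # and lets slow consumers apply backpressure naturally.
    producer.send(f"agents.{msg.recipient}", asdict(msg))
    producer.flush()

dispatch(AgentMessage("supervisor", "clause_extractor", "task-42",
                      {"instruction": "extract termination clauses"}))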
Cascading retries are the silent budget killer. When a worker fails and the supervisor retries with a fresh LLM call, token costs spike non-linearly. A 2% per-worker error rate across a five-agent supervisor-worker pipeline translates to a 40–70% cost overrun on bad days without explicit circuit breakers. Implement exponential backoff at the agent level, not just at the HTTP client level, and set hard token budgets enforced outside the LLM call itself—because the LLM will not save you here.
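A sketch of that enforcement layer; agent_call, its return shape, and the budget numbers are all assumptions for illustration:

import time

class AgentBudgetExceeded(RuntimeError):
    pass

def call_with_budget(agent_call, *, max_attempts=3, base_delay=1.0,
                     token_budget=20_000):
    # Enforce the token ceiling outside the LLM call: the model cannot
    # be trusted to police its own spend.
    spent = 0
    for attempt in range(max_attempts):
        result = agent_call()  # expected to return {"ok": bool, "tokens_used": int, ...}
        spent += result["tokens_used"]
        if spent > token_budget:
            raise AgentBudgetExceeded(f"{spent} tokens > budget {token_budget}")
        if result["ok"]:
            return result
        # Exponential backoff at the agent level, not just the HTTP client.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("agent failed after retries without exceeding budget")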
Track per-agent Prometheus metrics—agent_tokens_consumed_total and agent_task_duration_seconds—and alert in Grafana when any single agent exceeds its p95 budget by more than 20%. Then actually test your topology under partial failure: kill one worker class in staging and verify the supervisor degrades gracefully rather than retrying indefinitely into a timeout wall.
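Instrumenting those two metrics with prometheus_client is straightforward; the scrape port and label scheme here are assumptions:

from prometheus_client import Counter, Histogram, start_http_server

TOKENS = Counter(
    "agent_tokens_consumed_total",
    "Tokens consumed per agent",
    ["agent"],
)
DURATION = Histogram(
    "agent_task_duration_seconds",
    "Wall-clock task duration per agent",
    ["agent"],
)

def record(agent: str, tokens: int, seconds: float) -> None:
    TOKENS.labels(agent=agent).inc(tokens)
    DURATION.labels(agent=agent).observe(seconds)

start_http_server(9100)  # scrape endpoint for Prometheus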