The orchestration problem
When you run multiple agents in sequence — one generating a plan, another executing it, a third verifying the result — state management becomes the dominant engineering challenge. Not because it's conceptually hard, but because the failure modes are invisible until production.
Consider a three-agent pipeline:
- Planner generates a task decomposition
- Executor runs each subtask against the codebase
- Verifier checks the output against the original intent
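
```python
async def run_pipeline(task):
    # planner, executor, verifier, and state are defined elsewhere.
    plan = await planner.generate(task)
    for i, step in enumerate(plan.steps):
        # Snapshot before each step so a failed step can be rolled back.
        checkpoint = state.snapshot()
        result = await executor.run(step)
        if not await verifier.check(result, step):
            # Verification failed: restore the pre-step state and retry.
            state.restore(checkpoint)
            result = await executor.run(step, retry=True)
```

The cost of a checkpoint is ~200ms. The cost of a full restart is 30–90 seconds.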
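
The pipeline above relies on a `state` object with `snapshot()` and `restore()`, which the post does not show. As a rough sketch of what such an object might look like, assuming the pipeline state fits in an in-memory dictionary, the hypothetical `PipelineState` class below uses deep copies so that later mutations cannot leak into a saved checkpoint. The class name and the `data` attribute are illustrative assumptions, not the original implementation.

```python
import copy


class PipelineState:
    """Hypothetical in-memory state holder (an assumption for illustration)."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def snapshot(self):
        # Deep-copy so later mutations cannot alter the checkpoint.
        return copy.deepcopy(self.data)

    def restore(self, checkpoint):
        # Copy again so the same checkpoint can be restored more than once.
        self.data = copy.deepcopy(checkpoint)


state = PipelineState({"files": {"app.py": "v1"}})
checkpoint = state.snapshot()

# Simulate a step that leaves the state in a bad shape.
state.data["files"]["app.py"] = "broken edit"

# Roll back to the pre-step state.
state.restore(checkpoint)
print(state.data["files"]["app.py"])  # v1
```

A deep copy is the simplest choice but also the most memory-hungry one. A real system would probably snapshot something cheaper, such as a diff or a version-control reference.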
| Metric | Without checkpoints | With checkpoints | Change |
|---|---|---|---|
| Mean completion | 45s | 38s | −16% |
| p95 completion | 180s | 52s | −71% |
Checkpoints are the cheapest insurance in agent orchestration. The 200ms overhead per step is invisible. The 30–90s restart cost when you don't have them is not.
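
To make the trade-off concrete, the sketch below recomputes the percentage changes in the table and estimates a break-even point. The break-even model is a simplifying assumption, not something measured in the post: every step pays the checkpoint cost, every failure without a checkpoint costs one full restart, and the cost of restoring a checkpoint is ignored. Under that model, checkpointing pays off once the per-step failure probability exceeds checkpoint cost divided by restart cost. The function names are illustrative.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def break_even_failure_rate(checkpoint_cost_s, restart_cost_s):
    """Per-step failure probability above which checkpointing pays off.

    Simplified model (an assumption): each step pays the checkpoint cost,
    and each failure without a checkpoint costs one full restart.
    """
    return checkpoint_cost_s / restart_cost_s


# Percentage changes reported in the table.
print(round(pct_change(45, 38)))    # -16
print(round(pct_change(180, 52)))   # -71

# Break-even failure rates for 200 ms checkpoints and 30 s / 90 s restarts.
print(f"{break_even_failure_rate(0.2, 30):.2%}")  # 0.67%
print(f"{break_even_failure_rate(0.2, 90):.2%}")  # 0.22%
```

In other words, under these assumptions checkpoints come out ahead if more than roughly one step in 150 to 450 fails.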
What we got wrong
- Checkpoint size grows. After 50 steps, accumulated snapshots consumed 2GB. We added a sliding window — keep the last 5, evict older ones.
- The verifier is the bottleneck. Verification took 40% of total pipeline time. We moved to async verification.
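
The sliding window from the first point can be sketched with a bounded `collections.deque`, which drops its oldest entry automatically once `maxlen` is reached. The `CheckpointWindow` class and its method names are hypothetical, and the window size of 5 is taken from the text above.

```python
from collections import deque


class CheckpointWindow:
    """Hypothetical bounded store that keeps only the newest `size` snapshots."""

    def __init__(self, size=5):
        # A deque with maxlen evicts the oldest item on append when full.
        self._items = deque(maxlen=size)

    def push(self, step_index, snapshot):
        self._items.append((step_index, snapshot))

    def latest(self):
        # Most recent (step_index, snapshot) pair, or None if empty.
        return self._items[-1] if self._items else None

    def steps(self):
        return [index for index, _ in self._items]


window = CheckpointWindow(size=5)
for step in range(50):
    window.push(step, {"step": step})

print(window.steps())  # [45, 46, 47, 48, 49]
```

The trade-off is that rollback can reach back at most five steps. A failure that is only detected later than that would still require a full restart.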