The orchestration problem

When you run multiple agents in sequence — one generating a plan, another executing it, a third verifying the result — state management becomes the dominant engineering challenge. Not because it's conceptually hard, but because the failure modes are invisible until production.

Consider a three-agent pipeline:

  1. Planner generates a task decomposition
  2. Executor runs each subtask against the codebase
  3. Verifier checks the output against the original intent
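
async def run_pipeline(task):
    # planner, executor, verifier, and state are the pipeline's shared components.
    plan = await planner.generate(task)
    results = []
    for step in plan.steps:
        # Snapshot before each step so a failed step can be rolled back on its own.
        checkpoint = state.snapshot()
        result = await executor.run(step)

        if not await verifier.check(result, step):
            # Roll back to the pre-step state and retry once.
            # The retried result is not verified again here.
            state.restore(checkpoint)
            result = await executor.run(step, retry=True)

        results.append(result)
    return results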
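
To make the rollback behavior concrete, here is a minimal self-contained sketch of the same loop with stub components. Everything in it is hypothetical and exists only for illustration: the State class, the FlakyExecutor that corrupts state on its first attempt at step "b", and the verify function. A real pipeline would snapshot whatever the executor actually mutates.

```python
import asyncio
import copy
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical in-memory state with snapshot/restore."""
    data: dict = field(default_factory=dict)

    def snapshot(self) -> dict:
        # Deep copy so later mutations cannot leak into the checkpoint.
        return copy.deepcopy(self.data)

    def restore(self, checkpoint: dict) -> None:
        self.data = copy.deepcopy(checkpoint)


class FlakyExecutor:
    """Stub executor: fails step 'b' on the first try and leaves a bad write."""

    def __init__(self, state: State):
        self.state = state

    async def run(self, step: str, retry: bool = False) -> str:
        if step == "b" and not retry:
            self.state.data["garbage"] = True  # simulate a corrupting partial write
            return "bad"
        self.state.data[step] = "done"
        return "ok"


async def verify(result: str, step: str) -> bool:
    # Stub verifier: accepts only the literal result "ok".
    return result == "ok"


async def run_steps(steps, state, executor):
    results = []
    for step in steps:
        checkpoint = state.snapshot()
        result = await executor.run(step)
        if not await verify(result, step):
            # Undo the failed step's side effects, then retry once.
            state.restore(checkpoint)
            result = await executor.run(step, retry=True)
        results.append(result)
    return results


def demo():
    state = State()
    results = asyncio.run(run_steps(["a", "b", "c"], state, FlakyExecutor(state)))
    return results, state.data
```

Calling demo() runs three steps. The failed first attempt at "b" writes a "garbage" key, and the restore removes it before the retry, so the final state contains only the three completed steps.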

The cost of a checkpoint is ~200ms. The cost of a full restart is 30–90 seconds.

Metric            Without checkpoints   With checkpoints   Change
Mean completion   45s                   38s                −16%
p95 completion    180s                  52s                −71%

Checkpoints are the cheapest insurance in agent orchestration. The 200ms overhead per step is invisible. The 30–90s restart cost when you don't have them is not.
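
A rough way to see why the overhead pays for itself is to compute the per-step failure rate at which checkpointing breaks even. The sketch below uses a deliberately simplified model, which is an assumption and not something we measured: every step pays the checkpoint cost, a failed step without a checkpoint costs one full restart, and the cost of re-running the failed step itself is ignored. The function name is hypothetical.

```python
def break_even_failure_rate(checkpoint_cost_s: float, restart_cost_s: float) -> float:
    """Per-step failure probability above which checkpointing saves time on average.

    Simplified model (an assumption): each step pays checkpoint_cost_s, and a
    failed step without a checkpoint costs one full restart of restart_cost_s.
    """
    return checkpoint_cost_s / restart_cost_s


CHECKPOINT_COST_S = 0.2         # ~200ms per checkpoint, from the text
RESTART_COSTS_S = (30.0, 90.0)  # 30-90s restart range, from the text

for restart in RESTART_COSTS_S:
    rate = break_even_failure_rate(CHECKPOINT_COST_S, restart)
    print(f"restart={restart:.0f}s -> break-even failure rate {rate:.2%}")
# restart=30s -> break-even failure rate 0.67%
# restart=90s -> break-even failure rate 0.22%
```

Under this model, checkpoints win on average as soon as more than roughly one step in 150 fails (for a 30s restart) or one in 450 (for a 90s restart).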

What we got wrong

  • Checkpoint size grows. After 50 steps, accumulated snapshots consumed 2GB. We added a sliding window — keep the last 5, evict older ones.
  • The verifier is the bottleneck. Verification took 40% of total pipeline time. We moved to async verification.
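
The sliding window from the first point can be sketched with a bounded deque. This is an illustration under assumptions, not our production code: the CheckpointWindow class and its method names are hypothetical, and the snapshots here are placeholder dicts.

```python
from collections import deque


class CheckpointWindow:
    """Keeps only the most recent `max_kept` checkpoints (hypothetical helper)."""

    def __init__(self, max_kept: int = 5):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._window = deque(maxlen=max_kept)

    def push(self, step_index: int, snapshot) -> None:
        self._window.append((step_index, snapshot))

    def latest(self):
        """Return the newest (step_index, snapshot) pair, or None if empty."""
        return self._window[-1] if self._window else None

    def kept_steps(self) -> list:
        return [index for index, _ in self._window]


window = CheckpointWindow(max_kept=5)
for i in range(50):
    window.push(i, {"step": i})  # placeholder snapshot

print(window.kept_steps())  # [45, 46, 47, 48, 49]
```

The trade-off is that rollback can reach back at most five steps. That is enough for the per-step retry described earlier, which only ever restores the most recent checkpoint.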