Font Evaluation

April 7, 2026

design

The orchestration problem

When you run multiple agents in sequence — one generating a plan, another executing it, a third verifying the result — state management becomes the dominant engineering challenge. Not because it's conceptually hard, but because the failure modes are invisible until production.

Consider a three-agent pipeline:

Planner generates a task decomposition
Executor runs each subtask against the codebase
Verifier checks the output against the original intent

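async def run_pipeline(task):
    # planner, executor, verifier, and state are the pipeline's shared components.
    plan = await planner.generate(task)
    results = []
    for step in plan.steps:
        # Snapshot before each step so a failed check rolls back only this step.
        checkpoint = state.snapshot()
        result = await executor.run(step)

        if not await verifier.check(result, step):
            # Roll back to the pre-step state and retry the step once.
            state.restore(checkpoint)
            result = await executor.run(step, retry=True)

        results.append(result)
    return results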
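
The pipeline above calls state.snapshot() and state.restore() without showing them. Here is a minimal sketch of what such an object might look like, assuming the state fits in memory and can be deep-copied. The State class and its data field are hypothetical names for illustration, not the implementation measured in this post.

```python
import copy


class State:
    """Hypothetical in-memory pipeline state with snapshot/restore."""

    def __init__(self):
        self.data = {}

    def snapshot(self):
        # Deep-copy so later mutations cannot leak into the checkpoint.
        return copy.deepcopy(self.data)

    def restore(self, checkpoint):
        # Copy again so the checkpoint stays reusable for further retries.
        self.data = copy.deepcopy(checkpoint)


state = State()
state.data["files_written"] = ["a.py"]
checkpoint = state.snapshot()

# Simulate a step that mutates state and then fails verification.
state.data["files_written"].append("broken.py")
state.restore(checkpoint)
```

The deep copy is the important part: a shallow copy would share the inner list, so the failed step's write would survive the rollback.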

The cost of a checkpoint is ~200ms. The cost of a full restart is 30–90 seconds.

Metric	Without checkpoints	With checkpoints	Change
Mean completion	45s	38s	−16%
p95 completion	180s	52s	−71%

Checkpoints are the cheapest insurance in agent orchestration. The 200ms overhead per step is invisible. The 30–90s restart cost when you don't have them is not.
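
The trade-off can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above (about 200ms per checkpoint, 30 to 90 seconds per restart) and assumes one checkpoint per step. The function names are illustrative.

```python
CHECKPOINT_COST_MS = 200              # ~200ms per checkpoint, from the text
RESTART_COST_MS = (30_000, 90_000)    # 30-90s full restart, from the text


def checkpoint_overhead_ms(steps):
    """Total snapshot overhead for a run with one checkpoint per step."""
    return steps * CHECKPOINT_COST_MS


def break_even_steps(restart_cost_ms):
    """Steps at which cumulative overhead equals one avoided restart."""
    return restart_cost_ms // CHECKPOINT_COST_MS


def percent_change(before, after):
    """Relative change, rounded to the nearest whole percent."""
    return round((after - before) / before * 100)


print(checkpoint_overhead_ms(50))                       # 10000
print([break_even_steps(c) for c in RESTART_COST_MS])   # [150, 450]
print(percent_change(45, 38), percent_change(180, 52))  # -16 -71
```

Under these numbers a pipeline needs 150 to 450 steps before its cumulative checkpoint overhead matches the cost of a single avoided restart.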

What we got wrong

Checkpoint size grows. After 50 steps, accumulated snapshots consumed 2GB. We added a sliding window — keep the last 5, evict older ones.
The verifier is the bottleneck. Verification took 40% of total pipeline time. We moved to async verification.
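
The sliding window from the first point can be sketched with a bounded deque. This is a minimal illustration, assuming checkpoints are held in memory and that rolling back further than the window is never needed. CheckpointWindow and its methods are hypothetical names.

```python
from collections import deque


class CheckpointWindow:
    """Keeps only the most recent `max_kept` checkpoints."""

    def __init__(self, max_kept=5):
        # A deque with maxlen drops the oldest entry on append when full.
        self._window = deque(maxlen=max_kept)

    def push(self, step_index, checkpoint):
        self._window.append((step_index, checkpoint))

    def latest(self):
        # Most recently pushed (step_index, checkpoint) pair.
        return self._window[-1]

    def retained_steps(self):
        return [step for step, _ in self._window]


window = CheckpointWindow(max_kept=5)
for step_index in range(50):
    window.push(step_index, {"step": step_index})

print(window.retained_steps())  # [45, 46, 47, 48, 49]
```

The cost of this design is that a rollback can reach at most five steps back; anything older has already been evicted.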