
Designing agentic systems that don't lose the plot

Most production AI agents fail not on capability but on context handling. The model is rarely the constraint — modern frontier models are competent at reasoning, planning, and tool use. The fragility lives one layer up: in how state is preserved across tool calls, how the context window is managed across multi-step tasks, and how the system recovers when a step returns something the agent didn't expect.

We learned this the hard way. Our first production agent for a logistics client could plan a six-step workflow in isolation but reliably forgot the goal halfway through when a tool returned a verbose error or an oversized result. The agent did not break. It quietly drifted into something adjacent to the original task and confidently reported success.

The state machine, not the prompt

The lesson: an agent is a state machine that happens to use a language model for some transitions. Treat it that way. Define the explicit task state, the allowed transitions, and the recovery paths up front. The model handles the soft work — reasoning about what to do next — but the system handles the hard work of remembering what was supposed to happen and refusing to drift.
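To make that concrete, here is a minimal sketch of what "explicit task state" can look like in Python. The class names, states, and transition table are illustrative assumptions, not the API of any particular framework; the point is that the goal, the plan, and the allowed transitions live in ordinary data structures outside the prompt.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TaskState(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    VERIFYING = auto()
    NEEDS_HUMAN = auto()
    DONE = auto()


# Transitions the system allows; anything else is a bug, not a "creative" step.
ALLOWED_TRANSITIONS = {
    TaskState.PLANNING: {TaskState.EXECUTING, TaskState.NEEDS_HUMAN},
    TaskState.EXECUTING: {TaskState.VERIFYING, TaskState.NEEDS_HUMAN},
    TaskState.VERIFYING: {TaskState.EXECUTING, TaskState.DONE, TaskState.NEEDS_HUMAN},
}


@dataclass
class AgentTask:
    goal: str                      # the original goal, never rewritten by the model
    plan: list[str] = field(default_factory=list)
    step_index: int = 0
    state: TaskState = TaskState.PLANNING
    memory: list[str] = field(default_factory=list)  # short structured summaries only

    def transition(self, new_state: TaskState) -> None:
        # Refuse to drift: an illegal transition is surfaced, not absorbed.
        if new_state not in ALLOWED_TRANSITIONS.get(self.state, set()):
            raise RuntimeError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

The model only ever proposes the next action; the `AgentTask` object is what remembers where the workflow actually is.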

Side-by-side: an agent without explicit state drifts off-goal as the prompt context degrades; an agent with explicit state holds the goal as a first-class system value and verifies after every step.

An agent without explicit state is a search algorithm pretending to be a workflow.

In practice this means two things. First, summarise aggressively. Each tool call should append a short structured summary to the agent's memory, not the raw result. The model never re-reads a thousand-token API response — it reads a sentence describing what happened. Second, plan-then-execute. Have the agent commit to a plan, write it down, and check against the plan after every step. If the plan and the action diverge, halt and surface the discrepancy.
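The sketch below continues the `AgentTask` example and shows both habits in one loop: summarise aggressively, then check each action against the committed plan. The `call_model`, `run_tool`, and `summarise` callables are stand-ins for your own model and tool wrappers, and `action_matches_step` is a deliberately crude placeholder; none of these are a specific library's API.

```python
def execute_step(task: AgentTask, call_model, run_tool, summarise) -> None:
    step = task.plan[task.step_index]

    # The model sees the goal, the plan, and one-line summaries -- never raw tool output.
    action = call_model(goal=task.goal, plan=task.plan, memory=task.memory, step=step)
    raw_result = run_tool(action)

    # Append a short structured summary, not the thousand-token response.
    task.memory.append(summarise(step=step, action=action, result=raw_result))

    # Plan-then-execute: verify the action against the committed plan after every step.
    task.transition(TaskState.VERIFYING)
    if not action_matches_step(action, step):
        task.transition(TaskState.NEEDS_HUMAN)   # halt and surface the discrepancy
        return
    task.step_index += 1
    task.transition(TaskState.DONE if task.step_index == len(task.plan)
                    else TaskState.EXECUTING)


def action_matches_step(action, step: str) -> bool:
    # Placeholder check; in practice this compares the tool name and arguments
    # the model chose against what the committed plan step allows.
    return step.split()[0].lower() in str(action).lower()
```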

Guardrails that fail visibly

The other principle: guardrails should fail visibly, not silently. When an agent attempts something out-of-scope, the right response is not to wave it through with a placeholder — it is to halt and ask. We instrument every agent with a small number of explicit failure modes (out-of-scope, ambiguous instruction, contradictory state) and route them to a human queue. This costs you a small amount of human time and saves you the much larger cost of an agent that confidently completes the wrong task.
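A sketch of that routing, again building on the `AgentTask` example: the failure-mode names mirror the ones in the text, while `human_queue` is an assumed stand-in for whatever ticketing or review system you already run.

```python
from enum import Enum


class FailureMode(Enum):
    OUT_OF_SCOPE = "out_of_scope"
    AMBIGUOUS_INSTRUCTION = "ambiguous_instruction"
    CONTRADICTORY_STATE = "contradictory_state"


def halt_and_escalate(task: AgentTask, mode: FailureMode, detail: str, human_queue) -> None:
    # Fail visibly: record why the agent stopped, park the task, and hand it to a
    # person rather than papering over the failure with a placeholder result.
    task.transition(TaskState.NEEDS_HUMAN)
    human_queue.put({
        "goal": task.goal,
        "failure_mode": mode.value,
        "detail": detail,
        "memory": task.memory,   # the short summaries double as an audit trail
    })
```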

None of this is exotic. It is roughly the same engineering discipline you'd apply to any distributed system with unreliable workers and partial failures. The novelty is treating the language model as one more unreliable worker rather than as the system itself.


Want to talk about your project?

Tell us what you’re working on. We’ll respond within a business day.