Designing agentic systems that don't lose the plot
Most production AI agents fail not on capability but on context handling. The model is rarely the constraint — modern frontier models are competent at reasoning, planning, and tool use. The fragility lives one layer up: in how state is preserved across tool calls, how the context window is managed across multi-step tasks, and how the system recovers when a step returns something the agent didn't expect.
We learned this the hard way. Our first production agent for a logistics client could plan a six-step workflow in isolation, but would reliably forget the goal halfway through whenever a tool returned a verbose error or an oversized result. The agent did not break. It quietly drifted into something adjacent to the original task and confidently reported success.
The state machine, not the prompt
The lesson: an agent is a state machine that happens to use a language model for some transitions. Treat it that way. Define the explicit task state, the allowed transitions, and the recovery paths up front. The model handles the soft work — reasoning about what to do next — but the system handles the hard work of remembering what was supposed to happen and refusing to drift.
An agent without explicit state is a search algorithm pretending to be a workflow.
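To make that concrete, here is a minimal sketch of the idea in Python. The states, transition table, and class names are illustrative, not our production code: the point is that the set of legal transitions is declared up front and enforced by the system, not negotiated with the model.

```python
from enum import Enum, auto

class TaskState(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    NEEDS_HUMAN = auto()
    DONE = auto()
    FAILED = auto()

# Allowed transitions, declared up front. Anything outside this table
# is a bug to halt on, not a judgment call for the model.
TRANSITIONS = {
    TaskState.PLANNING: {TaskState.EXECUTING, TaskState.FAILED},
    TaskState.EXECUTING: {TaskState.EXECUTING, TaskState.NEEDS_HUMAN,
                          TaskState.DONE, TaskState.FAILED},
    TaskState.NEEDS_HUMAN: {TaskState.EXECUTING, TaskState.FAILED},
    TaskState.DONE: set(),
    TaskState.FAILED: set(),
}

class Agent:
    def __init__(self) -> None:
        self.state = TaskState.PLANNING

    def transition(self, new_state: TaskState) -> None:
        # The model proposes the next move; the system decides
        # whether it is legal.
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```

The model's output only ever feeds `transition()`; it never mutates state directly, so drift shows up as a raised error rather than a silently changed goal.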
In practice this means two things. First, summarise aggressively. Each tool call should append a short structured summary to the agent's memory, not the raw result. The model never re-reads a thousand-token API response — it reads a sentence describing what happened. Second, plan-then-execute. Have the agent commit to a plan, write it down, and check against the plan after every step. If the plan and the action diverge, halt and surface the discrepancy.
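Both habits fit in a few lines. The sketch below is illustrative: `summarise_result` stands in for whatever cheap model call or per-tool extractor produces the one-line summary, and `PlanDivergence` is a hypothetical exception type, not a library API.

```python
from dataclasses import dataclass, field

class PlanDivergence(Exception):
    def __init__(self, planned: str, actual: str):
        super().__init__(f"planned {planned!r} but agent chose {actual!r}")

def summarise_result(raw: str, limit: int = 120) -> str:
    # Placeholder: in production this would be a cheap model call or a
    # hand-written extractor per tool. Here we just truncate.
    return raw if len(raw) <= limit else raw[:limit] + "…"

@dataclass
class Memory:
    plan: list[str]                      # steps the agent committed to
    summaries: list[str] = field(default_factory=list)

    def record(self, step: str, raw_result: str) -> None:
        # Append a short structured summary, never the raw payload.
        self.summaries.append(f"{step}: {summarise_result(raw_result)}")

    def check_against_plan(self, step_index: int, action: str) -> None:
        # Halt and surface the discrepancy instead of drifting.
        planned = self.plan[step_index]
        if action != planned:
            raise PlanDivergence(planned=planned, actual=action)
```

A thousand-token API response enters `record()` once and leaves as a sentence; the model only ever re-reads `summaries`. The plan check runs after every step, not at the end, so divergence is caught at step two rather than reported as success at step six.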
Guardrails that fail visibly
The other principle: guardrails should fail visibly, not silently. When an agent attempts something out-of-scope, the right response is not to wave it through with a placeholder — it is to halt and ask. We instrument every agent with a small number of explicit failure modes (out-of-scope, ambiguous instruction, contradictory state) and route them to a human queue. This costs you a small amount of human time and saves you the much larger cost of an agent that confidently completes the wrong task.
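The routing itself is mundane, which is the point. A minimal sketch, assuming an in-process queue stands in for whatever ticketing or review system actually backs the human queue, and with `handle_step` as a deliberately simplified dispatcher:

```python
from enum import Enum
from queue import Queue

class FailureMode(Enum):
    OUT_OF_SCOPE = "out_of_scope"
    AMBIGUOUS_INSTRUCTION = "ambiguous_instruction"
    CONTRADICTORY_STATE = "contradictory_state"

# Stand-in for a real review queue (ticketing system, Slack channel, etc.)
human_queue: Queue = Queue()

def escalate(mode: FailureMode, context: str) -> None:
    # Fail visibly: the task stops here and a person sees it.
    human_queue.put({"mode": mode, "context": context})

def handle_step(action: str, allowed_tools: set[str]) -> str:
    # Toy dispatcher: the tool name is everything before the "(".
    tool = action.split("(")[0]
    if tool not in allowed_tools:
        escalate(FailureMode.OUT_OF_SCOPE, action)
        return "halted"   # never a placeholder result
    return "ok"
```

The enumeration is deliberately small. Three failure modes a human can triage in seconds beat thirty that nobody reads.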
None of this is exotic. It is roughly the same engineering discipline you'd apply to any distributed system with unreliable workers and partial failures. The novelty is treating the language model as one more unreliable worker rather than as the system itself.