Agent reliability is a systems problem, not a prompt problem

The first thing teams try when an agent misbehaves is a better prompt. It is the cheapest lever, and it is almost never the part that was broken.

In production, the failures cluster somewhere else: a tool returns something the plan did not anticipate, that output re-enters context, and three steps later the agent is confidently doing the wrong thing. No prompt change fixes that, because the prompt was never the problem. The loop was.

The three things that actually fail

Boundaries. A tool-using agent runs real actions against real state: files, APIs, code. If the only thing standing between a mis-planned call and your filesystem is a sentence in a system prompt, you do not have a boundary. You have a suggestion.
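One way to sketch such a boundary for the filesystem case is a check that runs in the executor, outside the model's reach. Everything named here is an assumption for illustration: the sandbox root `/srv/agent-workspace`, the `BoundaryViolation` exception, and the `resolve_inside_sandbox` helper are hypothetical, not part of any particular agent framework.

```python
from pathlib import Path

# Hypothetical sandbox root; a real deployment would read this from config.
SANDBOX_ROOT = Path("/srv/agent-workspace")


class BoundaryViolation(Exception):
    """Raised when a tool call targets a path outside the sandbox."""


def resolve_inside_sandbox(requested: str, root: Path = SANDBOX_ROOT) -> Path:
    """Return the resolved path if it stays under root, otherwise raise."""
    root = root.resolve()
    # Joining with an absolute path discards root, so "/etc/passwd" is
    # caught by the same containment check as "../" traversal.
    candidate = (root / requested).resolve()
    # is_relative_to (Python 3.9+) compares whole path components.
    if not candidate.is_relative_to(root):
        raise BoundaryViolation(f"{requested!r} escapes {root}")
    return candidate
```

The point of the design is that the check is code the agent cannot talk its way past: a mis-planned call fails at the executor regardless of what the system prompt says.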

Observability. If you cannot replay why the agent did what it did, step by step, you are not debugging. You are guessing with extra latency.
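Step-by-step replay can be as simple as an append-only trace that is serialized one JSON object per line. The sketch below is a minimal version under stated assumptions: the `Step` and `Trace` names and the set of step kinds are hypothetical, and a production system would likely add run IDs and persistent storage.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Step:
    index: int
    kind: str  # assumed kinds: "plan", "tool_call", "tool_result", "reflection"
    payload: dict[str, Any]
    timestamp: float = field(default_factory=time.time)


class Trace:
    """Append-only record of what the agent did, in order."""

    def __init__(self) -> None:
        self.steps: list[Step] = []

    def record(self, kind: str, payload: dict[str, Any]) -> Step:
        step = Step(index=len(self.steps), kind=kind, payload=payload)
        self.steps.append(step)
        return step

    def dump(self) -> str:
        # One JSON object per line, so a trace can be tailed or diffed.
        return "\n".join(json.dumps(asdict(s), sort_keys=True) for s in self.steps)

    @classmethod
    def load(cls, text: str) -> "Trace":
        trace = cls()
        trace.steps = [Step(**json.loads(line)) for line in text.splitlines() if line]
        return trace
```

Loading a dumped trace and walking `steps` in order gives the sequence of plans, calls, results, and reflections that led to a given action.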

Loop integrity. The reflection step, where the agent reads a tool result and decides what to do next, is the highest-leverage and least-watched part of most agent systems. It is where a bad tool result does the most damage and where teams instrument the least.
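One place to add a check is between the tool and the context window: compare each result against what the plan assumed before the agent reflects on it. The sketch below assumes tool results are dictionaries and that the plan can declare the keys it expects; `Expectation`, `GatedResult`, and `gate_tool_result` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Expectation:
    """What the plan assumed a tool call would return (hypothetical shape)."""
    required_keys: frozenset[str]


@dataclass
class GatedResult:
    ok: bool
    content: dict[str, Any]
    note: str = ""


def gate_tool_result(result: Any, expected: Expectation) -> GatedResult:
    """Check a tool result against the plan before it re-enters context."""
    if not isinstance(result, dict):
        return GatedResult(False, {}, f"expected dict, got {type(result).__name__}")
    missing = expected.required_keys - set(result)
    if missing:
        return GatedResult(False, result, f"missing keys: {sorted(missing)}")
    return GatedResult(True, result)
```

A result with `ok=False` can be logged, counted, and presented to the agent with its `note` attached, so the mismatch is visible at the step where it happened rather than three steps later.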

What I do instead

I treat the agent as a distributed system whose inputs I don't trust blindly. I validate at the execution boundary, make every refusal a structured event the agent can reason about, and instrument the loop before tuning the prose. The prompt is the last thing I touch, not the first.
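A refusal as a structured event might look like the following. This is a sketch, not a prescribed schema: the `Refusal` fields, the `tool_not_allowed` reason code, and the `execute` function with its `handlers` allowlist are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Refusal:
    tool: str
    reason_code: str  # machine-readable, e.g. "tool_not_allowed"
    detail: str       # human-readable explanation for the trace
    retryable: bool   # whether a corrected call could succeed


def execute(
    tool: str,
    args: dict[str, Any],
    handlers: dict[str, Callable[..., Any]],
) -> dict[str, Any]:
    """Run a tool call, or return a refusal event instead of raising."""
    handler = handlers.get(tool)
    if handler is None:
        refusal = Refusal(
            tool=tool,
            reason_code="tool_not_allowed",
            detail=f"{tool!r} is not in the allowlist",
            retryable=False,
        )
        return {"type": "refusal", **asdict(refusal)}
    return {"type": "result", "tool": tool, "output": handler(**args)}
```

For example, `execute("add", {"a": 2, "b": 3}, {"add": lambda a, b: a + b})` returns a `result` event, while a call to a tool missing from `handlers` returns a `refusal` event with the same top-level shape. Because both outcomes are plain dictionaries, the agent and the trace can handle them the same way.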