A production agentic coding platform
Hardening the tool-execution surface of a production agentic coding platform
An autonomous coding agent that runs arbitrary tools against real repositories needed a tool-execution boundary that contained blast radius without throttling the agent.
Sandbox escape attempts in the red-team set dropped to zero, with a single-digit percentage latency cost on the tool path.
Context
The platform is an autonomous coding agent: it plans, calls tools, and executes commands against real customer repositories with little human review per step. Capability terms only here, no proprietary internals.
Problem and constraint
A tool-using agent is, by construction, a remote code execution surface pointed at itself. The hard constraint: contain blast radius from a mis-planned or adversarially-induced tool call without adding latency that would make the agent feel sluggish, and without a allowlist so tight the agent could no longer do useful work.
Approach and key decisions
- Treated every tool invocation as untrusted input, not just user text. The decision that mattered: isolate at the execution boundary, not at the prompt. Prompt-level guardrails are advisory; a process and filesystem boundary is enforceable.
- Chose a per-session isolation context over a per-call one. Per-call was cleaner in theory but the startup cost dominated the tool path in practice. Per-session amortized it and still bounded cross-task leakage.
- Made the boundary observable: every denied operation is a structured event, so the agent can reflect on a refusal instead of silently looping. This turned a security control into a planning signal.
- Trade-off accepted: a small set of legitimate operations now require an explicit, logged capability grant. Slower for those cases, auditable for all of them.
Outcome (sanitized)
Against an internal red-team set of escape and exfiltration attempts, the success rate went to zero for the tested classes. The latency added to the tool path stayed in the single-digit-percent range, below the threshold where users notice. Figures are ratios by request: raw numbers are NDA.
What I would do differently
Instrument the boundary first, design second. The per-session decision was only obvious once the per-call latency was measured. I would build the event stream before the enforcement, not alongside it.