{"version":"https://jsonfeed.org/version/1.1","title":"Aghoghomena Akasukpe: Writing","home_page_url":"https://www.aghoghomena.com/writing","feed_url":"https://www.aghoghomena.com/feed.json","description":"Building and breaking agent systems.","authors":[{"name":"Aghoghomena Akasukpe"}],"language":"en","items":[{"id":"https://www.aghoghomena.com/writing/what-red-teaming-an-agent-looks-like","url":"https://www.aghoghomena.com/writing/what-red-teaming-an-agent-looks-like","title":"What red-teaming an agent actually looks like","summary":"Not a jailbreak leaderboard. A fixed clock, a severity-weighted plan, and findings that ship as deterministic reproductions.","content_html":"<p>A useful agent red-team is not a contest to find the funniest jailbreak.\nIt is an engineering engagement with a clock and a budget, and the output\nis fixes, not screenshots.</p>\n<h2>How I run one</h2>\n<p><strong>Scope as a system, not a chatbot.</strong> The first hour is spent mapping the\nreal surface: tool inputs, tool outputs fed back into context, ambient\ncontent the agent reads, and the reflection step. If I am only testing the\nchat box, I am testing the least valuable thing.</p>\n<p><strong>Spend the budget by severity, not by count.</strong> Twenty low findings read\nwell in a report and change nothing. One reproducible planning-loop hijack\nchanges the architecture. I optimize for severity-weighted coverage and\nsay so up front.</p>\n<p><strong>Every finding ships as a reproduction.</strong> Deterministic, replayable, with\na suggested boundary. A finding the team cannot replay is a finding they\nwill not fix, and an unfixed finding was not worth finding.</p>\n<h2>The honest part</h2>\n<p>The best findings usually come from the reflection step, and I usually\nreach them later than I would like. Red-teaming agents is still a young\ndiscipline. I would rather tell you that than sell you a checklist.</p>","date_published":"2026-05-14T00:00:00.000Z","tags":["red-teaming","ai-security","process"]},{"id":"https://www.aghoghomena.com/writing/threat-model-for-tool-using-agents","url":"https://www.aghoghomena.com/writing/threat-model-for-tool-using-agents","title":"A practical threat model for tool-using agents","summary":"Four surfaces, ranked by how much a single success costs you. Prompt injection is on the list. It is not at the top.","content_html":"<p>When people threat-model an agent they usually draw one box labeled\n\"prompt injection\" and stop. That is the demo-friendly attack, not the\nexpensive one. Here is the model I actually use.</p>\n<h2>The four surfaces</h2>\n<ol>\n<li><strong>Tool inputs.</strong> What the agent passes to tools. Classic injection\nlives here, and it is the best understood.</li>\n<li><strong>Tool outputs.</strong> What comes back and re-enters context. This is\nunderrated: the agent trusts tool output far more than user text, and\nthat trust is rarely earned.</li>\n<li><strong>Ambient content.</strong> Repo files, web pages, retrieved memory. Content\nthe agent reads as data but acts on as instruction.</li>\n<li><strong>The reflection step.</strong> Where the agent decides what just happened and\nwhat to do next. Compromise this and you do not need any other surface.</li>\n</ol>\n<h2>Rank by blast radius, not by likelihood</h2>\n<p>A bad tool input usually produces one bad action. A compromised\nreflection step produces a bad <em>plan</em>, and a bad plan compromises an\nentire session. So I spend the budget top-down by cost-of-success:\nreflection, then tool outputs, then ambient content, then inputs.</p>\n<p>Most teams do the opposite, because inputs are where the tooling and the\nblog posts are. That is exactly why the expensive bugs survive.</p>","date_published":"2026-05-06T00:00:00.000Z","tags":["ai-security","red-teaming","agent-systems"]},{"id":"https://www.aghoghomena.com/writing/agent-reliability-is-a-systems-problem","url":"https://www.aghoghomena.com/writing/agent-reliability-is-a-systems-problem","title":"Agent reliability is a systems problem, not a prompt problem","summary":"Most agent failures I see in production are not bad prompts. They are missing boundaries, missing observability, and a planning loop nobody can inspect.","content_html":"<p>The first thing teams try when an agent misbehaves is a better prompt. It\nis the cheapest lever and it is almost never the one that was broken.</p>\n<p>In production, the failures cluster somewhere else: a tool returns\nsomething the plan did not anticipate, that output re-enters context, and\nthree steps later the agent is confidently doing the wrong thing. No\nprompt change fixes that, because the prompt was never the problem. The\nloop was.</p>\n<h2>The three things that actually fail</h2>\n<p><strong>Boundaries.</strong> A tool-using agent is a remote code execution surface\npointed at itself. If the only thing standing between a mis-planned call\nand your filesystem is a sentence in a system prompt, you do not have a\nboundary. You have a suggestion.</p>\n<p><strong>Observability.</strong> If you cannot replay why the agent did what it did,\nstep by step, you are not debugging. You are guessing with extra latency.</p>\n<p><strong>Loop integrity.</strong> The reflection step is the highest-leverage and\nleast-watched part of most agent systems. It is where attacker-controlled\ncontent does the most damage and where teams instrument the least.</p>\n<h2>What I do instead</h2>\n<p>I treat the agent as an untrusted distributed system. Enforce at the\nexecution boundary, make every refusal a structured event the agent can\nreason about, and instrument the loop before tuning the prose. The prompt\nis the last thing I touch, not the first.</p>","date_published":"2026-04-22T00:00:00.000Z","tags":["agent-systems","reliability","architecture"]}]}