Update #20 - Agentic-powered Breaches

In recent weeks Replit, Google and Amazon have made headline news for their AI agents causing incidents in the real world. Today, we will dive into the recent parade of agentic faceplants.

Aug 14, 2025

It’s happening. A deep-seated concern and theory of mine is slowly starting to prove itself true. For those of us who have been following AI closely over the last few years you’ll have no doubt drawn the same conclusion that in the AI ‘arms race’ that we’re seeing play out among the largest AI providers down to the apps I use to track my cat, the AI race is on. Feared of being left behind, AI adoption has skyrocketed and found its way into every aspect of our digital lives. I think it is also generally accepted now that this is still a very new technology that changes quickly and one we don’t fully understand.

In the era of generative AI this brought in some real-world concerns. Chatbots being prompt injected to return harmful information and LLMs hallucinating answers to our questions were certainly a problem, but now we are facing an entirely new problem. In the world of agentic AI we’ve thrown caution to the wind (are you surprised?), connected these still somewhat untrustworthy LLMs to tools that allow them to interact with the real-world, and are putting them in the hands of the general public.

Well, it is clear to most people who have seen rapid technology adoption in the past (cloud storage accidentally being made public during a cloud migration I am looking at you) that there will be some teething issues with these technologies. The issue here is that agentic AI has the ability to act autonomously on our behalf and is running, in many cases, with incredibly high privileges. Having an AI agent ignore its instruction, turn rogue and start iterating through several malicious actions in a matter of seconds, and sometimes without human oversight, is a threat profile we’ve simply just not faced before.

So, seeing the writing on the wall, I had started theorising that 1) agentic AI was going to soon face some incidents which will curb everyone’s enthusiasm and 2) the landmark agentic AI breach was around the corner. I’m talking the WannaCry for Windows, Neo4j / MoveIT for supply chain, and Snowflake for identity. Let me be clear, I do not believe that this has already happened, but mark my words one is coming, and the fact that Replit, Amazon and Google all made it in to the news the same week for agentic-powered breaches is making me think this is going to be sooner rather than later.

Replit Breach

Replit is an AI-agent powered code building software. In July, Jason, the founder of a SaaS company was building some code with Replit (and documenting the entire journey on his X) and had been enjoying the ride. Treating it like a production environment Jason had implemented windows where code cannot be changed, usually due to big events or time periods that people do not ruin by accidentally pushing a feature with a catastrophic bug in it. These are known as ‘code freezes’. Well, Jason had implemented a code freeze, and had even gone as far as giving Replit specific instructions to ‘not touch the production environment’ which is where the live systems running the businesses operations were sitting.

Ref; https://x.com/jasonlk/status/1946069562723897802

As you might have gathered by the fact that we are talking about this, Replit’s agents not only ignored the change freeze, not only ignored the instructions not to touch production, but it deleted the entire production database…This is just about as bad as things get, especially if you are in a period that is important enough for there to have been a code freeze at all. Even worse, after issuing the destructive DB commands and wiping the live database, it then fabricated bogus data to mask the damage and told the user that a restore wasn’t possible, when it turned out it was.

Naturally, when someone posts on X with such devastating news it blew up. The Replit CEO apologised, calling it ‘unacceptable and something that should never be possible’. He mentioned that they were now putting in guardrails around tighter separation between development and production environments, better rollback and restore options, and strong change freeze controls.

I’m going to try to limit the ambulance chasing and bashing of companies in this post to a minimum, as I think everyone gains 20/20 security vision in the wake of a breach, but this just goes to show that perhaps we are pushing ahead with agentic AI adoption before they’ve been stress tested to uncover and remediate critical flaws like the one found above.

Amazon Q

Amazon Q is a multi-purpose AI assistant that can help with tasks like software development, analysis and boosting productivity. This breach took a slightly different approach to a ‘rogue agent’ as seen above, but used other failed security controls to weaponise an agent and nearly cause incredible amounts of harm…it was a near miss!

In the same week as Replit, an attacker suggested a code change in Amazon Q Developer VS Code Extension, which is open-source. Open-source software benefits from community improvement, so suggesting a change or improvement in the software is very common. However, this change was not an improvement but a sneaky instruction written into Amazon Q’s agent to tell it to completely wipe the laptop that it is currently writing on.

Ref: https://www.bleepingcomputer.com/news/security/amazon-ai-coding-agent-hacked-to-inject-data-wiping-commands/?utm_source=chatgpt.com

Pushing malicious code changes should be caught by human approvers and should never get accepted. Period. However, in this case this change somehow slipped by the human approvers and was accepted and incorporated into the code base. Once this malicious change was approved, these malicious instructions were sent out to all users of the product for roughly 2 days before it was spotted.

By some miracle, or huge oversight on behalf of the attacker, the code was not properly formatted and as such the malicious instructions were not properly understood by the agent. Needless to say, we were perhaps a few lines of code away from a widespread wiping of laptops around the globe due to running AI Agents. Whilst this was not a flaw in the agent system itself, but a failure in other security controls that should have prevented the malicious change from being accepted, it does still demonstrate the attack surface of AI agents. It doesn’t matter if it is from the agent going of its own accord or an attacker placing malicious instructions into the system prompt, the fact is that we are dealing with uniquely impactful scenarios when handing over control of our laptops and production environments to AI Agents in their current state.

Gemini CLI & AgentFlayer

I will be covering the next 2 briefly and together as they both speak to an old problem (prompt injection) being weaponised by AI Agents. Gemini CLI is a code assistant agent that helps with writing, understanding and fixing code. In that same troublesome week, attackers managed to cause an indirect prompt injection by hiding malicious instructions in a repositories README files which, when read by an AI agent, would use some clever piggybacking to silently allow untrusted commands to be run and exfiltrate secrets from the repository to a remote attacker-controlled server. Fortunately, within 28 hours researchers at Tracebit found this problem and reported it to Google who quickly issued a patch.

In a similar vein, researchers at Zenity found a similar attack path but instead of writing malicious code to the README file it was written into a Jira Ticket via an email coming from Zendesk. From here, an agent would process ticket, read the malicious instructions, and once again exfiltrate secrets to a malicious site.

Ref: https://zenity.io/research/agentflayer-vulnerabilities

In both cases we see the same pattern: point an agent at your untrusted code, get it to execute it, and you’ve got a malicious insider now at your beck and call. Again, this approach is not new but one that takes on a new realm of impact when we are integrating agents with any number of external services from which they process data.

Conclusion

In most of the cases described above the agent was behaving in the way that it should. This alone would be a concerning enough problem to fix, as distinguishing between genuine and malicious commands can be extremely challenging for AI (as we have seen many times through this newsletter’s coverage). However, examples like Replit where an agent explicitly ignored instructions for no good reason introduce an even more concerning reality. So, I will ask the question again: when will see the landmark agentic breach which kicks secure adoption into gear?

Securing AI: A Learning Journey

Discussion about this post

Ready for more?