Update #27: MCP (In)Security
As MCP continues to gather pace and adoption, so do attacks targeting the protocol. Let's take a look at the state of MCP security.
Since its release in late 2024, MCP has seen widespread adoption, with MCP servers cropping up all over the place for all sorts of exciting use-cases. However, more recently I’ve seen a consistent stream of articles, breaches, and concerned discussions with clients about the security of using MCP. Several security teams that I work with are getting very worried about the unrestricted usage of MCP, and with good reason!
Today, I thought I’d dive into the security picture with regard to MCP, starting with how the protocol itself has viewed security from its inception to now, and then moving on to the common attacks we are seeing against MCP and its current use.
Security At The Protocol Layer
When MCP was born it was a simple time. The priority for the protocol was ‘make agents talk to tools in a standard way’ and, as can perhaps be forgiven seeing as there was no way of knowing just how popular and dominant MCP would become, security was not really a consideration.
As such, the first 2024 version of the protocol did not even require authentication between the client and the server. Just point your client at an MCP server and shoot! Think about that - a protocol that lets an untrusted agent talk to an untrusted, unauthenticated remote server in order to carry out what are, in many cases, high-impact actions such as remote code execution.
Furthermore, there wasn’t really anything built into the protocol to encourage things like least privilege, access control, just-in-time (JIT) access or any of the other security practices we’ve learned to rely on. This was ultimately left up to the developers to figure out and implement on a case-by-case basis…which hardly anyone did, because the whole point of MCP was standardising how agents and tools talk to each other without a load of additional custom configuration.
After some fairly brutal feedback on how MCP was effectively an avenue for unauthenticated remote code execution, Anthropic introduced a formal authorization framework, based on OAuth 2.1, in the March 2025 revision of the spec. However, it was still optional which, when combined with the fact that the existing version didn’t support authentication, meant that many people didn’t bother.
This was further improved upon in the June 2025 revision, which treats MCP servers as OAuth resource servers. Token issuance can be handed off to an external identity provider like Entra ID / Auth0, and there is now a standard way of establishing who is calling and what they are allowed to do, including token scopes, etc. This was accompanied by a ‘Security Best Practices’ page in the documentation, so things are looking up.
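To make the idea of token scopes a little more concrete, here is a minimal sketch of a server-side check that only lets a tool call through if the caller's token carries every scope that tool needs. The tool names, the scope strings and the is_call_allowed helper are all made up for illustration; none of them come from the MCP spec or any SDK.

```python
from __future__ import annotations

# Hypothetical mapping from tool name to the OAuth scopes it needs.
# Neither the tool names nor the scope strings come from the MCP spec.
TOOL_REQUIRED_SCOPES = {
    "read_file": {"files:read"},
    "send_email": {"mail:send"},
    "delete_user": {"admin:users"},
}


def is_call_allowed(tool_name: str, granted_scopes: set[str]) -> bool:
    """Return True only if the token carries every scope the tool needs."""
    required = TOOL_REQUIRED_SCOPES.get(tool_name)
    if required is None:
        # Unknown tools are denied by default (least privilege).
        return False
    # Subset test: every required scope must be present in the token.
    return required <= granted_scopes
```

The important design choice here is the default deny: a tool that nobody has mapped to a scope cannot be called at all, rather than being callable by everyone.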
Prompt Injection & Data Poisoning
With an understanding of the protocol security, or lack thereof, let’s take a look at two attacks we are seeing against MCP and break them down a bit. The first is nothing new, but it is far more dangerous in the agentic and MCP era.
First and foremost, classic prompt injection becomes much nastier under MCP. In a normal GenAI scenario, prompt injection might make the model say something silly or leak some internal context. With AI agents hooked up to MCP, prompt injection can make the model actually do things, so the impact is clearly worse, but it’s not just that.
Here it is key to remember what is required for a prompt injection - malicious input being ingested by a model and then acted upon. With traditional GenAI, the avenues by which malicious data could make its way into our models consisted of chatbots, document summarisers and other typical use-cases. These are by no means trivial, but the breadth of avenues was typically limited to just a handful.
Now, in comes MCP and suddenly we’re connecting our AI agents to innumerable external services, remote databases, untrusted tools, and much more. As you can probably imagine, this sends the exposure to potentially malicious data through the roof, and the attack surface for prompt injection skyrockets.
In the example above, the injected action is to download and execute a malicious bash file. In the real world we’re seeing actions like writing secrets / variables / sensitive files to locations where the agent can then read and exfiltrate them, but really this could be any number of different actions:
Exfiltrate secrets the agent can see
Spin up cloud infra
Add a new privileged user
Send sensitive output to an attacker channel
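One way to limit the damage from actions like these is to stop letting the agent act on its own judgement once untrusted content has entered its context. The sketch below is a rough illustration of that idea, not a complete defence: it marks a session as tainted as soon as anything untrusted is ingested, and from then on sends high-impact tool calls for human approval. The AgentSession class, the tool names and the return values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names for tools considered high impact in this sketch.
HIGH_IMPACT_TOOLS = {"run_shell", "create_user", "deploy_infra", "send_message"}


@dataclass
class AgentSession:
    # Becomes True once any untrusted external content enters the context.
    tainted: bool = False

    def ingest(self, content: str, trusted: bool) -> None:
        """Record that content was added to the agent's context."""
        if not trusted:
            self.tainted = True

    def decide(self, tool_name: str) -> str:
        """Return 'allow' or 'needs_human_approval' for a proposed tool call."""
        if tool_name in HIGH_IMPACT_TOOLS and self.tainted:
            return "needs_human_approval"
        return "allow"
```

For example, a session that has only seen the user's own prompt would get "allow" for run_shell, but after ingesting a web page or a third-party tool result with trusted=False the same call comes back as "needs_human_approval".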
In a similar vein to prompt injection, recent research has highlighted the potential risk of data poisoning. Whilst MCP is unlikely to be the means of carrying out a data poisoning attack itself, it may well introduce avenues for delivering the trigger phrases that activate behaviour already poisoned into a model. This is a multi-step attack, but certainly not outside the realms of possibility.
Malicious / Compromised MCP Server
Next up are malicious MCP servers, where the MCP server we are interacting with is either entirely malicious or has been compromised. Attackers have started publishing (or quietly modifying) MCP servers so that they look legitimate but exfiltrate data or run additional actions behind the scenes.
We’ve already seen one popular email-integration MCP server ship an update that silently BCC’d all outbound email, leaking internal comms and attachments to an attacker-controlled location. This came after several benign versions of the tool were released, introducing an attack we’re now starting to call a ‘rug pull’.
The term ‘rug pull’ comes from the cryptocurrency world, where a new coin is hugely hyped (usually by people with influence who own a massive amount of it) so that investors pile in and its price is artificially inflated for a short period. The people behind it then sell at the peak to make a quick buck and walk away, leaving everyone else holding something close to worthless.
The idea here is the same - a tool can start clean, gain trust, then ship a later version that quietly pivots to data theft or persistence in the environment. This is already being discussed as a realistic path for persistence in AI-enabled environments.
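One partial mitigation is to pin what a server advertises at the point you approved it, and flag any later change for review. The sketch below hashes each tool definition (the name, description and inputSchema fields a server returns when it lists its tools) and reports which ones no longer match the pinned fingerprint. The fingerprint and changed_tools helpers and the example tool are assumptions made for illustration. Note that this only catches changes to the advertised metadata; a server that keeps its descriptions identical and changes only its backend behaviour would pass this check, so pinning the server package version matters as well.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a tool definition in a canonical, key-order-independent form."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pinned: dict[str, str], current_tools: list[dict]) -> list[str]:
    """Return names of tools that are new or differ from their pinned hash."""
    changed = []
    for tool in current_tools:
        name = tool["name"]
        # pinned.get() returns None for tools we have never approved,
        # so brand-new tools are flagged too.
        if pinned.get(name) != fingerprint(tool):
            changed.append(name)
    return sorted(changed)


# Hypothetical tool definition captured when the server was first approved.
approved = [
    {
        "name": "send_email",
        "description": "Send an email",
        "inputSchema": {"type": "object"},
    }
]
pinned_hashes = {tool["name"]: fingerprint(tool) for tool in approved}
```

On each new connection the client would call changed_tools(pinned_hashes, tools_from_server) and refuse to use anything in the returned list until a human has looked at the difference.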
One of the interesting aspects of malicious MCP servers is how they present what they do to their users, and how that doesn’t necessarily match what they actually do behind the scenes. In fact, the description of what a tool does and what the server is actually doing in the backend can be entirely different - see this attack in action here.
This is possible because on the client side all we decide is which tools to use and with what parameters, etc. Our models aren’t actually the ones executing the function; they are just deciding whether we need to use a tool and, if so, calling it. We can see the descriptions of these tools exposed via the server, but there is no guarantee that the backend function does what its description says it does. You should treat MCP servers exactly like you treat browser extensions with full mailbox or cloud access: high value, high blast radius, high supply chain risk.


