Update #6: How Microsoft Red Team AI

Red teaming is a misunderstood term. Having spent years red teaming I was intrigued to see how Microsoft approached it regarding AI, and I was pleasantly surprised!

Apr 17, 2025

Red teaming is one of those terms that gets under my skin when I see it being misused. In my line of work, ‘red teaming’ is a distinct service that approaches cyber security challenges in a unique way. However, I see the term being misused both within the cyber security industry and many others. As today is not a grumble about how people use the term red teaming I’ll keep my monologue on the topic to just a paragraph or two to set the scene for what I was looking into this week.

First, let me start by saying that red teaming is not exclusively a cyber security practice, and I believe has origins in the military. However, it became popularised in recent decades in cyber security as the act of mimicking advanced persistent threats (APTs), like nation-state actors, who target organisations over a sustained period and are highly motivated. That is to say, not a 16-year old kid who has downloaded a few scripts off GitHub and is trying to find a quick win.

Red teaming often has no defined ‘scope’ unlike traditional penetration testing, but works backwards from a goal. In most cases, this is understanding what the single worst-case-scenario for an organisation looks like, and setting out to understand how easily a highly motivated attacker could achieve that. One example from the past: if you are a record company the single worst thing that might happen is that someone steals all of your unreleased music.

Red teaming uses a diverse and skilled group of hackers, social engineers, etc. to find out if, given a few weeks/months window, achieving this goal would be possible. Usually we start from a zero-knowledge perspective and try to find our way in to an organisation, then move around internally to the point that we can achieve our goal. Most importantly, we have to act and react to the environment, finding creative solutions to blockers, considering any way that the goal could be achieved, and often hitting our head against the wall until a creative solution appears.

Red teaming has had me trying to pick locks to data centres at 3am, speaking in a French accent whilst pretending to be someone’s boss on the phone, and hacking into IT service desk software just to try and find another route to my goal. It is using an adversarial mindset in its most unrestricted form, which is why I love it! If you want to hear the podcast episode that got me hooked on red teaming many moons ago, listen to this.

So when I hear people running a few automated scripts and putting the results in a dashboard and calling it ‘red teaming’ my heart rate goes up just a little bit. Even worse is when I excitedly attend a conference talk in DEFCON, Las Vegas about red teaming AI only to find it was some generic eval testing. After a chat with a colleague about this newsletter this week who mentioned a talk that Microsoft did on red teaming AI you could probably imagine my apprehension. Nevertheless, I went in with an open-mind and here are my thoughts…

Ref: www.youtube.com/watch?v=zFRn_RMSPI4&t=2282s

They start off by saying what the purpose of their red teaming is, which they say was to map out the types of threats and harms that their models face and recreate them. So far so good, as that is the first step to a red team engagement. Ultimately that question of ‘what are we trying to protect against and how do we recreate that artificially’.

However, I couldn’t help but feel that the language they used here was somewhat one-dimensional and could be equally used to explain ethical hacking as a whole, not so much the true nature of red teaming which often takes a more out the box approach. True red teaming requires you to drop preconceptions about how something could be achieved and explore the art of the possible. However, shortly after I made a mental note of this they addressed how red teaming AI may differ from other areas:

I liked this slide, as it understands where they have deviated from traditional red teaming, such as double blind vs single blind and not only emulating real-world adversaries but also benign scenarios too. But then it got even better:

This accurately captures how a diverse team is required for red teaming. If everyone has the same biases, preconceptions, mindsets and professional backgrounds then you are going to struggle to find the creative solutions you need to red team effectively. They then went on to define the things that they are most focused on:

AI App Security (traditional data exfiltration, remote code execution, etc.)
AI Usage Safety and Security (Fairness, inclusion, etc.)
AI Platform Security (Model theft, etc.)

This was a nice look in to how someone like MS is trying to protect against a much wider array of harm than just ‘can someone do a prompt injection’. They also showed the journey which had led them to this point, in which my eyes were drawn to the fact that they have had a dedicated AI red team for 7 years!

It was also nice to see that they were considering a wide array of attacks, not just the traditional prompt injection that is commonly talked about in AI security:

Then they gave their first example of what a real-world attack against MS’s AI products might look like, using a name of the attack which was new to me:

From their description it appears that this attack comes down to the same vulnerable behaviour that we see in traditional prompt injection in that the model can’t differentiate between the data flow and instruction flow. It was also nice to see some attacks on agentic solutions.

Most importantly though, they mentioned here that when they are red teaming they want to build backwards from a goal! This made me smile, but then they ticked even more boxes:

What is the impact we want to have on the system?
What risks are most likely to cause tangible business impact to MS?
What is the attack surface of the model / product / platform?
What information is available in the public domain that might assist an attacker in targeting this?

Almost every item here aligns exactly with what I consider to be a good red teaming approach, and then they moved on to tooling. Now, tooling is a tricky topic in red teaming. Every red team requires tooling, it is inherent to allowing us to do our jobs. However, tooling allows us to conduct our red teaming. Red teaming cannot be carried out with tooling alone, as tooling cannot react to environments, think outside the box, or apply an adversarial mindset. That said, they talked about their tool ‘PyRIT’.

PyRIT, or the Python Risk Identification Tool, didn’t over promise in what it’s use-case was and says that it was great for testing AI systems reliably, flexibly and at scale. Unsurprisingly, it’s core functionality is around prompt generation using their library of test cases, data sets, and prompt encoders. So far, this is similar to a good few other tools, such as Spikee which we got hands on with in previous updates. ‘Orchestrators’ are where it gets interesting though - these are autonomous agents that help to execute attacks and combine the elements of PyRIT together.

Putting this all together it looks like it could be a promising tool. I still have doubts about how this feeds in to wider red teaming, but for eval-based assessments at scale this looks like it could be a front runner. Using its autonomous agents they promise that it can solve all 7 challenges in the prompt injection game ‘Gandalf’, which is far too bold of a claim for me not to try it out in future updates.

And that brought us to the end of their talk. Overall, I was impressed by the content and MS’s ability to take the bits of red teaming that can be applied to AI, and applying them. I still think that AI security testing and red teaming seems to focus on a few parts of AI’s attack surface (namely prompt injection), and it makes me itch to get hands on thinking about what other areas of AI might become a treasure trove for attackers in the future. In traditional red teaming we’ve seen a huge evolution in attacker tradecraft, with massive shifts taking place roughly every 5-10 years.

Right now, the way we interact with AI means that a huge amount of focus is being put on aspects like prompt injection. However, as AI evolves and becomes more interconnected and autonomous the opportunity for attackers creep up. Within me I am certain that the red teaming in AI will soon look entirely different to how it does today, and I have some suspicions about where we will see the biggest changes. In the medium-term this is something that I am very keen to prove is true myself.

Thanks for reading! I am actually away on holiday this week and next so there will be no update until the week after.

Securing AI: A Learning Journey

Discussion about this post