Summary

AI deployments commonly consist of four components: model, data, application, and system. We have explored security vulnerabilities in the model and data components in previous modules in the AI Red Teamer path. In this module, we will explore security vulnerabilities in the application and system components and discuss how they can affect the overall security of an AI deployment. Additionally, we will focus on the Model Context Protocol (MCP), its purpose, how it works, and how potential security issues may arise.

In more detail, this module covers the following:

Security vulnerabilities in the application component
Security vulnerabilities in the system component
Overview of MCP
Security vulnerabilities in MCP servers
Security issues resulting from malicious MCP servers

This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading", but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:

Overview of Application & System Components

As discussed in detail in the Introduction to Red Teaming AI module, real-world AI deployments typically consist of four distinct components: model, data, application, and system.

Previous modules in the AI Red Teamer path focused on attack vectors targeting the model (the Prompt Injection Attacks and LLM Output Attacks modules) or the data (the AI Data Attacks module).

This module explores security vulnerabilities in the application and system components, so let us briefly recap what these two components entail. We will also explore the Model Context Protocol (MCP), an orchestration protocol for AI applications introduced by the AI company Anthropic in 2024.

Application Component

The application component is the interface layer connecting users to the underlying model and its capabilities. It comprises all applications that the AI deployment interacts with, including web applications, mobile apps, APIs, databases, and integrated services such as plugins and autonomous agents. Since security vulnerabilities in systems interacting with the AI ecosystem often directly impact the security of the AI deployment, it is crucial to assess the application's overall security. In the real world, generative AI is integrated into increasingly complex systems that provide a wide variety of services. Due to this complexity, security vulnerabilities may arise in interconnected systems or in the interfaces between integrations, potentially posing a risk to user data and model interactions.

Common application-component attacks include:

Injection attacks, such as SQL injection or command injection. Injection vulnerabilities can lead to loss of data or complete system takeover.
Access control vulnerabilities, potentially enabling unauthorized attackers to access sensitive data or functionality.
Denial of ML Service, potentially impairing the availability of the AI deployment.
Rogue Actions: If a model has excessive agency and can access functions or data it does not need, it may trigger unintended actions that impact the systems it interfaces with. Such rogue actions may be triggered deliberately by an attacker or occur inadvertently due to unexpected model interactions. For instance, if the model can issue arbitrary SQL queries against a connected database, a model response may cause data loss by dropping all tables. Such a query can either be issued maliciously by an attacker or accidentally caused by unexpected user input.
Model Reverse Engineering: An attacker may be able to replicate the model by analyzing its outputs for a vast number of input data points. If the application does not implement a rate limit, malicious actors can query the model at high volume to reverse engineer it.
Vulnerable Agents or Plugins: Vulnerable custom agents or plugins integrated into the deployment may perform unintended actions or exfiltrate model interactions to malicious actors.
Logging of sensitive data: If the application logs sensitive data from user input or model interactions, sensitive information may be disclosed to unauthorized actors through application logs.
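
The Rogue Actions scenario above, in which a model-issued query drops all tables, can be narrowed by giving the model only the database privileges it needs. The following is a minimal sketch, assuming a SQLite database accessed through Python's standard sqlite3 module. The users table and the read_only_authorizer, make_demo_connection, and run_model_query names are hypothetical and chosen purely for illustration; they are not taken from the module's exercises.

```python
import sqlite3

# Only these SQLite actions are permitted for model-issued queries.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Allow plain SELECT statements and column reads; deny everything else."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def make_demo_connection():
    """Create an in-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")
    conn.commit()
    # Install the authorizer only after seeding, so setup is not blocked.
    conn.set_authorizer(read_only_authorizer)
    return conn


def run_model_query(conn, sql):
    """Execute model-produced SQL; denied statements raise sqlite3.DatabaseError."""
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_demo_connection()
    print(run_model_query(conn, "SELECT name FROM users"))
    try:
        run_model_query(conn, "DROP TABLE users")
    except sqlite3.DatabaseError as exc:
        print("blocked:", exc)
```

This is an application-level guard only. In a real deployment, a dedicated read-only database account would likely be the more robust control, since it does not depend on the application code being free of bugs.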

System Component

The system component encompasses everything infrastructure-related, including deployment platforms, code, data storage, and hardware. For instance, it comprises the source code for training and running the model, including frameworks, the storage of training and inference data, the storage of the model itself, and the deployment pipeline to deploy a model in production. Security vulnerabilities at this layer can cascade across the entire deployment, as breaches may lead to total system compromise, unauthorized access to models and data, or service disruption.

Common system-component vulnerabilities include:

Misconfigured Infrastructure: If infrastructure used during training or inference is misconfigured and inadvertently exposes data or services to the public, unauthorized actors may be able to steal training data, user data, the model itself, or configuration secrets.
Improper Patch Management: Issues in the patch management process of an AI deployment may leave publicly known vulnerabilities unpatched in different system components, from the operating system to the ML stack. These vulnerabilities can range from privilege escalation vectors to remote code execution flaws and may result in total system compromise.
Network Security: Since generative AI deployments typically interface with different systems over internal networks, proper network security is crucial to mitigate security threats. Common network security measures include network segmentation, encryption, and monitoring to thwart lateral movement.
Model Deployment Tampering: If threat actors manipulate the deployment process, they may be able to maliciously modify model behavior. They can achieve this by manipulating the source code or exploiting vulnerabilities.
Excessive Data Handling: Applications processing and storing data excessively may run into legal issues if this data includes user-related information. Furthermore, excessive data handling increases the impact of data storage vulnerabilities as more data is at risk of being leaked or stolen.
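
To make the Model Deployment Tampering item above more concrete, one possible safeguard is to check a model artifact against a known-good digest before it is loaded. The following is a minimal sketch using Python's standard hashlib and hmac modules. The function names and the idea of recording the expected digest at build time are assumptions made for illustration; the text above does not prescribe a specific mechanism.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_artifact(path, expected_sha256):
    """Return True only if the file matches the digest recorded at build time."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.lower())


if __name__ == "__main__":
    # Hypothetical artifact: a temporary file stands in for real model weights.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        handle.write(b"model-weights-v1")
        artifact = handle.name
    expected = hashlib.sha256(b"model-weights-v1").hexdigest()
    print("untampered:", verify_model_artifact(artifact, expected))
    with open(artifact, "ab") as handle:
        handle.write(b"malicious-patch")  # simulate tampering in the pipeline
    print("after tampering:", verify_model_artifact(artifact, expected))
    os.remove(artifact)
```

A digest check only helps if the expected value is stored somewhere the attacker cannot modify alongside the artifact, for example in a separately controlled and signed release manifest.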

Model Context Protocol (MCP)

The Model Context Protocol (MCP) provides a standardized interface between LLM applications and external resources. As LLMs become increasingly capable, ensuring that these models can consistently interpret, retain, and apply context is critical for satisfactory performance. MCP addresses this need by defining a structured framework for sharing, updating, and reasoning over context information between models and their environments, such as user interfaces, external APIs, or other data providers. At its core, MCP establishes a standardized way to represent context and task-specific data.
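
As a rough illustration of what this standardized representation looks like on the wire, MCP messages are exchanged as JSON-RPC 2.0, and a client asks a server to run a tool with the tools/call method. The following is a minimal sketch that only builds such a request. The build_tool_call helper, the get_weather tool, and its city argument are hypothetical and serve as placeholders.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    print(build_tool_call(1, "get_weather", {"city": "Berlin"}))
```

The sketch deliberately omits the transport and the initialization handshake. It is only meant to show that tool names and arguments travel as structured data, which is one place where the security issues discussed in this module can enter.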

Sections

Overview of Application & System Components
Model Reverse Engineering
Denial of ML Service
Insecure Integrated Components
Rogue Actions
Excessive Data Handling & Insecure Storage
Model Deployment Tampering
Vulnerable Framework Code
Introduction to MCP
Practical Introduction to MCP
Vulnerable MCP Servers
Malicious MCP Servers
Mitigating MCP Security Issues
Skills Assessment

Relevant Paths

This module progresses you towards the following Paths

AI Red Teamer

The AI Red Teamer Job Role Path, in collaboration with Google, trains cybersecurity professionals to assess, exploit, and secure AI systems. Covering prompt injection, model privacy attacks, adversarial AI, supply chain risks, and deployment threats, it combines theory with hands-on exercises. Aligned with Google’s Secure AI Framework (SAIF), it ensures relevance to real-world AI security challenges. Learners will gain skills to manipulate model behaviors, develop AI-specific red teaming strategies, and perform offensive security testing against AI-driven applications. The path will be gradually expanded with related modules until its completion.

Hard

187 Sections

Required: 770

Reward: +170

10 Modules included

Fundamentals of AI

Medium

24 Sections

Reward: +10

This module provides a comprehensive guide to the theoretical foundations of Artificial Intelligence (AI). It covers various learning paradigms, including supervised, unsupervised, and reinforcement learning, providing a solid understanding of key algorithms and concepts.

Applications of AI in InfoSec

Medium

25 Sections

Reward: +10

This module is a practical introduction to building AI models that can be applied to various infosec domains. It covers setting up a controlled AI environment using Miniconda for package management and JupyterLab for interactive experimentation. Students will learn to handle datasets, preprocess and transform data, and implement structured workflows for tasks such as spam classification, network anomaly detection, and malware classification. Throughout the module, learners will explore essential Python libraries like Scikit-learn and PyTorch, understand effective approaches to dataset processing, and become familiar with common evaluation metrics, enabling them to navigate the entire lifecycle of AI model development and experimentation.

Introduction to Red Teaming AI

Medium

11 Sections

Reward: +10

This module provides a comprehensive introduction to the world of red teaming Artificial Intelligence (AI) and systems utilizing Machine Learning (ML) deployments. It covers an overview of common security vulnerabilities in these systems and the types of attacks that can be launched against their components.

Prompt Injection Attacks

Medium

11 Sections

Reward: +20

This module comprehensively introduces one of the most prominent attacks on large language models (LLMs): Prompt Injection. It introduces prompt injection basics and covers detailed attack vectors based on real-world vulnerability reports. Furthermore, the module touches on academic research in the fields of novel prompt injection techniques and jailbreaks.

LLM Output Attacks

Medium

14 Sections

Reward: +20

In this module, we will explore different LLM output vulnerabilities resulting from improper handling of LLM outputs and insecure LLM applications. We will also touch on LLM abuse attacks, such as hate speech campaigns and misinformation generation, with a particular focus on the detection and mitigation of these attacks.

AI Data Attacks

Hard

25 Sections

Reward: +20

This module explores the intersection of Data and Artificial Intelligence, exposing how vulnerabilities within AI data pipelines can be exploited, ultimately aiming to degrade performance, achieve specific misclassifications, or execute arbitrary code.

Attacking AI - Application and System

Medium

14 Sections

Reward: +20

In this module, we will explore security vulnerabilities in the application and system components of AI deployments. We will also discuss the Model Context Protocol (MCP), an orchestration protocol for AI deployments introduced in 2024, including a deep dive into how the protocol works and how security vulnerabilities may arise.

AI Evasion - Foundations

Medium

12 Sections

Reward: +20

This module explores the foundations of inference‑time evasion attacks against AI models, showing how to manipulate inputs to bypass classifiers and force targeted misclassifications in white‑ and black‑box settings.

AI Evasion - First-Order Attacks

Hard

23 Sections

Reward: +20

This module explores gradient-based adversarial attacks that manipulate neural network inputs at inference time, showing how to craft perturbations that cause misclassification through white-box access to model gradients.

AI Evasion - Sparsity Attacks

Hard

28 Sections

Reward: +20

This module explores sparsity-constrained adversarial attacks that minimize the number of modified input features rather than perturbation magnitude, showing how to craft targeted misclassifications by changing only the most impactful pixels through L0-focused optimization and saliency-guided feature selection.