Summary

Assessing output security vulnerabilities is as essential in LLM applications as it is in traditional applications. These types of vulnerabilities can have a devastating impact, potentially leading to full system compromise. In this module, we will explore a variety of security vulnerabilities resulting from improper handling of LLM-generated output.

In more detail, this module covers the following:

Cross-Site Scripting (XSS) vulnerabilities in LLM applications
SQL injection vulnerabilities in LLM applications
Command injection vulnerabilities in LLM applications
Attacks against LLM function calling
Exfiltrating information from LLM prompts
Security issues resulting from LLM hallucinations
LLM abuse attacks

This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading", but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:

Introduction to Insecure Output Handling

Many common security vulnerabilities arise from improper handling of untrusted data. Arguably the most common attack vector is an Injection Attack. Typical examples in the web domain include Cross-Site Scripting (XSS), where untrusted data is inserted into the HTML DOM, leading to the execution of arbitrary JavaScript code; SQL Injection, where untrusted data is inserted into SQL queries, leading to the execution of arbitrary SQL queries; and code injection, where untrusted data is inserted into system commands, leading to the execution of arbitrary system commands.

This module will only discuss output attacks against text-based models, i.e., LLMs. However, in real-world deployments it is common to interact with multimodal models that can process and generate text as well as images, audio, and video. These types of models provide additional attack surfaces for output attacks.

Insecure Output Handling in LLM Applications

Text generated by Large Language Models (LLMs) needs to be treated as untrusted data since there is no direct control over the LLM's response. As such, the output must be subjected to the same types of validation, sanitization, and escaping that untrusted user input is subjected to. For instance, if an LLM's output is reflected in a web server's response in any endpoint, proper HTML encoding must be applied. Similarly, if we insert an LLM's output into a SQL query, we must apply preventive measures such as prepared statements or escaping.

However, insecure handling of LLM output can not only lead to injection vulnerabilities. For instance, if an LLM is used to generate an e-mail body, improper output validation may lead to malicious, illegal, or unethical content being contained in the e-mail. A company sending such an e-mail to a potential customer may suffer financial or reputational damage. Another source for potential security vulnerabilities is source code snippets generated by LLMs. If they are not adequately reviewed for bugs and security issues, vulnerabilities may unknowingly get introduced into code bases.

Recap: OWASP LLM Top 10

Before diving into concrete attack techniques, let us take a moment and recap where security vulnerabilities discussed throughout this module are situated in OWASP's Top 10 for LLM Applications. As the module name suggests, we will explore attack techniques for LLM05: 2025 Improper Output Handling. As discussed above, this security risk refers to all instances where LLM output is not treated as untrusted data and proper sanitization, validation, or escaping is not applied. In Google's SAIF, the attack vectors discussed in this module fall under the Insecure Model Output risk.

Sign Up / Log In to Unlock the Module

Please Sign Up or Log In to unlock the module and access the rest of the sections.

Sections

Introduction to Insecure Output Handling PREVIEW
Cross-Site Scripting (XSS)
SQL Injection
Code Injection
Function Calling
Exfiltration Attacks
LLM Hallucinations
Insecure Output Handling Mitigations
Introduction to Abuse Attacks
LLM Abuse Attacks
Mitigating Abuse Attacks
Safeguard Case Studies
Legislative Regulation
Skills Assessment

Relevant Paths

This module progresses you towards the following Paths

AI Red Teamer

The AI Red Teamer Job Role Path, in collaboration with Google, trains cybersecurity professionals to assess, exploit, and secure AI systems. Covering prompt injection, model privacy attacks, adversarial AI, supply chain risks, and deployment threats, it combines theory with hands-on exercises. Aligned with Google’s Secure AI Framework (SAIF), it ensures relevance to real-world AI security challenges. Learners will gain skills to manipulate model behaviors, develop AI-specific red teaming strategies, and perform offensive security testing against AI-driven applications. The path will be gradually expanded with related modules until its completion.

Hard

187 Sections

Required: 770

Reward: +170

10 Modules included

Fundamentals of AI

Medium

24 Sections

Reward: +10

This module provides a comprehensive guide to the theoretical foundations of Artificial Intelligence (AI). It covers various learning paradigms, including supervised, unsupervised, and reinforcement learning, providing a solid understanding of key algorithms and concepts.

Applications of AI in InfoSec

Medium

25 Sections

Reward: +10

This module is a practical introduction to building AI models that can be applied to various infosec domains. It covers setting up a controlled AI environment using Miniconda for package management and JupyterLab for interactive experimentation. Students will learn to handle datasets, preprocess and transform data, and implement structured workflows for tasks such as spam classification, network anomaly detection, and malware classification. Throughout the module, learners will explore essential Python libraries like Scikit-learn and PyTorch, understand effective approaches to dataset processing, and become familiar with common evaluation metrics, enabling them to navigate the entire lifecycle of AI model development and experimentation.

Introduction to Red Teaming AI

Medium

11 Sections

Reward: +10

This module provides a comprehensive introduction to the world of red teaming Artificial Intelligence (AI) and systems utilizing Machine Learning (ML) deployments. It covers an overview of common security vulnerabilities in these systems and the types of attacks that can be launched against their components.

Prompt Injection Attacks

Medium

11 Sections

Reward: +20

This module comprehensively introduces one of the most prominent attacks on large language models (LLMs): Prompt Injection. It introduces prompt injection basics and covers detailed attack vectors based on real-world vulnerability reports. Furthermore, the module touches on academic research in the fields of novel prompt injection techniques and jailbreaks.

LLM Output Attacks

Medium

14 Sections

Reward: +20

In this module, we will explore different LLM output vulnerabilities resulting from improper handling of LLM outputs and insecure LLM applications. We will also touch on LLM abuse attacks, such as hate speech campaigns and misinformation generation, with a particular focus on the detection and mitigation of these attacks.

AI Data Attacks

Hard

25 Sections

Reward: +20

This module explores the intersection of Data and Artificial Intelligence, exposing how vulnerabilities within AI data pipelines can be exploited, ultimately aiming to degrade performance, achieve specific misclassifications, or execute arbitrary code.

Attacking AI - Application and System

Medium

14 Sections

Reward: +20

In this module, we will explore security vulnerabilities in the application and system components of AI deployments. We will also discuss the Model Context Protocol (MCP), an orchestration protocol for AI deployments introduced in 2024, including a deep dive into how the protocol works and how security vulnerabilities may arise.

AI Evasion - Foundations

Medium

12 Sections

Reward: +20 NEW

This module explores the foundations of inference‑time evasion attacks against AI models, showing how to manipulate inputs to bypass classifiers and force targeted misclassifications in white‑ and black‑box settings.

AI Evasion - First-Order Attacks

Hard

23 Sections

Reward: +20 NEW

This module explores gradient-based adversarial attacks that manipulate neural network inputs at inference time, showing how to craft perturbations that cause misclassification through white-box access to model gradients.

AI Evasion - Sparsity Attacks

Hard

28 Sections

Reward: +20 NEW

This module explores sparsity-constrained adversarial attacks that minimize the number of modified input features rather than perturbation magnitude, showing how to craft targeted misclassifications by changing only the most impactful pixels through L0-focused optimization and saliency-guided feature selection.