Summary
Assessing security vulnerabilities arising from improper output handling is just as essential in LLM applications as it is in traditional applications. These types of vulnerabilities can have a devastating impact, potentially leading to full system compromise. In this module, we will explore a variety of security vulnerabilities resulting from improper handling of LLM-generated output.
In more detail, this module covers the following:
- Cross-Site Scripting (XSS) vulnerabilities in LLM applications
- SQL injection vulnerabilities in LLM applications
- Code injection vulnerabilities in LLM applications
- Attacks against LLM function calling
- Exfiltrating information from LLM prompts
- Security issues resulting from LLM hallucinations
- LLM abuse attacks
This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.
You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading", but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.
A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:
- Fundamentals of AI
- Applications of AI in InfoSec
- Introduction to Red Teaming AI
- Prompt Injection Attacks
- Cross-Site Scripting (XSS)
- SQL Injection Fundamentals
- Command Injections
Introduction to Insecure Output Handling
Many common security vulnerabilities arise from improper handling of untrusted data. Arguably the most common attack vector is an Injection Attack. Typical examples in the web domain include Cross-Site Scripting (XSS), where untrusted data is inserted into the HTML DOM, leading to the execution of arbitrary JavaScript code; SQL Injection, where untrusted data is inserted into SQL queries, leading to the execution of arbitrary SQL statements; and Code Injection, where untrusted data is inserted into system commands, leading to the execution of arbitrary commands.
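To make the pattern concrete, consider the following minimal Python sketch of a SQL Injection flaw (the `users` table and the function names are illustrative, not taken from a specific application). Concatenating untrusted data into a SQL statement lets an attacker change the query's structure, whereas a parameterized query keeps the data and the query separate:

```python
import sqlite3

def find_user_vulnerable(conn: sqlite3.Connection, username: str):
    # VULNERABLE: untrusted data is concatenated directly into the SQL statement.
    # A username such as "' OR '1'='1" alters the query's logic and returns every row.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # SAFER: a parameterized query keeps untrusted data out of the query structure.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()
```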
This module will only discuss output attacks against text-based models, i.e., LLMs. However, in real-world deployments it is common to interact with multimodal models that can process and generate text as well as images, audio, and video. These types of models provide additional attack surfaces for output attacks.
Insecure Output Handling in LLM Applications
Text generated by Large Language Models (LLMs) needs to be treated as untrusted data, since there is no direct control over the LLM's response. As such, the output must be subjected to the same validation, sanitization, and escaping that untrusted user input is subjected to. For instance, if an LLM's output is reflected in a web server's response at any endpoint, proper HTML encoding must be applied. Similarly, if we insert an LLM's output into a SQL query, we must apply preventive measures such as prepared statements or escaping.
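As a minimal sketch of the first case (the Flask route and the `generate_answer()` helper are illustrative assumptions, not part of a specific application), the LLM's response is HTML-encoded before being reflected into the page, so any markup the model emits is rendered as inert text instead of being executed by the browser:

```python
import html
from flask import Flask, request

app = Flask(__name__)

def generate_answer(prompt: str) -> str:
    # Placeholder for a real LLM call. The response is attacker-influenceable
    # (e.g., via prompt injection) and must be treated as untrusted data.
    return "<script>alert(1)</script>"  # worst-case output for demonstration

@app.route("/chat")
def chat():
    llm_output = generate_answer(request.args.get("q", ""))
    # HTML-encode the untrusted LLM output before reflecting it in the response.
    return f"<p>Assistant: {html.escape(llm_output)}</p>"
```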
However, insecure handling of LLM output can lead to more than just injection vulnerabilities. For instance, if an LLM is used to generate an e-mail body, improper output validation may result in malicious, illegal, or unethical content being included in the e-mail. A company sending such an e-mail to a potential customer may suffer financial or reputational damage. Another source of potential security vulnerabilities is source code snippets generated by LLMs. If these are not adequately reviewed for bugs and security issues, vulnerabilities may unknowingly be introduced into code bases.
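One way to reduce this risk is to validate generated content against a content policy before it leaves the system. The sketch below is purely illustrative: the keyword patterns and the mail-sending stub are assumptions, and a production deployment would typically rely on a dedicated moderation model or human review rather than a simple blocklist:

```python
import re

# Illustrative, deliberately simple policy check; a real deployment would use a
# dedicated content-moderation model or service instead of a keyword blocklist.
BLOCKED_PATTERNS = [
    re.compile(r"guarantee[sd]?\s+returns", re.IGNORECASE),   # unrealistic financial promises
    re.compile(r"wire\s+the\s+payment\s+to", re.IGNORECASE),  # scam-like instructions
]

def passes_content_policy(body: str) -> bool:
    """Return True only if the generated e-mail body matches none of the blocked patterns."""
    return not any(pattern.search(body) for pattern in BLOCKED_PATTERNS)

def send_email(recipient: str, body: str) -> None:
    # Stub standing in for the actual mail-sending logic.
    print(f"Sending e-mail to {recipient}: {body[:60]}...")

def send_generated_email(recipient: str, body: str) -> None:
    if not passes_content_policy(body):
        # Do not send automatically; escalate to a human reviewer instead.
        raise ValueError("Generated e-mail failed the content policy check")
    send_email(recipient, body)
```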
Recap: OWASP LLM Top 10
Before diving into concrete attack techniques, let us take a moment to recap where the security vulnerabilities discussed throughout this module are situated in OWASP's Top 10 for LLM Applications. As the module name suggests, we will explore attack techniques for LLM05:2025 Improper Output Handling. As discussed above, this security risk refers to all instances where LLM output is not treated as untrusted data and proper sanitization, validation, or escaping is not applied. In Google's SAIF, the attack vectors discussed in this module fall under the Insecure Model Output risk.