Summary

To assess the security of large language models (LLMs), we need to understand common strategies for using LLMs in malicious or illegal behavior. These types of attacks are called prompt injection because we interact with them through prompts. In this module, we will explore different prompt injection techniques and their security impact, including direct prompt injection, indirect prompt injection, and jailbreaking.

In more detail, this module covers the following:

Direct Prompt Injection Techniques
Indirect Prompt Injection Techniques
Jailbreak Techniques
Prompt Injection Mitigations

This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading", but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:

Fundamentals of AI
Applications of AI in InfoSec
Introduction to Red Teaming AI

Introduction to Prompt Engineering

As we have established in the Fundamentals of AI module, Large Language Models (LLMs) generate text based on an initial input. They can range from answers to questions and content creation to solving complex problems. The quality and specificity of the input prompt directly influence the relevance, accuracy, and creativity of the model's response. This input is typically called the prompt. A well-engineered prompt often includes clear instructions, contextual details, and constraints to guide the AI's behavior, ensuring the output aligns with the user's needs.

Prompt Engineering

Prompt Engineering refers to designing the LLM's input prompt so that the desired LLM output is generated. Since the prompt is an LLM's only text-based input, prompt engineering is the only way to steer the generated output in the desired direction and influence the model to behave as we want it to. Applying good prompt engineering techniques reduces misinformation and increases usability in an LLM response.

Prompt engineering comprises the instructions itself that are fed to the model. For instance, a prompt like Write a short paragraph about HackTheBox Academy will produce a vastly different response than Write a short poem about HackTheBox Academy. However, prompt engineering also includes many nuances of the prompt, such as phrasing, clarity, context, and tone. The LLM might generate an entirely different response depending on the nuances of the prompt. Depending on the quality of the responses, we can introduce subtle changes to these nuances in the prompt to nudge the model to generate the responses we want. On top of that, it is important to keep in mind that LLMs are not deterministic. As such, the same prompt may result in different responses each time.

While prompt engineering is typically very problem-specific, some general prompt engineering best practices should be followed when writing an LLM prompt:

Clarity: Be as clear, unambiguous, and concise as possible to avoid the LLM misinterpreting the prompt or generating vague responses. Provide a sufficient level of detail. For instance, How do I get all table names in a MySQL database instead of How do I get all table names in SQL.
Context and Constraints: Provide as much context as possible for the prompt. If you want to add constraints to the response, add them to the prompt and add examples if possible. For instance, Provide a CSV-formatted list of OWASP Top 10 web vulnerabilities, including the columns 'position','name','description' instead of Provide a list of OWASP Top 10 web vulnerabilities.
Experimentation: As stated above, subtle changes can significantly affect response quality. Try experimenting with subtle changes in the prompt, note the resulting response quality, and stick with the prompt that produces the best quality.

Recap: OWASP LLM Top 10 & Google SAIF

Before diving into concrete attack techniques, let us take a moment and recap where security vulnerabilities resulting from improper prompt engineering are situated in OWASP's Top 10 for LLM Applications. In this module, we will explore attack techniques for LLM01:2025 Prompt Injection and LLM02:2025 Sensitive Information Disclosure. LLM02 refers to any security vulnerability resulting in the leakage of sensitive information. We will focus on types of information disclosure resulting from improper prompt engineering or manipulation of the input prompt. Furthermore, LLM01 more generally refers to security vulnerabilities arising from manipulating an LLM's input prompt, including forcing the LLM to behave unintendedly.

In Google's Secure AI Framework (SAIF), which gives broader guidance on how to build secure AI systems resilient to threats, the attacks we will discuss in this module fall under the Prompt Injection and Sensitive Data Disclosure risks.

Sign Up / Log In to Unlock the Module

Please Sign Up or Log In to unlock the module and access the rest of the sections.

Sections

Introduction to Prompt Engineering PREVIEW
Introduction to Prompt Injection
Direct Prompt Injection
Indirect Prompt Injection
Introduction to Jailbreaking
Jailbreaks I
Jailbreaks II
Tools of the Trade
Traditional Prompt Injection Mitigations
LLM-based Mitigations
Skills Assessment

Relevant Paths

This module progresses you towards the following Paths

AI Red Teamer

The AI Red Teamer Job Role Path, in collaboration with Google, trains cybersecurity professionals to assess, exploit, and secure AI systems. Covering prompt injection, model privacy attacks, adversarial AI, supply chain risks, and deployment threats, it combines theory with hands-on exercises. Aligned with Google’s Secure AI Framework (SAIF), it ensures relevance to real-world AI security challenges. Learners will gain skills to manipulate model behaviors, develop AI-specific red teaming strategies, and perform offensive security testing against AI-driven applications. The path will be gradually expanded with related modules until its completion.

Hard

110 Sections

Required: 370

Reward: +90

6 Modules included

Fundamentals of AI

Medium

24 Sections

Reward: +10

This module provides a comprehensive guide to the theoretical foundations of Artificial Intelligence (AI). It covers various learning paradigms, including supervised, unsupervised, and reinforcement learning, providing a solid understanding of key algorithms and concepts.

Applications of AI in InfoSec

Medium

25 Sections

Reward: +10

This module is a practical introduction to building AI models that can be applied to various infosec domains. It covers setting up a controlled AI environment using Miniconda for package management and JupyterLab for interactive experimentation. Students will learn to handle datasets, preprocess and transform data, and implement structured workflows for tasks such as spam classification, network anomaly detection, and malware classification. Throughout the module, learners will explore essential Python libraries like Scikit-learn and PyTorch, understand effective approaches to dataset processing, and become familiar with common evaluation metrics, enabling them to navigate the entire lifecycle of AI model development and experimentation.

Introduction to Red Teaming AI

Medium

11 Sections

Reward: +10

This module provides a comprehensive introduction to the world of red teaming Artificial Intelligence (AI) and systems utilizing Machine Learning (ML) deployments. It covers an overview of common security vulnerabilities in these systems and the types of attacks that can be launched against their components.

Prompt Injection Attacks

Medium

11 Sections

Reward: +20

This module comprehensively introduces one of the most prominent attacks on large language models (LLMs): Prompt Injection. It introduces prompt injection basics and covers detailed attack vectors based on real-world vulnerability reports. Furthermore, the module touches on academic research in the fields of novel prompt injection techniques and jailbreaks.

LLM Output Attacks

Medium

14 Sections

Reward: +20

In this module, we will explore different LLM output vulnerabilities resulting from improper handling of LLM outputs and insecure LLM applications. We will also touch on LLM abuse attacks, such as hate speech campaigns and misinformation generation, with a particular focus on the detection and mitigation of these attacks.

AI Data Attacks

Hard

25 Sections

Reward: +20

This module explores the intersection of Data and Artificial Intelligence, exposing how vulnerabilities within AI data pipelines can be exploited, ultimately aiming to degrade performance, achieve specific misclassifications, or execute arbitrary code.