AI Evasion - Sparsity Attacks

This module explores sparsity-constrained adversarial attacks that minimize the number of modified input features rather than perturbation magnitude, showing how to craft targeted misclassifications by changing only the most impactful pixels through L0-focused optimization and saliency-guided feature selection.

Created by PandaSt0rm

Hard · Offensive

Summary

Neural networks can be fooled not just by spreading small changes across all inputs, but by concentrating larger modifications on a carefully selected few. While gradient-based attacks like FGSM and DeepFool focus on minimizing perturbation magnitude under various norms, they allow all features to change. In many real-world scenarios, the constraint that matters most is not how much each feature changes, but how many features can be modified at all. Sparse attacks address this by changing as few input dimensions as possible while maintaining attack effectiveness.

The sparsity budget is measured by the L0 pseudo-norm, which counts the number of coordinates that differ between adversarial and original inputs. This module provides a comprehensive exploration of techniques that generate adversarial examples under strict sparsity constraints:

  • Mathematical foundations of sparsity-constrained optimization, including L0 budgets, L1-induced sparsity, and saliency-based feature selection.
  • ElasticNet Attack (EAD), which combines L1 and L2 regularization to produce perturbations that are simultaneously sparse (few modified pixels) and smooth (bounded individual changes).
  • FISTA optimization for solving the non-smooth ElasticNet objective with proximal gradient descent and momentum acceleration.
  • Jacobian-based Saliency Map Attack (JSMA), which enforces explicit L0 budgets by iteratively modifying one or two features per step based on gradient-derived saliency scores.
  • Single-pixel and pairwise JSMA variants that balance attack efficiency with modification counts.

This module is broken into sections with hands-on exercises for each attack method. It concludes with a practical skills assessment to validate your understanding.

You can start and stop at any time and resume where you left off. There is no time limit or grading, but you must complete all exercises and the skills assessment to receive the maximum cubes and have the module marked as complete in any selected paths.

To ensure a smooth learning experience, the following skills are mandatory: solid Python proficiency, familiarity with Jupyter Notebooks, and understanding of neural networks, gradient computation, and optimization methods.

A firm grasp of these modules is recommended before starting:

It is HIGHLY recommended to use your own PC/Laptop for the practicals.

Introduction to Sparsity Evasion Attacks

Sparsity attacks seek misclassification by changing as few input dimensions as possible. The sparsity budget is measured by the L0 pseudo-norm, defined as the number of coordinates that differ between an adversarial input and the original,

\lVert x_{adv} - x \rVert_0 = \left|\{\, i \mid (x_{adv})_i \ne x_i \,\}\right|.

Instead of spreading small changes over many features, these attacks concentrate edits on a small set of high‑impact features. This section extends the first‑order perspective to settings where the primary constraint is how many features may change, not how small each change must be.
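The budget itself is easy to measure. As a minimal sketch (the array shape, pixel positions, and tolerance below are illustrative assumptions, not part of any specific attack), counting the modified coordinates of an image perturbation looks like this:

import numpy as np

def l0_distance(x_adv: np.ndarray, x: np.ndarray, tol: float = 0.0) -> int:
    # Count the coordinates where the adversarial input differs from the original.
    # A small tolerance guards against floating-point round-off inflating the count.
    return int(np.sum(np.abs(x_adv - x) > tol))

# Example: a 28x28 grayscale image with three pixels pushed to saturation.
x = np.zeros((28, 28), dtype=np.float32)
x_adv = x.copy()
x_adv[3, 7] = x_adv[14, 14] = x_adv[20, 2] = 1.0
print(l0_distance(x_adv, x))  # 3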

From First‑Order to Sparsity

The previous module used gradients to move an input across a decision boundary under L∞ or L2 limits. Those norms penalize the size of a perturbation but allow all features to move. In many systems, the attack surface is discrete or partially discrete, for example pixels that can saturate to bounds or tokens that change one at a time, so controlling the number of edited features is the relevant constraint. Sparsity attacks keep the feature count small, which preserves most of the input unchanged and can evade simple anomaly detectors that focus on global noise levels.

Threat Model and Budgets

We consider inference-time attackers who can compute or approximate gradients. In a white-box setting the attacker evaluates derivatives through the model and uses them to select which features to edit. In a black-box setting the attacker estimates importance scores by queries, or transfers sparse patterns from a surrogate. The primary budget is L0, sometimes with auxiliary limits on L2 or L∞ to keep edits bounded and valid. Inputs remain in [0,1] for images after each update, and if the model uses normalization x̂ = (x − μ)/σ, gradients propagate through it by the chain rule, so reasoning in pixel space remains correct while respecting box constraints.
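A sketch of that bookkeeping in PyTorch (the model, normalization constants, step size, and target class below are placeholders, not a prescribed attack): the gradient is taken in pixel space with the normalization inside the forward pass, and the box constraint is re-applied after each update.

import torch

mu, sigma = 0.5, 0.5                                  # placeholder normalization constants
model = torch.nn.Sequential(torch.nn.Flatten(),       # stand-in for any differentiable classifier
                            torch.nn.Linear(28 * 28, 10))

x = torch.rand(1, 1, 28, 28)                          # pixel-space input in [0, 1]
x_adv = x.clone().requires_grad_(True)
target = torch.tensor([3])                            # desired class for a targeted attack

logits = model((x_adv - mu) / sigma)                  # normalization happens inside the forward pass
loss = torch.nn.functional.cross_entropy(logits, target)
loss.backward()                                       # chain rule carries the gradient back to pixel space

with torch.no_grad():
    x_adv -= 0.1 * x_adv.grad.sign()                  # one illustrative step toward the target class
    x_adv.clamp_(0.0, 1.0)                            # re-impose the [0, 1] box constraint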

Two Paths to Sparse Perturbations

ElasticNet (EAD) promotes sparsity by adding an L1 penalty to the optimization. The L1 term encourages many coordinates of the perturbation to be exactly zero, which approximates an L0 goal while remaining continuous. A common objective is

\min_{x'}\; c\, f(x') + \lVert x' - x \rVert_2^2 + \beta\, \lVert x' - x \rVert_1 \quad \text{s.t.}\; x' \in [0,1],

where f(x') is a loss that enforces misclassification (often targeted), c balances attack success with compactness, and β controls sparsity through the L1 term. The result is a small set of larger edits rather than many tiny ones.
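A sketch of how this objective might be written in PyTorch (the function and argument names are illustrative, and the targeted loss here uses a hinge-style formulation as one common choice for f):

import torch

def elasticnet_objective(model, x_adv, x, target, c=1.0, beta=1e-2, kappa=0.0):
    # Targeted hinge-style loss: zero once the target logit beats every other
    # logit by at least kappa, positive otherwise.
    logits = model(x_adv)
    target_logit = logits.gather(1, target.view(-1, 1)).squeeze(1)
    other_logit = logits.scatter(1, target.view(-1, 1), float("-inf")).max(dim=1).values
    f = torch.clamp(other_logit - target_logit + kappa, min=0.0)

    # L2 keeps individual edits bounded; the L1 term drives most coordinates
    # of the perturbation to exactly zero, which is the source of sparsity.
    delta = (x_adv - x).flatten(1)
    l2 = delta.pow(2).sum(dim=1)
    l1 = delta.abs().sum(dim=1)
    return (c * f + l2 + beta * l1).sum()

Because the L1 term is non-smooth, the actual attack does not minimize this with plain gradient descent; FISTA handles the L1 part with a soft-thresholding proximal step plus momentum, which is covered in the optimization section.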

Jacobian-based Saliency Map Attack (JSMA) enforces an explicit L0 budget by modifying only one or two features per iteration. A saliency map derived from the input Jacobian scores candidate features by how much they raise the target class while suppressing the competing classes.
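A sketch of how such a saliency score can be computed for the single-pixel variant (assuming a batched PyTorch classifier and an integer target class; the full attack would also track which features are still allowed to change):

import torch

def single_pixel_saliency(model, x_adv, target_class):
    # Gradients of the target logit and of the summed non-target logits
    # with respect to the input, derived from the input Jacobian.
    x_adv = x_adv.clone().requires_grad_(True)
    logits = model(x_adv)[0]
    grad_target = torch.autograd.grad(logits[target_class], x_adv, retain_graph=True)[0]
    grad_others = torch.autograd.grad(logits.sum() - logits[target_class], x_adv)[0]

    # Classic saliency rule: keep pixels whose increase raises the target logit
    # and lowers the others; score them by the product of the magnitudes.
    mask = (grad_target > 0) & (grad_others < 0)
    scores = torch.where(mask, grad_target * grad_others.abs(), torch.zeros_like(grad_target))
    return scores  # the attack perturbs the highest-scoring pixel, then repeats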

Why Sparsity Attacks Matter

Sparse edits align with real constraints. An attacker may only be able to flip a few bits in a binary, touch a handful of pixels due to rendering limits, or change a small number of tokens in text. Sparse perturbations can be harder to detect with defenses tuned to global noise statistics, and they reveal which features the model treats as most decisive. For defenders, reproducing EAD and JSMA establishes baselines for L1-induced sparsity and explicit L0 control, which together expose different failure modes than L∞ or L2 attacks.

Relevant Paths

This module progresses you towards the following Paths

AI Red Teamer

The AI Red Teamer Job Role Path, in collaboration with Google, trains cybersecurity professionals to assess, exploit, and secure AI systems. Covering prompt injection, model privacy attacks, adversarial AI, supply chain risks, and deployment threats, it combines theory with hands-on exercises. Aligned with Google’s Secure AI Framework (SAIF), it ensures relevance to real-world AI security challenges. Learners will gain skills to manipulate model behaviors, develop AI-specific red teaming strategies, and perform offensive security testing against AI-driven applications. The path will be gradually expanded with related modules until its completion.

Hard · 187 Sections · Required: 770 · Reward: +170
Path Modules
Medium · 24 Sections · Reward: +10
This module provides a comprehensive guide to the theoretical foundations of Artificial Intelligence (AI). It covers various learning paradigms, including supervised, unsupervised, and reinforcement learning, providing a solid understanding of key algorithms and concepts.
Medium · 25 Sections · Reward: +10
This module is a practical introduction to building AI models that can be applied to various infosec domains. It covers setting up a controlled AI environment using Miniconda for package management and JupyterLab for interactive experimentation. Students will learn to handle datasets, preprocess and transform data, and implement structured workflows for tasks such as spam classification, network anomaly detection, and malware classification. Throughout the module, learners will explore essential Python libraries like Scikit-learn and PyTorch, understand effective approaches to dataset processing, and become familiar with common evaluation metrics, enabling them to navigate the entire lifecycle of AI model development and experimentation.
Medium · 11 Sections · Reward: +10
This module provides a comprehensive introduction to the world of red teaming Artificial Intelligence (AI) and systems utilizing Machine Learning (ML) deployments. It covers an overview of common security vulnerabilities in these systems and the types of attacks that can be launched against their components.
Medium · 11 Sections · Reward: +20
This module comprehensively introduces one of the most prominent attacks on large language models (LLMs): Prompt Injection. It introduces prompt injection basics and covers detailed attack vectors based on real-world vulnerability reports. Furthermore, the module touches on academic research in the fields of novel prompt injection techniques and jailbreaks.
Medium · 14 Sections · Reward: +20
In this module, we will explore different LLM output vulnerabilities resulting from improper handling of LLM outputs and insecure LLM applications. We will also touch on LLM abuse attacks, such as hate speech campaigns and misinformation generation, with a particular focus on the detection and mitigation of these attacks.
Hard · 25 Sections · Reward: +20
This module explores the intersection of Data and Artificial Intelligence, exposing how vulnerabilities within AI data pipelines can be exploited, ultimately aiming to degrade performance, achieve specific misclassifications, or execute arbitrary code.
Medium · 14 Sections · Reward: +20
In this module, we will explore security vulnerabilities in the application and system components of AI deployments. We will also discuss the Model Context Protocol (MCP), an orchestration protocol for AI deployments introduced in 2024, including a deep dive into how the protocol works and how security vulnerabilities may arise.
Medium · 12 Sections · Reward: +20 · NEW
This module explores the foundations of inference‑time evasion attacks against AI models, showing how to manipulate inputs to bypass classifiers and force targeted misclassifications in white‑ and black‑box settings.
Hard · 23 Sections · Reward: +20 · NEW
This module explores gradient-based adversarial attacks that manipulate neural network inputs at inference time, showing how to craft perturbations that cause misclassification through white-box access to model gradients.
Hard · 28 Sections · Reward: +20 · NEW
This module explores sparsity-constrained adversarial attacks that minimize the number of modified input features rather than perturbation magnitude, showing how to craft targeted misclassifications by changing only the most impactful pixels through L0-focused optimization and saliency-guided feature selection.