Summary

Cybersecurity professionals rely on a well-structured AI environment and streamlined workflows to process data, build models, and extract insights efficiently. This module provides a direct path to establishing and optimizing such an environment, from installing and managing packages with Miniconda, to leveraging JupyterLab for interactive development, to using libraries like Scikit-learn and PyTorch for model training and evaluation, so that students can move seamlessly from raw data to actionable models.

While this module offers an accompanying VM to solve the labs, its performance is limited and may result in longer training times. Therefore, we recommend setting up an environment on your own machine, which requires at least 4GB of RAM. Training benefits from a GPU, but training on a CPU is also possible; in that case, we recommend a reasonably modern CPU with as many cores as possible. In most cases, your own environment will train faster than the accompanying VM.

Key areas covered include:

Environment Setup: Establishing a dedicated AI environment using Miniconda for dependency management.
JupyterLab: Leveraging an interactive and flexible development platform for exploratory data analysis, rapid prototyping, and in-depth experimentation.
Python Libraries for AI: Applying Scikit-learn and PyTorch to model training, evaluation, and continuous improvement.
Datasets: Understanding key attributes of datasets, exploring their structure, identifying challenges, and learning how to load and inspect data to detect potential issues.
Data Preprocessing: Implementing rigorous methods to clean and refine data, including identifying invalid values, imputing missing entries, encoding categorical features, and handling skewed distributions.
Data Transformation: Applying transformations like one-hot encoding and data splitting to prepare data for downstream modeling tasks.
Spam Classification: Translating raw text into representative numerical features and using Naive Bayes to classify messages effectively.
Network Anomaly Detection: Using random forests and specialized datasets like NSL-KDD to detect abnormal network behavior.
Malware Classification: Transforming malware samples into representational data (e.g., images) and using deep learning models like ResNet50 to classify malicious binaries, reinforcing complex feature extraction and model training techniques.
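
To make the spam classification item above more concrete, here is a minimal sketch of turning text into word-count features and classifying it with Naive Bayes. It assumes Scikit-learn is installed; the messages are invented toy data rather than the module's dataset, and the variable names are illustrative only.

```python
# Minimal sketch: bag-of-words features + Multinomial Naive Bayes.
# The messages below are invented toy data, not the module's dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_messages = [
    "win a free prize now",
    "free cash claim your prize",
    "urgent claim free reward now",
    "are we meeting for lunch today",
    "see you at the office tomorrow",
    "can you call me after lunch",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer maps each message to word counts (numerical features);
# MultinomialNB then learns per-class word probabilities from those counts.
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(train_messages, train_labels)

new_messages = ["claim your free prize", "lunch at the office today"]
predictions = spam_model.predict(new_messages)
for message, label in zip(new_messages, predictions):
    print(f"{label}: {message}")
```

A dataset this small only illustrates the mechanics; a real classifier needs far more data and a held-out test set for evaluation.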

This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading," but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

As you work through the module, you will see example commands and command output for the various topics introduced. It is worth reproducing as many of these examples as possible to reinforce the concepts presented in each section. You can do this in the PwnBox provided in the interactive sections or in your own virtual machine.

Introduction

Following the Fundamentals of AI module, this module takes a more practical approach to applying machine learning techniques. Instead of focusing solely on theory, you will now engage in hands-on activities that involve building and evaluating real models. Throughout this process, you will gain experience with the end-to-end workflow of AI development, from exploring datasets to training and testing models.
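
As a rough picture of that end-to-end workflow, the sketch below obtains data, holds out a test set, trains a model, and evaluates it. It assumes Scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for a real dataset; the random forest and its parameters are illustrative choices, not the module's exact configuration.

```python
# Hedged sketch of a train/evaluate workflow on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Obtain data (synthetic here; an assumption for illustration).
features, labels = make_classification(
    n_samples=500, n_features=10, random_state=42
)

# 2. Hold out 20% of the samples for testing, keeping class balance.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# 3. Train a model on the training split only.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# 4. Evaluate on data the model has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```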

You will construct three distinct AI models in this module:

A Spam Classifier to determine whether an SMS message is spam or not.
A Network Anomaly Detection Model designed to identify abnormal or potentially malicious network traffic.
A Malware Classifier using byteplots, which are visual representations of binary data.
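
To illustrate what a byteplot is, the sketch below maps each byte of a binary blob to one grayscale pixel (0 to 255) and arranges the pixels into rows of a fixed width. It assumes NumPy is installed; the function name `bytes_to_byteplot`, the width of 16, and the zero padding of the last row are illustrative assumptions, not necessarily how the module builds its images.

```python
import numpy as np


def bytes_to_byteplot(data: bytes, width: int = 16) -> np.ndarray:
    """Map raw bytes to a 2D grayscale array, one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = np.frombuffer(data, dtype=np.uint8)
    # Pad the final row with zeros so the bytes fill a full rectangle.
    padding = (-len(pixels)) % width
    pixels = np.pad(pixels, (0, padding), constant_values=0)
    return pixels.reshape(-1, width)


# Stand-in for the contents of a binary file (hypothetical sample).
sample = bytes(range(40))
image = bytes_to_byteplot(sample, width=16)
print(image.shape)  # (3, 16)
```

The resulting array can be saved or displayed as an image with any imaging library, which is what lets an image classifier operate on binaries.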

Throughout the module, you will encounter Python code blocks that guide you step-by-step through the model-building process.

You will learn more about Jupyter later in this module, but for now, understand that you can copy and paste these code snippets into a Jupyter notebook to execute them in sequence, either in the playground VM or in your own environment.

You can train most of these models in your own environment. For a decent experience, you will need at least 4GB of RAM and at least 4 CPU cores.
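
If you are unsure whether your machine meets these figures, a quick check along the following lines may help. This is a sketch using only the Python standard library; the RAM lookup relies on `os.sysconf`, which is available on Linux and similar systems but not on Windows, so the helper returns `None` when the value cannot be determined. The helper names are hypothetical.

```python
import os

MIN_CPU_CORES = 4  # figure stated in the text above
MIN_RAM_GB = 4     # figure stated in the text above


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. Windows has no os.sysconf
    return page_size * page_count / 1024**3


def meets_minimum(value, minimum):
    """True or False when the value is known, None when it is not."""
    return None if value is None else value >= minimum


cores = os.cpu_count()
ram_gb = total_ram_gb()
# Round RAM because the OS usually reports slightly less than the nominal size.
ram_rounded = None if ram_gb is None else round(ram_gb)

print(f"CPU cores: {cores} (ok: {meets_minimum(cores, MIN_CPU_CORES)})")
print(f"RAM in GB: {ram_rounded} (ok: {meets_minimum(ram_rounded, MIN_RAM_GB)})")
```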

Note: Throughout this module, all sections marked as interactive contain code blocks for you to follow along. Not all interactive sections contain separate exercises.

Sections

Introduction
Environment Setup
JupyterLab
Python Libraries for AI
Datasets
Data Preprocessing
Data Transformation
Metrics for Evaluating a Model
Spam Classification
The Spam Dataset
Preprocessing the Spam Dataset
Feature Extraction
Training and Evaluation (Spam Detection)
Model Evaluation (Spam Detection)
Network Anomaly Detection
Preprocessing and Splitting the Dataset
Training and Evaluation (Network Anomaly Detection)
Model Evaluation (Network Anomaly Detection)
Malware Classification
The Malware Dataset
Preprocessing the Malware Dataset
The Model
Training and Evaluation (Malware Image Classification)
Model Evaluation (Malware Image Classification)
Skills Assessment

Relevant Paths

This module counts toward the following path:

AI Red Teamer

The AI Red Teamer Job Role Path, in collaboration with Google, trains cybersecurity professionals to assess, exploit, and secure AI systems. Covering prompt injection, model privacy attacks, adversarial AI, supply chain risks, and deployment threats, it combines theory with hands-on exercises. Aligned with Google’s Secure AI Framework (SAIF), it ensures relevance to real-world AI security challenges. Learners will gain skills to manipulate model behaviors, develop AI-specific red teaming strategies, and perform offensive security testing against AI-driven applications. The path will be gradually expanded with related modules until its completion.

Hard

110 Sections

Required: 370

Reward: +90

6 Modules included

Fundamentals of AI

Medium

24 Sections

Reward: +10

This module provides a comprehensive guide to the theoretical foundations of Artificial Intelligence (AI). It covers various learning paradigms, including supervised, unsupervised, and reinforcement learning, providing a solid understanding of key algorithms and concepts.

Applications of AI in InfoSec

Medium

25 Sections

Reward: +10

This module is a practical introduction to building AI models that can be applied to various infosec domains. It covers setting up a controlled AI environment using Miniconda for package management and JupyterLab for interactive experimentation. Students will learn to handle datasets, preprocess and transform data, and implement structured workflows for tasks such as spam classification, network anomaly detection, and malware classification. Throughout the module, learners will explore essential Python libraries like Scikit-learn and PyTorch, understand effective approaches to dataset processing, and become familiar with common evaluation metrics, enabling them to navigate the entire lifecycle of AI model development and experimentation.

Introduction to Red Teaming AI

Medium

11 Sections

Reward: +10

This module provides a comprehensive introduction to the world of red teaming Artificial Intelligence (AI) and systems utilizing Machine Learning (ML) deployments. It covers an overview of common security vulnerabilities in these systems and the types of attacks that can be launched against their components.

Prompt Injection Attacks

Medium

11 Sections

Reward: +20

This module comprehensively introduces one of the most prominent attacks on large language models (LLMs): Prompt Injection. It introduces prompt injection basics and covers detailed attack vectors based on real-world vulnerability reports. Furthermore, the module touches on academic research in the fields of novel prompt injection techniques and jailbreaks.

LLM Output Attacks

Medium

14 Sections

Reward: +20

In this module, we will explore different LLM output vulnerabilities resulting from improper handling of LLM outputs and insecure LLM applications. We will also touch on LLM abuse attacks, such as hate speech campaigns and misinformation generation, with a particular focus on the detection and mitigation of these attacks.

AI Data Attacks

Hard

25 Sections

Reward: +20

This module explores the intersection of Data and Artificial Intelligence, exposing how vulnerabilities within AI data pipelines can be exploited, ultimately aiming to degrade performance, achieve specific misclassifications, or execute arbitrary code.