HTB Certified Active Directory Pentesting Expert is live! (25% OFF on Gold Annual Plan — for a limited time!) Learn More

Introduction to Deserialization Attacks

In this module, we will explore deserialization attacks with specific examples in Python and PHP.

4.85

Created by bmdyy

Hard Offensive

Summary

Serialization is the process of converting an object from memory into a stream of bytes that may be stored and restored later on. This module focuses on deserialization attacks, which may occur when programmers are not careful with how / what the program deserializes, leading to consequences as severe as remote code execution.

This module is split up into the following five focus areas:

  1. Introduction: Serialization and deserialization attacks (general) are explained in depth
  2. Exploiting PHP Deserialization: We walk through identifying and exploiting a PHP website vulnerable to multiple deserialization attacks.
  3. Exploiting Python Deserialization: We walk through identifying and exploiting a Python website vulnerable to multiple deserialization attacks.
  4. Defending against Deserialization Attacks: We walk through patching the Python website against deserialization attacks and then completely preventing deserialization attacks in the PHP website, where the techniques in both sections may be applied in general to any website.
  5. Skills Assessment: You will have to identify and exploit deserialization vulnerabilities in two custom-built websites (Python and PHP).

After completing this module, you should be comfortable identifying, exploiting, and remediating (relatively simple) deserialization vulnerabilities.

This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading," but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

As you work through the module, you will see example commands and command output for the various topics introduced. It is worth reproducing as many of these examples as possible to reinforce further the concepts presented in each section. You can do this in the PwnBox provided in the interactive sections or your virtual machine.

A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:

  • Introduction to Python3

Introduction to Serialization


Introduction

Serialization is the process of taking an object from memory and converting it into a series of bytes so that it can be stored or transmitted over a network and then reconstructed later on, perhaps by a different program or in a different machine environment.

Deserialization is the reverse action: taking serialized data and reconstructing the original object in memory.

Many object-oriented programming languages support serialization natively, including, but not limited to:

  • Java
  • Ruby
  • Python
  • PHP
  • C#

For the duration of this module, we will only focus on Python and PHP; however, please note that the same concepts taught may be reapplied to most, if not all, languages that support serialization.


PHP Serialization

As an example, this is how we would serialize an array in PHP:

[!bash!]$ php -a

Interactive shell

php > $original_data = array("HTB", 123, 7.77);
php > $serialized_data = serialize($original_data);
php > echo $serialized_data;
a:3:{i:0;s:3:"HTB";i:1;i:123;i:2;d:7.77;}
php > $reconstructed_data = unserialize($serialized_data);
php > var_dump($reconstructed_data);
array(3) {
  [0]=>
  string(3) "HTB"
  [1]=>
  int(123)
  [2]=>
  float(7.77)
}

As you can see, $original_data is an array containing one string ("HTB"), one integer (123), and one double (7.77). Using the serialize function, the array is turned into bytes that represent the array. We carry on to unserialize this serialized string and restore the original array as verified by the var_dump of $reconstructed_data.

Serialized objects in PHP are easy to read, unlike serialized objects in many other languages, which may look like complete gibberish to the human eye, as you will see in the Python example, but before that, let's understand what the letters and numbers in the serialized data mean:

a:3:{ // (A)rray with (3) items
    i:0;s:3: "HTB"; // (I)ndex (0); (S)tring with length (3) and value: "HTB"
    i:1;i:123; // (I)ndex (1); (I)nteger with value (123)
    i:2;d:7.77; // (I)ndex (2); (D)ouble with value (7.77)
}

Python Serialization

Similar to the PHP example above, we will serialize an array in Python. There are multiple libraries for Python which implement serialization, such as PyYAML and JSONpickle. However, Pickle is the native implementation, and it is what will be used in this example.

[!bash!]$ python3

Python 3.10.7 (main, Sep  8 2022, 14:34:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> original_data = ["HTB", 123, 7.77]
>>> serialized_data = pickle.dumps(original_data)
>>> print(serialized_data)
b'\x80\x04\x95\x16\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.'
>>> reconstructed_data = pickle.loads(serialized_data)
>>> print(reconstructed_data)
['HTB', 123, 7.77]

Reading the serialized data pickle outputs is much harder than reading the output PHP provides. However, it is still possible. According to comments in the pickle library, a pickle is a program for a virtual pickle machine (PM). The PM contains a stack and a memo (long-term memory), and a pickled object is just a sequence of opcodes for the PM to execute, which will recreate an arbitrary object on the stack.

The PM's stack is a Last-In-First-Out (LIFO) data structure. You may push items onto the top of the stack, and you may pop the top object off of the stack.

Quoting from comments in the pickle library, the PM's memo is a "data structure which remembers which objects the pickler has already seen, so that shared or recursive objects are pickled by reference and not by value."

In Lib/pickle.py (Python 3.10), we can see all of the pickle opcodes defined, and by referring to them, as well as the source code for the various pickling functions, we can piece together what our serialized_data does exactly when it is passed to pickle.loads():

'\x80\x04'
# PROTO 4
# Tell the PM that we are using protocol version 4. This is the default since Python 3.8.
# Protocol versions 3-5 can not be unpickled by Python 2.x.

'\x95\x16\x00\x00\x00\x00\x00\x00\x00' 
# FRAME 16
# Essentially we are telling the PM that the serialized data is 16 bytes long.
# The argument is calculated like this: 
# `struct.pack("<Q", len(b']\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.')) = b'\x16\x00\x00\x00\x00\x00\x00\x00'`.

']' 
# EMPTY_LIST
# Pushes an empty list onto the stack. 
# Eventually, we will append the items to this list after we have defined them.

'\x94' 
# MEMOIZE
# This stores the object on the top of the stack in the 'memo' which is akin to long-term memory. 
# The memo is used to keep transient objects alive during pickling. 
# In this case we are 'memozing' the empty list we just pushed onto the stack. 
# This opcode is called when pickling any of the following types:
# - __reduce__
# - bytes
# - bytearray
# - string
# - tuple
# - list
# - dict
# - set 
# - frozenset
# - global

'(' 
# MARK
# Pushes the special 'markobject' on the stack.
# This will be referred to later as the starting point for our array items.

'\x8c\x03HTB' 
# SHORT_BINUNICODE 3 HTB
# Pushes the unicode string with length 3 'HTB' onto the stack.

'\x94' 
# MEMOIZE
# We tell the PM to 'memoize' the string that we just pushed onto the stack.

'K{' 
# BININT1 {
# Pushes a 1-byte unsigned int with value 123 onto the stack. 
# '{' is the byte representation of 123 calculated as so: 
# `chr(123) = b'{'`

'G@\x1f\x14z\xe1G\xae\x14' 
# BINFLOAT @\x1f\x14z\xe1G\xae\x14
# Pushes a float with the value 7.77 onto the stack. 
# '@\x1f\x14z\xe1G\xae\x14' is the 8-byte float encoding of 7.77 which is calculated like this: 
# `struct.pack(">d", 7.77) = b'@\x1f\x14z\xe1G\xae\x14'`

'e'
# APPENDS
# We are telling the PM to extend the empty list on the stack with all items we just defined back up until the 'markobject' we defined earlier.

'.' 
# STOP
# This is how we tell the PM we are at the end of the pickle. 
# The original array `['HTB', 123, 7.77]` was recreated and now sits at the top of the stack.

Sign Up / Log In to Unlock the Module

Please Sign Up or Log In to unlock the module and access the rest of the sections.

Relevant Paths

This module progresses you towards the following Paths

Senior Web Penetration Tester

The Senior Web Penetration Tester Job Role Path is designed for individuals who aim to develop skills in identifying advanced and hard-to-find web vulnerabilities using both black box and white box techniques. This path encompasses advanced-level training in web security, web penetration testing, and secure coding concepts. It also provides a deep understanding of the application debugging, source code review, and custom exploit development aspects of web security. Equipped with the necessary theoretical background, multiple practical exercises, and a proven methodology for web vulnerability identification, students will eventually be capable of performing professional security assessments against modern and highly secure web applications, as well as effectively reporting vulnerabilities found in code or arising from logical errors.

Hard Path Sections 245 Sections
Required: 7500
Reward: +1500
Path Modules
Medium
Path Sections 15 Sections
Reward: +100
This module covers three injection attacks: XPath injection, LDAP injection, and HTML injection in PDF generation libraries. While XPath and LDAP injection vulnerabilities can lead to authentication bypasses and data exfiltration, HTML injection in PDF generation libraries can lead to Server-Side Request Forgery (SSRF), Local File Inclusion (LFI), and other common web vulnerabilities. We will cover how to identify, exploit, and prevent each of these injection attacks.
Medium
Path Sections 12 Sections
Reward: +100
In this module, we will look at exploiting NoSQL injection vulnerabilities, specifically MongoDB, with examples in Python, PHP, and Node.JS.
Medium
Path Sections 20 Sections
Reward: +100
Authentication plays an essential role in almost every web application. If a vulnerability arises in the application's authentication mechanism, it could result in unauthorized access, data loss, or potentially even remote code execution, depending on the application's functionality. This module will provide an overview of various access control methods, such as JWT, OAuth, and SAML, and potential attacks against each.
Medium
Path Sections 17 Sections
Reward: +100
Modern web browsers and applications utilize a variety of security measures to protect against CSRF and XSS vulnerabilities, rendering their exploitation more difficult. This module focuses on exploiting advanced CSRF and XSS vulnerabilities, identifying and bypassing weak and wrongly implemented defensive mechanisms.
Medium
Path Sections 15 Sections
Reward: +100
This module covers details on Transport Layer Security (TLS) and how it helps to make HTTP secure with the widely used HTTPS. That includes how TLS works, how TLS sessions are established, common TLS misconfigurations, as well as famous attacks on TLS. We will discuss how to identify, exploit, and prevent TLS attacks.
Hard
Path Sections 20 Sections
Reward: +100
This module covers three common HTTP vulnerabilities: Web Cache Poisoning, Host Header Vulnerabilities, and Session Puzzling or Session Variable Overloading. These vulnerabilities can arise on the HTTP level due to web server misconfigurations, other systems that have to be considered during real-world deployment such as web caches, or coding mistakes in the web application. We will cover how to identify, exploit, and prevent each of these vulnerabilities.
Hard
Path Sections 18 Sections
Reward: +100
This module covers three HTTP vulnerabilities: CRLF Injection, HTTP Request Smuggling, and HTTP/2 Downgrading. These vulnerabilities can arise on the HTTP level in real-world deployment settings utilizing intermediary systems such as reverse proxies in front of the web server. We will cover how to identify, exploit, and prevent each of these vulnerabilities.
Hard
Path Sections 16 Sections
Reward: +100
In this module, we cover blind SQL injection attacks and MSSQL-specific attacks.
Hard
Path Sections 18 Sections
Reward: +100
Whitebox penetration testing enables thorough testing to identify various hard-to-find vulnerabilities. This module covers the process of whitebox pentesting and follows that with a practical demo by exploiting an advanced code injection vulnerability.
Hard
Path Sections 18 Sections
Reward: +100
This module covers advanced web concepts and exploitation techniques, including performing DNS Rebinding to bypass faulty SSRF filters and the Same-Origin Policy, identifying and exploiting Second-Order vulnerabilities, and conducting common web attacks via WebSocket connections.
Hard
Path Sections 15 Sections
Reward: +100
In this module, we will explore deserialization attacks with specific examples in Python and PHP.
Hard
Path Sections 15 Sections
Reward: +100
This module explores several web vulnerabilities from a whitebox approach: Prototype Pollution, Timing Attacks & Race Conditions, and those arising from Type Juggling. We will discuss how to identify, exploit, and prevent each vulnerability.
Hard
Path Sections 12 Sections
Reward: +100
This module covers advanced SQL injection techniques with a focus on white-box testing, Java/Spring and PostgreSQL.
Hard
Path Sections 13 Sections
Reward: +100
This module focuses on developing custom exploits for .NET deserialization vulnerabilities from a whitebox perspective.
Hard
Path Sections 21 Sections
Reward: +100
This 'secure coding' module teaches how to identify logic bugs through code review and analysis, and covers three types of logic bugs caused by user input manipulation.