
Introduction to Deserialization Attacks

In this module, we will explore deserialization attacks with specific examples in Python and PHP.


Created by bmdyy

Hard Offensive


Serialization is the process of converting an object in memory into a stream of bytes that can be stored and restored later on. This module focuses on deserialization attacks, which may occur when programmers are not careful about how, or what, the program deserializes, leading to consequences as severe as remote code execution.
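To make the risk concrete, here is a minimal and deliberately harmless sketch in Python, assuming the standard pickle module. The class name `Demo` is hypothetical. The point it illustrates is that the serialized data, not the receiving program, decides which callable runs during deserialization; a benign built-in (`sorted`) stands in for whatever an attacker would choose instead.

```python
import pickle


class Demo:
    """Hypothetical class used only for illustration."""

    def __reduce__(self):
        # Ask pickle to rebuild this object by calling sorted([3, 1, 2]).
        # An attacker would supply a far less friendly callable here.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Demo())

# The receiver only calls loads(), yet sorted() is executed on its side,
# and the result is not even a Demo instance.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

This is why deserializing data from an untrusted source should be treated with the same caution as executing code from that source.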

This module is split up into the following five focus areas:

  1. Introduction: Serialization and deserialization attacks in general are explained in depth.
  2. Exploiting PHP Deserialization: We walk through identifying and exploiting a PHP website vulnerable to multiple deserialization attacks.
  3. Exploiting Python Deserialization: We walk through identifying and exploiting a Python website vulnerable to multiple deserialization attacks.
  4. Defending against Deserialization Attacks: We walk through patching the Python website against deserialization attacks and then completely preventing deserialization attacks in the PHP website. The techniques in both sections may be applied in general to any website.
  5. Skills Assessment: You will have to identify and exploit deserialization vulnerabilities in two custom-built websites (Python and PHP).

After completing this module, you should be comfortable identifying, exploiting, and remediating (relatively simple) deserialization vulnerabilities.

Introduction to Serialization


Serialization is the process of taking an object from memory and converting it into a series of bytes so that it can be stored or transmitted over a network and then reconstructed later on, perhaps by a different program or in a different machine environment.

Deserialization is the reverse action: taking serialized data and reconstructing the original object in memory.
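As a rough sketch of the store-and-reconstruct cycle described above, the following Python snippet writes an object to disk and reads it back. It assumes the standard pickle module (introduced later in this section); the dictionary contents and the temporary file are made up for illustration.

```python
import os
import pickle
import tempfile

settings = {"user": "htb-student", "level": 3, "tags": ["php", "python"]}

# "Program A": serialize the object and store the bytes on disk.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as fh:
    pickle.dump(settings, fh)
    path = fh.name

# "Program B": read the bytes back and reconstruct an equal object.
with open(path, "rb") as fh:
    restored = pickle.load(fh)
os.remove(path)

# The restored object has the same value but is a separate object in memory.
print(restored == settings, restored is settings)
```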

Many object-oriented programming languages support serialization natively, including, but not limited to:

  • Java
  • Ruby
  • Python
  • PHP
  • C#

For the duration of this module, we will only focus on Python and PHP; however, please note that the same concepts taught may be reapplied to most, if not all, languages that support serialization.

PHP Serialization

As an example, this is how we would serialize an array in PHP:

[!bash!]$ php -a

Interactive shell

php > $original_data = array("HTB", 123, 7.77);
php > $serialized_data = serialize($original_data);
php > echo $serialized_data;
a:3:{i:0;s:3:"HTB";i:1;i:123;i:2;d:7.77;}
php > $reconstructed_data = unserialize($serialized_data);
php > var_dump($reconstructed_data);
array(3) {
  [0]=>
  string(3) "HTB"
  [1]=>
  int(123)
  [2]=>
  float(7.77)
}

As you can see, $original_data is an array containing one string ("HTB"), one integer (123), and one double (7.77). Using the serialize function, the array is converted into a string of bytes that represents it. We then pass this serialized string to unserialize, which restores the original array, as the var_dump of $reconstructed_data confirms.

Serialized objects in PHP are easy to read, unlike serialized objects in many other languages, which may look like complete gibberish to the human eye, as you will see in the Python example. Before that, let's understand what the letters and numbers in the serialized data mean:

a:3:{ // (A)rray with (3) items
    i:0;s:3:"HTB"; // (I)ndex (0); (S)tring with length (3) and value: "HTB"
    i:1;i:123; // (I)ndex (1); (I)nteger with value (123)
    i:2;d:7.77; // (I)ndex (2); (D)ouble with value (7.77)
}
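The format rules above can be sketched as a small encoder. This is an illustrative Python helper, not part of PHP or of any library: the name `php_serialize` is hypothetical, it handles only strings, integers, floats, and flat lists, and it assumes the float's Python representation matches what PHP would print (true for 7.77, not guaranteed for every value).

```python
def php_serialize(value):
    """Hypothetical helper: encodes str, int, float and flat lists only."""
    if isinstance(value, bool):
        raise TypeError("bool is outside the scope of this sketch")
    if isinstance(value, str):
        # PHP records the length in bytes, not in characters.
        return f's:{len(value.encode("utf-8"))}:"{value}";'
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, float):
        return f"d:{value!r};"
    if isinstance(value, list):
        # Each item is written as an index followed by the item itself.
        items = "".join(
            php_serialize(index) + php_serialize(item)
            for index, item in enumerate(value)
        )
        return f"a:{len(value)}:{{{items}}}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


encoded = php_serialize(["HTB", 123, 7.77])
print(encoded)
```

For the example list, the helper produces the same string that PHP's serialize function returned in the interactive session.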

Python Serialization

Similar to the PHP example above, we will serialize a list (Python's counterpart to the PHP array) in Python. There are multiple libraries for Python that implement serialization, such as PyYAML and jsonpickle. However, pickle is the native implementation, and it is what will be used in this example.

[!bash!]$ python3

Python 3.10.7 (main, Sep  8 2022, 14:34:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> original_data = ["HTB", 123, 7.77]
>>> serialized_data = pickle.dumps(original_data)
>>> print(serialized_data)
b'\x80\x04\x95\x16\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.'
>>> reconstructed_data = pickle.loads(serialized_data)
>>> print(reconstructed_data)
['HTB', 123, 7.77]

The serialized data that pickle outputs is much harder to read than the output PHP provides. However, it is still possible. According to comments in the pickle library, a pickle is a program for a virtual pickle machine (PM). The PM contains a stack and a memo (long-term memory), and a pickled object is just a sequence of opcodes for the PM to execute, which will recreate an arbitrary object on the stack.
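One way to see that a pickle really is a sequence of opcodes is to list them with the standard pickletools module. The sketch below pins the protocol to version 4 so that the output does not depend on the interpreter's default; the variable names are only illustrative.

```python
import pickle
import pickletools

data = pickle.dumps(["HTB", 123, 7.77], protocol=4)

# genops() yields one (opcode, argument, position) tuple per instruction.
ops = [(opcode.name, arg) for opcode, arg, _pos in pickletools.genops(data)]

for name, arg in ops:
    print(name, "" if arg is None else repr(arg))
```

Calling `pickletools.dis(data)` prints a similar, more detailed disassembly without executing anything contained in the pickle.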

The PM's stack is a Last-In-First-Out (LIFO) data structure. You may push items onto the top of the stack, and you may pop the top object off of the stack.

Quoting from comments in the pickle library, the PM's memo is a "data structure which remembers which objects the pickler has already seen, so that shared or recursive objects are pickled by reference and not by value."
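The effect of the memo can be observed from the outside. In this small sketch (variable names are illustrative), the same list object is referenced twice; because it is pickled by reference the second time, the two references still point to a single object after unpickling.

```python
import pickle

shared = ["HTB"]
original = [shared, shared]  # the same list object, referenced twice

restored = pickle.loads(pickle.dumps(original, protocol=4))

# Both entries are one and the same object again, not two equal copies.
print(restored[0] is restored[1])

# Changing one entry is therefore visible through the other.
restored[0].append("Academy")
print(restored[1])
```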

In Lib/ (Python 3.10), we can see all of the pickle opcodes defined, and by referring to them, as well as the source code for the various pickling functions, we can piece together what our serialized_data does exactly when it is passed to pickle.loads():

# PROTO 4
# Tell the PM that we are using protocol version 4. This is the default since Python 3.8.
# Protocol versions 3-5 cannot be unpickled by Python 2.x.

# FRAME 22 (0x16)
# Essentially we are telling the PM that the frame, i.e. the serialized data that follows, is 22 (0x16) bytes long.
# The argument is calculated like this: 
# `struct.pack("<Q", len(b']\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.')) = b'\x16\x00\x00\x00\x00\x00\x00\x00'`.

# EMPTY_LIST
# Pushes an empty list onto the stack. 
# Eventually, we will append the items to this list after we have defined them.

# MEMOIZE
# This stores the object on the top of the stack in the 'memo', which is akin to long-term memory. 
# The memo is used to keep transient objects alive during pickling. 
# In this case we are 'memoizing' the empty list we just pushed onto the stack. 
# This opcode is emitted when pickling any of the following:
# - objects reduced via __reduce__
# - bytes
# - bytearray
# - str
# - tuple
# - list
# - dict
# - set 
# - frozenset
# - global

# MARK
# Pushes the special 'markobject' onto the stack.
# This will be referred to later as the starting point for our list items.

# SHORT_BINUNICODE HTB
# Pushes the unicode string 'HTB', which has length 3, onto the stack.

# MEMOIZE
# We tell the PM to 'memoize' the string that we just pushed onto the stack.

# BININT1 {
# Pushes a 1-byte unsigned int with value 123 onto the stack. 
# '{' is the byte representation of 123, calculated like this: 
# `bytes([123]) = b'{'`

# BINFLOAT @\x1f\x14z\xe1G\xae\x14
# Pushes a float with the value 7.77 onto the stack. 
# '@\x1f\x14z\xe1G\xae\x14' is the 8-byte float encoding of 7.77 which is calculated like this: 
# `struct.pack(">d", 7.77) = b'@\x1f\x14z\xe1G\xae\x14'`

# APPENDS
# We are telling the PM to extend the empty list on the stack with all of the items we pushed after the 'markobject'.

# STOP
# This is how we tell the PM we are at the end of the pickle. 
# The original list `['HTB', 123, 7.77]` was recreated and now sits at the top of the stack.
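The walkthrough above can be replayed with a toy pickle machine. This is a simplified sketch and not how the real unpickler is implemented: the function name `run_toy_pm` is hypothetical, it supports only the opcodes that appear in this example, and it relies on `pickletools.genops` to split the bytes into opcodes and decode their arguments.

```python
import pickle
import pickletools


def run_toy_pm(data):
    """Hypothetical toy PM: supports only the opcodes used in this example."""
    stack = []       # LIFO working area
    memo = []        # long-term memory, filled in order of memoization
    trace = []       # names of the opcodes that were executed
    mark = object()  # stand-in for the special 'markobject'

    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        trace.append(name)
        if name in ("PROTO", "FRAME"):
            continue  # bookkeeping only, nothing is pushed
        if name == "EMPTY_LIST":
            stack.append([])
        elif name == "MEMOIZE":
            memo.append(stack[-1])  # remember the top item, do not pop it
        elif name == "MARK":
            stack.append(mark)
        elif name in ("SHORT_BINUNICODE", "BININT1", "BINFLOAT"):
            stack.append(arg)  # genops has already decoded the argument
        elif name == "APPENDS":
            items = []
            while stack[-1] is not mark:
                items.append(stack.pop())
            stack.pop()  # discard the mark itself
            stack[-1].extend(reversed(items))
        elif name == "STOP":
            return stack.pop(), memo, trace
        else:
            raise ValueError(f"opcode {name} is outside this sketch")
    raise ValueError("pickle ended without STOP")


data = pickle.dumps(["HTB", 123, 7.77], protocol=4)
result, memo, trace = run_toy_pm(data)
print(result)
print(trace)
```

Because the toy machine rejects every opcode it does not know, it cannot be used to load arbitrary pickles; it only serves to follow the stack and memo step by step for this one list.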
