Summary
Serialization is the process of converting an in-memory object into a stream of bytes that can be stored and restored later. This module focuses on deserialization attacks, which can occur when programmers are careless about how the program deserializes data or what data it deserializes, with consequences as severe as remote code execution.
This module is split up into the following five focus areas:
- Introduction: Serialization and deserialization attacks (general) are explained in depth.
- Exploiting PHP Deserialization: We walk through identifying and exploiting a PHP website vulnerable to multiple deserialization attacks.
- Exploiting Python Deserialization: We walk through identifying and exploiting a Python website vulnerable to multiple deserialization attacks.
- Defending against Deserialization Attacks: We walk through patching the Python website against deserialization attacks and then completely preventing deserialization attacks in the PHP website, where the techniques in both sections may be applied in general to any website.
- Skills Assessment: You will have to identify and exploit deserialization vulnerabilities in two custom-built websites (Python and PHP).
After completing this module, you should be comfortable identifying, exploiting, and remediating (relatively simple) deserialization vulnerabilities.
This module is broken into sections with accompanying hands-on exercises to practice each of the tactics and techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.
You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading," but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.
As you work through the module, you will see example commands and command output for the various topics introduced. It is worth reproducing as many of these examples as possible to further reinforce the concepts presented in each section. You can do this in the PwnBox provided in the interactive sections or in your own virtual machine.
A firm grasp of the following modules can be considered a prerequisite for the successful completion of this module:
- Introduction to Python3
Introduction to Serialization
Introduction
Serialization is the process of taking an object from memory and converting it into a series of bytes so that it can be stored or transmitted over a network and then reconstructed later on, perhaps by a different program or in a different machine environment.
Deserialization is the reverse action: taking serialized data and reconstructing the original object in memory.
Many object-oriented programming languages support serialization natively, including, but not limited to:
- Java
- Ruby
- Python
- PHP
- C#
For the duration of this module, we will focus only on Python and PHP; however, please note that the same concepts may be reapplied to most, if not all, languages that support serialization.
PHP Serialization
As an example, this is how we would serialize an array in PHP:
[!bash!]$ php -a
Interactive shell
php > $original_data = array("HTB", 123, 7.77);
php > $serialized_data = serialize($original_data);
php > echo $serialized_data;
a:3:{i:0;s:3:"HTB";i:1;i:123;i:2;d:7.77;}
php > $reconstructed_data = unserialize($serialized_data);
php > var_dump($reconstructed_data);
array(3) {
[0]=>
string(3) "HTB"
[1]=>
int(123)
[2]=>
float(7.77)
}
As you can see, $original_data is an array containing one string ("HTB"), one integer (123), and one double (7.77). Using the serialize function, the array is converted into a string of bytes that represents it. We then unserialize this serialized string and restore the original array, as verified by the var_dump of $reconstructed_data.
Serialized objects in PHP are easy to read, unlike serialized objects in many other languages, which may look like complete gibberish to the human eye, as you will see in the Python example. Before that, let's understand what the letters and numbers in the serialized data mean:
a:3:{ // (A)rray with (3) items
i:0;s:3:"HTB"; // (I)ndex (0); (S)tring with length (3) and value: "HTB"
i:1;i:123; // (I)ndex (1); (I)nteger with value (123)
i:2;d:7.77; // (I)ndex (2); (D)ouble with value (7.77)
}
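To make the format concrete, here is a minimal Python sketch that imitates the output of PHP's serialize for flat data. The function name php_serialize is hypothetical and only handles strings, integers, floats, and lists; real PHP supports many more types, and its float formatting depends on configuration, so treat this as an illustration of the layout rather than a faithful port.

```python
def php_serialize(value):
    """Toy imitation of PHP's serialize() for str, int, float, and list."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; it is left out of this sketch.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        # s:<length in bytes>:"<value>";
        return 's:%d:"%s";' % (len(value.encode("utf-8")), value)
    if isinstance(value, int):
        # i:<value>;
        return "i:%d;" % value
    if isinstance(value, float):
        # d:<value>;
        return "d:%r;" % value
    if isinstance(value, list):
        # a:<item count>:{<index><item><index><item>...}
        body = "".join(
            php_serialize(index) + php_serialize(item)
            for index, item in enumerate(value)
        )
        return "a:%d:{%s}" % (len(value), body)
    raise TypeError("unsupported type: %s" % type(value).__name__)


print(php_serialize(["HTB", 123, 7.77]))
# a:3:{i:0;s:3:"HTB";i:1;i:123;i:2;d:7.77;}
```

Each array item is written as an index followed by the item itself, which is why every entry in the annotated breakdown has two parts.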
Python Serialization
Similar to the PHP example above, we will serialize an array in Python (where the equivalent built-in type is called a list). There are multiple libraries for Python which implement serialization, such as PyYAML and jsonpickle. However, pickle is the native implementation, and it is what will be used in this example.
[!bash!]$ python3
Python 3.10.7 (main, Sep 8 2022, 14:34:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> original_data = ["HTB", 123, 7.77]
>>> serialized_data = pickle.dumps(original_data)
>>> print(serialized_data)
b'\x80\x04\x95\x16\x00\x00\x00\x00\x00\x00\x00]\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.'
>>> reconstructed_data = pickle.loads(serialized_data)
>>> print(reconstructed_data)
['HTB', 123, 7.77]
Reading the serialized data pickle outputs is much harder than reading the output PHP provides. However, it is still possible. According to comments in the pickle library, a pickle is a program for a virtual pickle machine (PM). The PM contains a stack and a memo (long-term memory), and a pickled object is just a sequence of opcodes for the PM to execute, which will recreate an arbitrary object on the stack.
The PM's stack is a Last-In-First-Out (LIFO) data structure. You may push items onto the top of the stack, and you may pop the top object off of the stack.
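As a quick illustration of LIFO behaviour, a plain Python list can stand in for a stack. This is only an analogy for how the PM's stack behaves, not the PM's actual implementation.

```python
# A Python list used as a stack: append() pushes, pop() removes the top item.
stack = []
stack.append("HTB")  # push
stack.append(123)    # push
stack.append(7.77)   # push; this is now the top of the stack

top = stack.pop()    # pop returns the most recently pushed item
print(top)           # 7.77
print(stack)         # ['HTB', 123]
```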
Quoting from comments in the pickle library, the PM's memo is a "data structure which remembers which objects the pickler has already seen, so that shared or recursive objects are pickled by reference and not by value."
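The effect of the memo can be observed from the outside. In this small sketch (the variable names are only illustrative), a list that appears twice inside a container comes back as one shared object after a round trip, and a list that contains itself can be pickled without infinite recursion.

```python
import pickle

# The same list object is referenced twice inside the container.
shared = ["HTB"]
container = [shared, shared]
restored = pickle.loads(pickle.dumps(container))
same_object = restored[0] is restored[1]

# A recursive list: its only item is the list itself.
recursive = []
recursive.append(recursive)
restored_recursive = pickle.loads(pickle.dumps(recursive))
points_to_itself = restored_recursive[0] is restored_recursive

print(same_object, points_to_itself)  # True True
```

If the shared list had been pickled by value instead, the two entries in the restored container would be equal but distinct objects.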
In Lib/pickle.py (Python 3.10), we can see all of the pickle opcodes defined, and by referring to them, as well as the source code for the various pickling functions, we can piece together what our serialized_data does exactly when it is passed to pickle.loads():
'\x80\x04'
# PROTO 4
# Tell the PM that we are using protocol version 4. This is the default since Python 3.8.
# Protocol versions 3-5 can not be unpickled by Python 2.x.
'\x95\x16\x00\x00\x00\x00\x00\x00\x00'
# FRAME 22
# Essentially we are telling the PM that the rest of the serialized data (the frame) is 22 bytes long (0x16 in hexadecimal).
# The argument is calculated like this:
# `struct.pack("<Q", len(b']\x94(\x8c\x03HTB\x94K{G@\x1f\x14z\xe1G\xae\x14e.')) == b'\x16\x00\x00\x00\x00\x00\x00\x00'`.
']'
# EMPTY_LIST
# Pushes an empty list onto the stack.
# Eventually, we will append the items to this list after we have defined them.
'\x94'
# MEMOIZE
# This stores the object on the top of the stack in the 'memo' which is akin to long-term memory.
# The memo is used to keep transient objects alive during pickling.
# In this case we are 'memoizing' the empty list we just pushed onto the stack.
# This opcode is called when pickling any of the following types:
# - __reduce__
# - bytes
# - bytearray
# - string
# - tuple
# - list
# - dict
# - set
# - frozenset
# - global
'('
# MARK
# Pushes the special 'markobject' on the stack.
# This will be referred to later as the starting point for our array items.
'\x8c\x03HTB'
# SHORT_BINUNICODE 3 HTB
# Pushes the unicode string with length 3 'HTB' onto the stack.
'\x94'
# MEMOIZE
# We tell the PM to 'memoize' the string that we just pushed onto the stack.
'K{'
# BININT1 {
# Pushes a 1-byte unsigned int with value 123 onto the stack.
# '{' is the byte representation of 123 calculated as so:
# `bytes([123]) == b'{'`
'G@\x1f\x14z\xe1G\xae\x14'
# BINFLOAT @\x1f\x14z\xe1G\xae\x14
# Pushes a float with the value 7.77 onto the stack.
# '@\x1f\x14z\xe1G\xae\x14' is the 8-byte big-endian float encoding of 7.77 which is calculated like this:
# `struct.pack(">d", 7.77) == b'@\x1f\x14z\xe1G\xae\x14'`
'e'
# APPENDS
# We are telling the PM to extend the empty list on the stack with all items we just defined back up until the 'markobject' we defined earlier.
'.'
# STOP
# This is how we tell the PM we are at the end of the pickle.
# The original array `['HTB', 123, 7.77]` was recreated and now sits at the top of the stack.
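The breakdown above can be cross-checked with the standard library's pickletools module. The sketch below pins the protocol to 4 so the result does not depend on the interpreter's default, lists each opcode with its decoded argument, and recomputes the three encoded arguments discussed above. The variable names are only illustrative.

```python
import pickle
import pickletools
import struct

# Pin protocol 4 so the byte stream matches the one analysed above.
serialized_data = pickle.dumps(["HTB", 123, 7.77], protocol=4)

# genops() yields (opcode, argument, position) for every opcode in the pickle.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(serialized_data)]
for name, arg in ops:
    print(name, arg)

# The FRAME argument is the length of everything after the FRAME opcode:
# 2 bytes of PROTO plus 9 bytes of FRAME (opcode + 8-byte length) come first.
frame_length = next(arg for name, arg in ops if name == "FRAME")
payload = serialized_data[11:]
frame_ok = struct.pack("<Q", len(payload)) == serialized_data[3:11]

# BININT1 stores 123 as a single byte; BINFLOAT stores 7.77 as a big-endian double.
int_ok = bytes([123]) == b"{"
float_ok = struct.pack(">d", 7.77) == b"@\x1f\x14z\xe1G\xae\x14"

print(frame_length, frame_ok, int_ok, float_ok)
```

For a human-readable listing with byte offsets, pickletools.dis(serialized_data) prints a similar disassembly directly.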