Malicious Document Analysis

This module focuses on understanding different document formats and on techniques for identifying and analyzing the threats posed by malicious documents. By the end of this module, you will be proficient in identifying various types of malicious documents, extracting and analyzing embedded objects, and applying both static and dynamic analysis techniques to uncover malicious behavior.

Created by MadhukarRaina

Medium Defensive

Summary

The "Malicious Document Analysis" module is structured to provide a thorough understanding of how to analyze and mitigate threats posed by malicious documents. The module is divided into several sections, each focusing on different aspects of document analysis. The module begins with foundational concepts and progresses to advanced techniques, ensuring that learners gain both theoretical knowledge and practical skills.

  • Foundational Concepts: Learn about the different types of malicious documents, their structures, and common exploitation techniques.
  • Practical Skills: Gain hands-on experience with tools and techniques for extracting, analyzing, and mitigating threats.
  • Advanced Techniques: Explore advanced topics such as decrypting and decoding embedded payloads and shellcode, both manually and with automated tools.

The module is designed for cybersecurity professionals, incident responders, and anyone interested in enhancing their skills in malicious document analysis.

This module is broken down into sections with accompanying hands-on exercises to practice each of the techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.

  • Understanding Phishing
  • Malicious Document Formats
  • Document Structure and Metadata Analysis
  • Analysis of VBA and XLM Macros
  • Malicious CHM Analysis
  • Malicious PDF Analysis
  • Malicious RTF Document Analysis
  • Identifying and Extracting Embedded Objects
  • Static and Dynamic Analysis of Malicious Documents
  • Shellcode analysis using speakeasy and scdbg
  • XLL Addins Analysis in x64dbg
  • Excel-DNA XLL Add-in Analysis in dnSpy
  • Using CyberChef for analysis

As you work through the module, you will see example commands and command output for the various topics introduced. It is worth reproducing as many of these examples as possible to further reinforce the concepts introduced in each section. You can do this in the Pwnbox provided in the interactive sections or in your own virtual machine.

You can start and stop the module at any time and pick up where you left off. There is no time limit or "grading," but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.

The module is classified as "Medium" because it requires working knowledge of debuggers and analysis tools, and an understanding of the Windows command line.

A firm grasp of the following modules can be considered a prerequisite for successful completion of this module:

  • Introduction to Windows Command Line
  • Introduction to Malware Analysis
  • YARA & Sigma for SOC Analysts
  • Intro to Assembly Language

Introduction

What is a Malicious Document?

Imagine a world where opening a simple Office document, an Excel sheet, a seemingly harmless PDF, or a helpful CHM file could infect our computer. This is not a hypothetical scenario but a reality that countless individuals and organizations face daily. Welcome to the first section of this module, where we explore various techniques related to document-based malware attacks.

A malicious document is a seemingly normal file, such as a Word document, PDF, or Excel spreadsheet, that does not typically execute code by default but has been weaponized with harmful code. When this type of document is opened, the embedded malicious code is executed, potentially leading to various harmful outcomes, such as stealing data, compromising system security, or gaining unauthorized access to a network.

Malicious documents have become a prevalent method for adversaries to compromise systems and steal sensitive information. They are often used to get initial access through phishing attacks that may lead to ransomware campaigns and other malicious activities. Understanding how to analyze these documents is crucial for cybersecurity professionals to detect and respond to threats effectively. This knowledge helps prevent data breaches, protect sensitive information, and maintain the integrity of organizational systems.

How Does a Document Execute Code?

When a malicious document is opened, it typically leverages various methods to run its embedded malicious code. The process usually begins when a user opens a malicious document, often delivered via email or downloaded from a compromised website.

The GIF demonstrates an example of the steps commonly found in a Malicious Word Document.

STEP 1 - Initial Document Opening

  • User Interaction: The first step is usually user interaction. The user opens the malicious document, often delivered via email or downloaded from a compromised website. Common malicious document types include Microsoft Office files that can contain macros (e.g., .docm, .xlsm and .pptm), PDF files, and other formats such as RTF (Rich Text Format) that support embedded objects.

STEP 2 - Exploitation of Embedded Code

  • Macros/VBA Scripts (Office Documents): In Microsoft Office documents, malicious macros (written in Visual Basic for Applications, VBA) or embedded scripts can be automatically executed if macros are enabled. Attackers often employ social engineering techniques to trick users into enabling macros, such as by saying, "Please enable macros to view the content correctly".
  • Embedded Objects: The document may contain embedded objects, such as OLE (Object Linking and Embedding) objects, that can execute code when interacted with.
  • JavaScript (PDF Files): In PDF documents, JavaScript can be embedded and automatically executed when the document is opened, leading to the execution of malicious code.
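As a rough illustration of how these three vectors can be spotted statically, the sketch below checks an Office Open XML file for a VBA project part and counts suspicious keywords in PDF and RTF data. It is a minimal triage sketch, not a replacement for the dedicated tools used later in the module. The function names and keyword lists are our own assumptions for illustration, and simple byte matching like this is easily defeated by obfuscated names.

```python
import io
import zipfile

# Illustrative keyword lists (assumptions, not exhaustive).
PDF_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]
RTF_KEYWORDS = [b"\\objemb", b"\\objdata", b"\\objupdate"]


def ooxml_has_vba(data: bytes) -> bool:
    """Return True if a ZIP-based Office file contains a vbaProject.bin part."""
    stream = io.BytesIO(data)
    if not zipfile.is_zipfile(stream):
        return False
    with zipfile.ZipFile(stream) as archive:
        return any(name.lower().endswith("vbaproject.bin") for name in archive.namelist())


def keyword_counts(data: bytes, keywords: list) -> dict:
    """Count raw occurrences of each keyword, keeping only those that appear."""
    counts = {kw.decode(): data.count(kw) for kw in keywords}
    return {name: n for name, n in counts.items() if n}
```

A hit is only a reason to look closer. For example, a PDF with both /OpenAction and /JS present is worth opening in a dedicated PDF analysis tool, while a .docx with no VBA project may still carry embedded OLE objects or remote template references.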

STEP 3 - Shellcode or Exploit Execution

  • Shellcode Injection: The embedded script may inject shellcode directly into the current process's memory or another process's memory, effectively bypassing some security mechanisms.
  • Exploitation of Vulnerabilities: The document may exploit a known vulnerability in the application used to open it (e.g., a buffer overflow in Adobe Reader) to gain control over the execution flow and run arbitrary code.

STEP 4 - Dropping and Executing Payload

  • Payload Download: The script may download additional malware from a remote server, often using HTTP, HTTPS, or DNS communication.
  • Payload Execution: The document may drop an executable file on the disk or load the payload directly into memory. This payload could be a backdoor, ransomware, keylogger, or another type of malware.
  • Process Injection: The malicious document may inject its payload into a legitimate process to evade detection and run with the privileges of that process (e.g., explorer.exe).

STEP 5 - Establishing Persistence

  • Persistence Mechanisms: The malware may establish persistence on the victim’s machine by modifying the registry, creating scheduled tasks, or placing files in startup directories.
  • Command and Control (C2) Communication: The malware often communicates with a remote C2 server to receive further instructions, exfiltrate data, or download additional components.

STEP 6 - Execution and Lateral Movement

  • Execution of Malicious Activities: Once the payload is executed, it carries out its intended malicious activities, such as data exfiltration, file encryption, or spying on the user.
  • Lateral Movement: If the malware aims to move laterally within a network, it may leverage credentials obtained from the infected system to access other machines.

These documents are typically observed in the initial stage of a malware attack as part of a spearphishing attachment. The screenshot below from the MITRE ATT&CK framework shows technique T1566.001, which is related to spearphishing attachments.

MITRE has another technique, T1204.002, under user execution related to malicious files. Here, an adversary relies on a user opening the malicious file. Adversaries may employ various forms of masquerading and obfuscated files or information to increase the likelihood that a user will open and successfully execute a malicious file.

We'll learn how malware can be embedded within everyday files, and explore how files that we might encounter in our daily activities can be weaponized to infect systems or steal data.

Learning Objectives

  • Identify and Understand File-Based Malware: Learn how Office documents, PDFs, and CHM files can be used as vehicles to deliver malware.
  • Recognize the Signs of Infection: Learn ways to detect when a file might be compromised.
  • Explore Analysis Steps: Discover techniques and tools to analyze these malicious documents.

File Classification & Tools

We'll begin by classifying some of the document-based file types that can be used by adversaries as attachments to gain initial access. Understanding these classifications is important for analyzing these files effectively:

  • Office Documents: These include documents such as MS Word and Excel files, frequently used to deliver malicious macros and scripts. We will explore how macros, embedded objects, and malicious links can turn an innocent-looking spreadsheet or Word document into a weapon.
  • RTF Files: Rich Text Format (RTF) files are often used in phishing attacks.
  • PDF Files: Portable Document Format (PDF) files can contain embedded JavaScript and other types of exploits.
  • CHM Files: Compiled HTML Help (CHM) files are commonly used for help documentation. This less common format can also be manipulated to deliver malware.
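File extensions are easy to fake, so a first classification step is usually to look at the leading "magic" bytes of the file. The sketch below maps the well-known signatures of the formats listed above to a label. The function names and labels are hypothetical, chosen for illustration, and the check is deliberately simple: it only inspects the start of the file, so it should be treated as a hint rather than a verdict.

```python
# Well-known leading byte signatures for the formats covered in this module.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls/.ppt)"),
    (b"PK\x03\x04", "ZIP container (possibly OOXML: .docx/.docm/.xlsm)"),
    (b"{\\rtf", "RTF"),
    (b"%PDF-", "PDF"),
    (b"ITSF", "CHM"),
]


def classify_header(header: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))
```

Note that a ZIP signature alone does not prove the file is an Office document; the archive contents still need to be inspected.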

The diagram shown below provides an overview of the different documents and tools used to examine them:

Note: There are many tools that can help with malicious document analysis. This is not a comprehensive list by any means; it's just a list of tools used throughout this module.

Tools & Setup

Before diving into document analysis, it's essential to set up a secure and efficient environment. The recommended tools are as follows:

  • Virtual Machines (VMs): Use VMs to create isolated environments for safe analysis. Tools like VirtualBox and VMware are commonly used.
  • Sandboxing Tools: Tools such as Cuckoo Sandbox provide automated environments to safely execute and analyze malicious documents.
  • Static Analysis Tools: Tools like ExifTool, oletools, Didier Stevens Suite and peepdf are essential for examining document metadata and structure without execution.
  • Dynamic Analysis Tools: Tools like Fiddler/Wireshark, Process Monitor, x64dbg and various sandboxing solutions help monitor document behavior in real-time.
  • Reverse Engineering Tools: Tools like ViperMonkey, CyberChef, speakeasy and dnSpy are useful for deobfuscating and understanding malicious macros, scripts, shellcode objects and plugins (such as XLL and WLL).
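To give a feel for the kind of deobfuscation these tools automate, the sketch below undoes one common layering, Base64 followed by a repeating-key XOR, which is roughly what a two-step CyberChef recipe would do. The function names, the choice of layering, and the "http" marker used for the brute-force helper are assumptions for illustration; real samples vary widely.

```python
import base64


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decode_b64_xor(blob: str, key: bytes) -> bytes:
    """Base64-decode a string, then XOR the result with the given key."""
    return xor_bytes(base64.b64decode(blob), key)


def brute_force_single_byte(data: bytes, marker: bytes = b"http"):
    """Try every single-byte key; return (key, plaintext) for the first
    candidate containing the marker, or None if no key produces it."""
    for key in range(256):
        candidate = xor_bytes(data, bytes([key]))
        if marker in candidate:
            return key, candidate
    return None
```

When the key is unknown, searching for an expected marker such as a URL scheme is a simple way to recover a single-byte key. Longer keys usually need to be recovered from the macro or shellcode logic itself.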

The required tools are installed within the target (VM) associated with this module. We'll be able to use the tools in the next sections.

For more details on different tools, watch the Analyzing Phishing Documents 101 video (by @0xdf) on Hack The Box's YouTube channel. This video contains an overview of analyzing malicious documents and some related CTF challenges.

Best Practices

When analyzing malicious documents, it's important to follow best practices to ensure a thorough and secure analysis. Some best practices to consider are:

  • Use a Safe and Isolated Environment: Always analyze malicious documents (maldocs) in a virtual machine (VM) or sandbox environment that is completely isolated from the main network to prevent any potential spread of malware. Take a snapshot of the VM before starting the analysis so that we can quickly revert to a clean state if needed.
  • Use Multiple Analysis Tools: Using a variety of tools is much more effective than relying on just one, as it provides richer details and valuable metadata that a single tool may miss.
  • Analyze Document Metadata: Analyze the document's metadata for clues about its origin, such as the creation date, author, or software used to create it. Tools like ExifTool can be useful for this purpose.
  • Inspect Macros and Embedded Scripts: Malicious macros and scripts are often obfuscated. Use tools like olevba or CyberChef to deobfuscate them and understand the script's logic.
  • Document Findings and Indicators of Compromise (IOCs): Keep detailed notes on all observed behaviors, file modifications, network activity, and other indicators.
  • Perform Comparative Analysis: If possible, compare the maldoc with known malicious samples to identify similarities or new tactics, techniques, and procedures (TTPs).
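As a small example of documenting indicators, the sketch below hashes a sample (or any decoded blob) and pulls out URLs and IPv4-looking strings so they can be recorded in our notes. The regular expressions are deliberately loose and the function names are hypothetical; the output should be reviewed manually, since strings such as version numbers can also look like IP addresses.

```python
import hashlib
import re

# Loose patterns for illustration; they favour recall over precision.
URL_RE = re.compile(rb"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(rb"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(data: bytes) -> dict:
    """Return the SHA-256 of the data plus any URL and IPv4-like strings found."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "urls": sorted({m.decode("ascii", "replace") for m in URL_RE.findall(data)}),
        "ipv4": sorted({m.decode("ascii") for m in IPV4_RE.findall(data)}),
    }


def defang(url: str) -> str:
    """Rewrite a URL so it cannot be clicked by accident when shared in a report."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")
```

Defanging indicators before pasting them into tickets or chat reduces the chance that a colleague follows a live malicious link by mistake.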

In addition to these best practices, it is important for us to stay informed and actively share our knowledge about phishing and harmful documents with our colleagues.

  • Educate and Train Users: Conduct regular training sessions to educate employees and users about the risks associated with malicious documents and how to spot phishing attempts.

In a blog post from Anomali, the analyst notes that "Education is the best defense." Education and awareness are essential in recognizing and preventing spearphishing attacks, empowering individuals to identify suspicious attachments and not open them directly.

  • Avoid Opening Unknown Attachments: If we receive an unexpected or suspicious attachment, it is recommended to not open it until we've confirmed its authenticity with the sender.

A blog post from Red Canary has one of the best headings on this topic: "Don't be an enabler". Whenever we see an unknown, suspicious email asking us to enable macros, we should remember this heading. Instead of opening the documents directly, learn how to analyze these samples safely.

Always be cautious when opening documents from unknown sources. Stay protected with cybersecurity best practices.

By the end of this module, we'll not only be aware of the risks posed by these common file types but also be inspired to dive deeper into the investigation process. We'll gain the skills to protect ourselves and our organization from these threats and develop a critical mindset that questions the safety of every file we encounter.

