Decompilation of Binaries into LLVM IR for Automated Analysis

Files

Toor_Tejvinder.pdf (852.52 KB)

Date

2022-01-25

Authors

Toor, Tejvinder

Advisor

Gurfinkel, Arie

Publisher

University of Waterloo

Abstract

Malicious software is growing more complex in order to evade detection and mitigation, which has increased interest in automating reverse engineering. Current state-of-the-art tools rely on proprietary intermediate representations (IRs) for decompilation and lack open-source development. LLVM IR has emerged as a candidate IR for reverse engineering because it is already mature as a compilation IR and is supported by a wide set of existing analysis tools. In 2019, the NSA released the Ghidra reverse engineering framework as a free and open-source alternative to these tools. In this thesis, we examine the development and application of Ghidra's IRs for lifting to LLVM IR, and we evaluate the efficacy of that lifting. We consider lifting at both the disassembly and decompilation stages of Ghidra. We developed two tools: Ghidra-to-LLVM and Ghidrall. The former is a disassembling lifter built on Ghidra's Low P-Code IR, while the latter is a decompiling lifter built on Ghidra's decompilation data structures. Lastly, we test the efficacy of Ghidrall as an input for automated solving and compare it against another lifter. Our results show that Ghidra is effective and has promise as an input for future LLVM-based reverse engineering technologies.