Probabilistic Naming of Functions in Stripped Binaries

James Patrick-Evans, Lorenzo Cavallaro, Johannes Kinder

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Debugging symbols in binary executables carry the names of functions and global variables. When present, they greatly simplify the process of reverse engineering, but they are almost always removed (stripped) for deployment. We present the design and implementation of punstrip, a tool which combines a probabilistic fingerprint of binary code based on high-level features with a probabilistic graphical model to learn the relationship between function names and program structure. As there are many naming conventions and developer styles, functions from different applications do not necessarily have the exact same name, even if they implement the exact same functionality. We therefore evaluate punstrip across three levels of name matching: exact; an approach based on natural language processing of name components; and using Symbol2Vec, a new embedding of function names based on random walks of function call graphs. We show that our approach is able to recognize functions compiled across different compilers and optimization levels and then demonstrate that punstrip can predict semantically similar function names based on code structure. We evaluate our approach over open source C binaries from the Debian Linux distribution and compare against the state of the art.
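The abstract describes Symbol2Vec only as an embedding of function names built from random walks over function call graphs. As a minimal sketch of that general idea, and not the authors' implementation, the following generates random walks over a toy call graph. The graph contents, the `random_walks` helper, and the parameters `walks_per_node` and `walk_length` are all hypothetical. In a word2vec-style pipeline, each walk would then be treated as a "sentence" of function names and fed to an embedding model.

```python
import random


def random_walks(call_graph, walks_per_node=2, walk_length=5, seed=0):
    """Generate random walks over a call graph (caller -> list of callees).

    Hypothetical helper for illustration only; not part of punstrip.
    """
    rng = random.Random(seed)  # fixed seed so the walks are reproducible
    walks = []
    for start in sorted(call_graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                callees = call_graph.get(walk[-1], [])
                if not callees:  # leaf function: the walk ends early
                    break
                walk.append(rng.choice(callees))
            walks.append(walk)
    return walks


# Toy call graph (assumed for illustration).
call_graph = {
    "main": ["parse_args", "read_file"],
    "parse_args": ["strlen", "strcmp"],
    "read_file": ["fopen", "fread", "fclose"],
    "strlen": [],
    "strcmp": [],
    "fopen": [],
    "fread": [],
    "fclose": [],
}

walks = random_walks(call_graph)
for walk in walks:
    print(" -> ".join(walk))
```

Functions that tend to appear in the same walks (here, for example, `fopen` and `fread`) would end up with nearby vectors once the walks are embedded, which is what allows a predicted name to be scored as semantically close to the true name rather than only as an exact match.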

Original language: English
Title of host publication: Proceedings - 36th Annual Computer Security Applications Conference, ACSAC 2020
Pages: 373-385
Number of pages: 13
ISBN (Electronic): 9781450388580
Publication status: Published - 7 Dec 2020

Publication series

Name: ACM International Conference Proceeding Series
