- Variable Recovery
- Problem: To extract variable and type information from an x86 binary into LLVM IR. Poster at LLVM Dev 2016 [slides](docs/proposal.pdf)
- Meeting Minutes
- Build Instructions & Usage Model
- [State-of-the-art Survey](docs/related-work.md)
- Related Topics
- [Basics: Binary Decompilation](docs/basics-of-binary-decompilation.md)
IDA performs a stack analysis that provides the following information:
- Identifying stack variables: For each stack access in the code, like `mov eax, [rsp + OFF]`, IDA creates a variable `var_OFF = qword ptr -OFFh`. This stack analysis happens irrespective of the presence of debug information. With debug information, the IDA variables created as above are named after the corresponding source variables.
- Identifying which stack variables are involved in a particular instruction: While doing the stack analysis, IDA replaces the offset part of each instruction that accesses a stack variable with the variable introduced above. E.g., the instruction `mov eax, [rsp + OFF]` is replaced by `mov eax, [rsp + var_OFF]`.
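The two stack-analysis steps above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not IDA's implementation: it names each slot directly after its positive `rsp` offset (IDA actually computes names relative to its own frame base, hence the negative `-OFFh`), it only recognizes the `[rsp + OFF]` form with a hexadecimal offset, and the helpers `stack_var_name` and `rewrite_stack_accesses` are hypothetical.

```python
import re

# Matches the simplified access form "[rsp + OFF]"; OFF is assumed to be hex,
# written bare, with a 0x prefix, or with a trailing "h" (assumption).
STACK_ACCESS = re.compile(r"\[\s*rsp\s*\+\s*(?:0x)?([0-9a-fA-F]+)h?\s*\]")


def stack_var_name(offset: int) -> str:
    """Hypothetical naming rule: var_<hex offset>, mirroring var_OFF."""
    return f"var_{offset:X}"


def rewrite_stack_accesses(instructions):
    """Replace each rsp offset with a variable name; return (code, {name: offset})."""
    variables = {}

    def substitute(match):
        offset = int(match.group(1), 16)
        name = stack_var_name(offset)
        variables[name] = offset  # step 1: record the stack variable
        return f"[rsp + {name}]"  # step 2: put the variable into the operand

    rewritten = [STACK_ACCESS.sub(substitute, insn) for insn in instructions]
    return rewritten, variables


if __name__ == "__main__":
    code = ["mov eax, [rsp + 8]", "mov [rsp + 0x10], rdi", "ret"]
    new_code, stack_vars = rewrite_stack_accesses(code)
    print(new_code)    # ['mov eax, [rsp + var_8]', 'mov [rsp + var_10], rdi', 'ret']
    print(stack_vars)  # {'var_8': 8, 'var_10': 16}
```

Note that the result only records a name, an offset, and (in IDA's case) an access width; nothing here says what the slot holds.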
McSema extracts all of the information mentioned above using an IDA Python plugin and dumps it into a binary protobuf file.
However, IDA's types are restrictive in the following ways:
- It cannot determine whether a variable stores an address (i.e., it is a pointer) or an integer.
- It cannot determine whether a variable is an array.
- It is unaware of the type of object a particular stack variable points to.
The idea behind this tool is to add type information to the IDA variables using the DWARF debug information.
PreWork
- Augment McSema's existing protobuf schema with the notion of type for stack and global variables (see the example schema).
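One way to picture the "notion of type" the schema needs is a recursive type record that can express exactly what IDA's view lacks: pointer versus integer, arrays with an element count, and the pointee type. The sketch below is hypothetical; `Type`, `StackVariable`, and their fields are illustrative names, not McSema's actual protobuf schema, and the `type` field is assumed to stay empty until it is filled from DWARF.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type:
    """Hypothetical recursive type record (not McSema's real schema)."""

    kind: str                      # "base", "struct", "pointer", or "array"
    name: str = ""                 # base/struct name, e.g. "int" or "node"
    size: int = 0                  # size in bytes
    target: Optional[Type] = None  # pointee (pointer) or element type (array)
    count: int = 0                 # number of elements (arrays only)

    def to_c(self) -> str:
        """C-like spelling; pointer-to-array declarators are not handled."""
        if self.kind == "base":
            return self.name
        if self.kind == "struct":
            return f"struct {self.name}"
        if self.kind in ("pointer", "array") and self.target is None:
            raise ValueError(f"{self.kind} type needs a target")
        if self.kind == "pointer":
            return f"{self.target.to_c()} *"
        if self.kind == "array":
            return f"{self.target.to_c()}[{self.count}]"
        raise ValueError(f"unknown type kind {self.kind!r}")


@dataclass
class StackVariable:
    """A stack slot as IDA reports it, plus the optional recovered type."""

    name: str                      # IDA name, e.g. "var_8"
    offset: int                    # stack offset found by IDA
    size: int                      # access width IDA observed, in bytes
    type: Optional[Type] = None    # None in McSema's output; set from DWARF


if __name__ == "__main__":
    node = Type("struct", "node", 16)
    slot = StackVariable("var_8", 8, 8)  # IDA alone: 8 bytes at offset 8
    slot.type = Type("pointer", size=8, target=node)
    print(slot.type.to_c())  # struct node *
    int_array = Type("array", size=16, target=Type("base", "int", 4), count=4)
    print(int_array.to_c())  # int[4]
```

The `kind`, `target`, and `count` fields map one-to-one onto the three limitations listed above.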
Input
1. Google protobuf binary file produced by McSema (after reading the IDA structures) using the new schema, **without the type information**.
2. Google protobuf binary file produced by dwarf-type-reader.
Output
- A new Google protobuf binary file with all the contents of (1), but with the types of the stack and global variables augmented by the information in (2).
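Combining the two inputs can be sketched as a keyed lookup: every variable from (1) is copied, and its type is taken from the variable in (2) at the same location, keyed by (function, stack offset) for locals and by address for globals. This is a minimal sketch under loud assumptions: both files are assumed to be already decoded into the hypothetical `Variable` records below, globals are assumed to use an empty function name, and IDA stack offsets and DWARF frame-base offsets are assumed to be normalized to the same convention. That normalization and the protobuf reading/writing are omitted; `Variable` and `augment` are illustrative names only.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Variable:
    """Hypothetical decoded record for one stack or global variable."""

    function: str               # owning function; "" for globals (assumption)
    location: int               # stack offset for locals, address for globals
    name: str
    type: Optional[str] = None  # C type spelling, or None if unknown


def augment(mcsema_vars: List[Variable], dwarf_vars: List[Variable]) -> List[Variable]:
    """Copy every variable from (1), taking its type from the DWARF variable at the same location."""
    # Index the typed DWARF variables by (function, location).
    dwarf_types: Dict[Tuple[str, int], str] = {
        (v.function, v.location): v.type for v in dwarf_vars if v.type is not None
    }
    # Keep McSema's names and locations; unmatched variables keep their old type.
    return [
        replace(v, type=dwarf_types.get((v.function, v.location), v.type))
        for v in mcsema_vars
    ]


if __name__ == "__main__":
    mcsema = [
        Variable("main", 8, "var_8"),
        Variable("main", 16, "var_10"),
        Variable("", 0x601040, "dword_601040"),
    ]
    dwarf = [
        Variable("main", 8, "p", "struct node *"),
        Variable("", 0x601040, "count", "int"),
    ]
    print([v.type for v in augment(mcsema, dwarf)])  # ['struct node *', None, 'int']
```

Because the records are frozen and `replace` returns copies, input (1) is left untouched and the output is a separate, augmented set, matching the "new file" described above.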