Password Extraction: Guided Example
In this exercise you will be given an x86-64 binary compiled from a C program. This binary prompts for a password to authenticate the proper user. The correct password has been hardcoded into the binary, and you do not have access to the source code. Hardcoding passwords is not a secure coding practice, and the goal of the exercise is to demonstrate that by extracting the password from this binary through reverse engineering.
This first example will guide you through disassembling the program, exploring its machine code, running it step by step in a debugger, and extracting the password.
Part of this exercise has been inspired by Georgia Tech's CS6265 Information Security Lab.
Downloading and Running the Binary
Download this binary. You will likely need to give it executable permissions:
chmod +x crackme01
Execute the binary and input something when prompted for the password:
Please input your password:
test
Wrong password.
⚠️ The binaries to reverse engineer in this exercise are different for each student. Make sure to download them using the link provided above and do not work on the binaries downloaded by another student.
As we don't have access to the source code, our goal here is to inspect the binary's machine code and its behaviour at runtime to extract the password.
Inspecting the Binary's Sections with objdump
objdump is a command-line tool that displays information about executable files, object files, and similar binaries.
It is part of the GNU Binutils package and is commonly used in reverse engineering, debugging, and binary analysis.
We can use it to obtain some general information about the binary:
objdump -f crackme01
crackme01: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001070
The tool reports that the binary is an ELF executable, compiled for the Intel x86-64 architecture.
It also shows that the code supposed to be executed first after the program is loaded, i.e. the program entry point, is at address 0x1070.
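As a rough illustration of where that start address comes from, the sketch below reads the entry point field directly from an ELF64 header. The names read_elf64_entry and make_fake_header are hypothetical helpers written for this example, and the header is a synthetic 64-byte buffer rather than the real crackme01. Only little-endian ELF64 is handled.

```python
import struct


def read_elf64_entry(header: bytes) -> int:
    """Return the entry point stored in a little-endian ELF64 header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 is the class (2 = 64-bit), byte 5 the data encoding (1 = little-endian).
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # In ELF64 the entry point is an 8-byte field at offset 0x18.
    (entry,) = struct.unpack_from("<Q", header, 0x18)
    return entry


def make_fake_header(entry: int) -> bytes:
    """Build a minimal synthetic header for demonstration (hypothetical helper)."""
    header = bytearray(64)
    header[:4] = b"\x7fELF"
    header[4] = 2  # 64-bit
    header[5] = 1  # little-endian
    struct.pack_into("<Q", header, 0x18, entry)
    return bytes(header)


if __name__ == "__main__":
    print(hex(read_elf64_entry(make_fake_header(0x1070))))  # 0x1070
```

On the real binary, the same function could be fed the first 64 bytes of the file, e.g. open("crackme01", "rb").read(64).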
We can list the ELF sections present in the binary as follows:
objdump -h crackme01
crackme01: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 0000000000000318 0000000000000318 00000318 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.gnu.property 00000020 0000000000000338 0000000000000338 00000338 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
# ...
11 .init 00000017 0000000000001000 0000000000001000 00001000 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .plt 00000040 0000000000001020 0000000000001020 00001020 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .plt.got 00000008 0000000000001060 0000000000001060 00001060 2**3
CONTENTS, ALLOC, LOAD, READONLY, CODE
14 .text 00000159 0000000000001070 0000000000001070 00001070 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
# ...
Each section has an id (Idx) and a name (Name).
By convention, section names start with a dot (.).
The VMA field indicates where this section will be mapped in memory by the loader when the program executes.
The section's content is taken from the binary file itself, starting at offset File off and spanning Size bytes.
That loading process is illustrated below (from a single section point of view):
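The relationship between VMA, File off, and Size can also be sketched in a few lines of Python. This is only a minimal model: the table contains just the rows visible in the objdump -h output above, and the Section type and vma_to_file_offset function are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Optional, Tuple


class Section(NamedTuple):
    name: str
    size: int
    vma: int
    file_off: int


# Values copied from the objdump -h output above (elided rows are not included).
SECTIONS = [
    Section(".interp", 0x1C, 0x318, 0x318),
    Section(".init", 0x17, 0x1000, 0x1000),
    Section(".plt", 0x40, 0x1020, 0x1020),
    Section(".plt.got", 0x08, 0x1060, 0x1060),
    Section(".text", 0x159, 0x1070, 0x1070),
]


def vma_to_file_offset(vma: int) -> Optional[Tuple[str, int]]:
    """Find the section covering vma and return (section name, offset in the file)."""
    for section in SECTIONS:
        if section.vma <= vma < section.vma + section.size:
            return section.name, section.file_off + (vma - section.vma)
    return None  # address not covered by any of the listed sections


if __name__ == "__main__":
    # The program entry point reported by objdump -f
    print(vma_to_file_offset(0x1070))
```

For the sections listed here the VMA and the file offset happen to be equal, so the translation is the identity. That is not guaranteed for every section of a binary.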
Sections have attributes, written in capital letters in objdump's output.
The interesting ones here are:
- DATA: indicates that this section contains data that will be mapped with read-write or read-only permissions in the address space at load time.
- CODE: indicates that this section contains executable code that will be mapped with executable and read-only permissions in the address space at load time.
Remember that we want to inspect the machine code of this program to try to reverse-engineer its behaviour and find the password, so we'll look at the CODE sections.
They are as follows:
- .init and .fini are the initialisation and finalisation sections. They contain code that runs before the main program starts (.init) and right before it exits (.fini), and are used for implementing things like C++ global constructors and destructors.
- .plt and .plt.got hold the procedure linkage table, which works together with the global offset table to support dynamic linking. If you want to learn more about these (not needed for this exercise) read this.
- .text is the main code section, which contains the program's code (e.g. the main function). This is what we are interested in here.
Disassembling the Binary's Code with objdump
We can disassemble the main code of the program, i.e. the .text section, to see the machine code it contains as follows:
objdump --disassemble -j .text crackme01
objdump will proceed to print the disassembled machine code of .text.
This exercise requires a very basic understanding of x86-64 assembly. Please refer to this reference sheet for a short list of the main x86-64 instructions as well as general and special purpose registers.
With this binary objdump is able to identify the machine code corresponding to each function present in the source code that was compiled to produce this binary.
We see the machine code for a bunch of functions automatically added by the compiler toolchain and the C runtime startup code (_start, register_tm_clones, deregister_tm_clones, __do_global_dtors_aux, and frame_dummy), as well as for the main function, which is what we are interested in here:
0000000000001159 <main>:
1159: 55 push %rbp
115a: 48 89 e5 mov %rsp,%rbp
115d: 48 83 c4 80 add $0xffffffffffffff80,%rsp
1161: 48 8d 05 9c 0e 00 00 lea 0xe9c(%rip),%rax # 2004 <_IO_stdin_used+0x4>
1168: 48 89 c7 mov %rax,%rdi
116b: e8 c0 fe ff ff call 1030 <puts@plt>
1170: 48 8b 15 c9 2e 00 00 mov 0x2ec9(%rip),%rdx # 4040 <stdin@GLIBC_2.2.5>
1177: 48 8d 45 80 lea -0x80(%rbp),%rax
117b: be 80 00 00 00 mov $0x80,%esi
1180: 48 89 c7 mov %rax,%rdi
1183: e8 b8 fe ff ff call 1040 <fgets@plt>
1188: 48 8d 45 80 lea -0x80(%rbp),%rax
118c: 48 8d 15 95 2e 00 00 lea 0x2e95(%rip),%rdx # 4028 <__dso_handle+0x8>
1193: 48 89 d6 mov %rdx,%rsi
1196: 48 89 c7 mov %rax,%rdi
1199: e8 b2 fe ff ff call 1050 <strcmp@plt>
119e: 85 c0 test %eax,%eax
11a0: 75 11 jne 11b3 <main+0x5a>
11a2: 48 8d 05 78 0e 00 00 lea 0xe78(%rip),%rax # 2021 <_IO_stdin_used+0x21>
11a9: 48 89 c7 mov %rax,%rdi
11ac: e8 7f fe ff ff call 1030 <puts@plt>
11b1: eb 0f jmp 11c2 <main+0x69>
11b3: 48 8d 05 82 0e 00 00 lea 0xe82(%rip),%rax # 203c <_IO_stdin_used+0x3c>
11ba: 48 89 c7 mov %rax,%rdi
11bd: e8 6e fe ff ff call 1030 <puts@plt>
11c2: b8 00 00 00 00 mov $0x0,%eax
11c7: c9 leave
11c8: c3 ret
Pay attention to the call instructions, which implement the function calls present in the original source code.
These are calls to functions from the C standard library.
We have, in order:
- A call to puts, a C library function that prints a string to standard output. This is likely the result of a printf in the original source code, used to print Please input your password: when the program is executed.
- A call to fgets, used to read a string from the console: that is likely the way the program gets the password attempt from the user.
- A call to strcmp, which performs string comparison: that is probably how the password attempt is compared to the correct password.
strcmp takes two parameters, both pointers to the strings to compare.
We know that one of these strings is the password attempt, and the other is the correct password.
The next step is to run the program step by step in a debugger and observe the values pointed to by these pointers at runtime to find the password.
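The addresses that objdump prints in the comments and after call or jne in the listing are computed from each instruction's encoding: the target is the address of the next instruction plus a signed displacement. The sketch below reproduces a few of them using values taken from the disassembly of main. The function names to_signed and relative_target are hypothetical, chosen for this example.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned displacement as a two's-complement signed integer."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def relative_target(insn_addr: int, insn_len: int, disp: int, disp_bits: int = 32) -> int:
    """Target = address of the next instruction + signed displacement."""
    return insn_addr + insn_len + to_signed(disp, disp_bits)


if __name__ == "__main__":
    # 1161: lea 0xe9c(%rip),%rax   (7 bytes)  -> string passed to the first puts
    print(hex(relative_target(0x1161, 7, 0xE9C)))          # 0x2004
    # 118c: lea 0x2e95(%rip),%rdx  (7 bytes)  -> second argument of strcmp
    print(hex(relative_target(0x118C, 7, 0x2E95)))         # 0x4028
    # 1199: e8 b2 fe ff ff         (5 bytes)  -> call strcmp@plt
    print(hex(relative_target(0x1199, 5, 0xFFFFFEB2)))     # 0x1050
    # 11a0: 75 11                  (2 bytes, 8-bit displacement) -> jne
    print(hex(relative_target(0x11A0, 2, 0x11, disp_bits=8)))  # 0x11b3
```

The second result, 0x4028, is the location of the string that strcmp compares the input against, which is why it is worth keeping in mind for the debugging session.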
Debugging the Program with GDB
Here we will use GDB Enhanced Features, a series of enhancements to GDB that significantly ease the runtime exploration of a program's behaviour. To install it, type the following in a terminal:
bash -c "$(curl -fsSL https://gef.blah.cat/sh)"
Once GEF is installed, load the program with GDB:
gdb crackme01
A reference sheet with the main GDB commands is available here.
Within GDB, set a breakpoint on the function main:
gef> br main
Breakpoint 1 at 0x115d
After the breakpoint is set, run the program until execution hits the breakpoint:
run
Once the breakpoint is hit, GDB/GEF will display a bunch of information about the program's state of execution. Locate the following:
- Register values: $rax, $rbx, etc.
- Stack content, with the top of the stack pointed to by the stack pointer $rsp.
- Disassembled machine code, with the next instruction to execute indicated by the green arrow (its address is held in the instruction pointer $rip).
At that stage we need to let the program continue its execution step by step, i.e. machine instruction by machine instruction.
This is done with the ni command.
Use it repeatedly until the program prompts you for the password, and enter something easily recognisable, e.g. xxxxx.
Next continue the execution step by step until you hit the call to strcmp.
GDB/GEF should detect that this is a function call.
It is aware of the calling convention (first parameter in $rdi, second parameter in $rsi, etc.) and gives you a list of values for the parameters:
strcmp@plt (
$rdi = 0x00007fffffffd990 → 0x00000a7878787878 ("xxxxx\n"?),
$rsi = 0x0000555555558028 → "password-here",
...
)
The first parameter in $rdi is the password attempt we just entered.
The second one in $rsi is what we are looking for: the correct value for the password!
Note that it will have a different value for your version of the binary.
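Two details of the GEF output can be checked by hand. The 64-bit value shown next to $rdi is the start of the input buffer read as a little-endian integer, and the address in $rsi is the static address 0x4028 seen in the disassembly, shifted by the address at which the binary was loaded. The sketch below works through both using the values displayed above. The function names are hypothetical, and the load address it computes applies to this particular debugging session only, so it may differ on your machine.

```python
def decode_little_endian_string(word: int, size: int = 8) -> bytes:
    """Turn a register-sized integer back into the bytes stored in memory."""
    raw = word.to_bytes(size, "little")
    # Keep everything before the first NUL terminator, as a C string would.
    return raw.split(b"\x00", 1)[0]


def load_base(runtime_addr: int, static_addr: int) -> int:
    """Address at which the binary was loaded in this session."""
    return runtime_addr - static_addr


if __name__ == "__main__":
    # Value GEF shows at the address held in $rdi
    print(decode_little_endian_string(0x00000A7878787878))  # b'xxxxx\n'
    # $rsi at runtime versus the address from the objdump comment
    print(hex(load_base(0x0000555555558028, 0x4028)))  # 0x555555554000
```

The trailing \n in the decoded input is the newline that fgets keeps at the end of the line it reads.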
Try it out outside GDB by running the program normally:
Please input your password:
password-here
Authentication successful!
Submission Instructions
Enter the extracted password on the corresponding line of the CSV file in the submission git repository, i.e.:
crackme01,password-here