Password Extraction: Guided Example

In this exercise you will be given an x86-64 binary produced by compiling a C program. The binary prompts for a password to authenticate the legitimate user. The correct password is hardcoded in the binary, and you do not have access to the source code. Hardcoding passwords is not a secure coding practice, and the goal of this exercise is to demonstrate why by extracting the password from the binary through reverse engineering.

This first example will guide you through decompiling the program, exploring its machine code, running it step by step in a debugger, and extracting the password.

Part of this exercise has been inspired by Georgia Tech's CS6265 Information Security Lab.

Downloading and Running the Binary

Download this binary. You will likely need to give it executable permissions:

chmod +x crackme01

Execute the binary and input something when prompted for the password:

Please input your password: 
test
Wrong password.

⚠️ The binaries to reverse engineer in this exercise are different for each student. Make sure to download them using the link provided above and do not work on the binaries downloaded by another student.

Since we do not have access to the source code, our goal is to inspect the binary's machine code and its behaviour at runtime in order to extract the password.

Inspecting the Binary's Sections with objdump

objdump is a command-line tool that displays information about object files, executables and libraries. It is part of the GNU Binutils package and is commonly used in reverse engineering, debugging, and binary analysis. We can use it to obtain some general information about the binary:

objdump -f crackme01           

crackme01:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001070

The tool reports that the binary is a 64-bit ELF file compiled for the x86-64 architecture. It also shows the program entry point, i.e. the address of the first instruction executed after the program is loaded: 0x1070.
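The information printed by objdump -f comes from the ELF header at the very beginning of the file. As a minimal sketch, assuming a Linux system whose <elf.h> provides the Elf64_Ehdr type and constants, and a little-endian host (as on x86-64), the check below recognises the same elf64-x86-64 format and extracts the entry point. elf64_x86_64_entry is a hypothetical helper name, not part of any library.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 and stores the entry point in *entry if buf starts with a
 * 64-bit, little-endian, x86-64 ELF header; returns 0 otherwise.
 * Assumption: the host is little-endian too, so fields need no swapping. */
int elf64_x86_64_entry(const unsigned char *buf, size_t len, uint64_t *entry)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return 0;
    memcpy(&eh, buf, sizeof eh);                   /* avoids unaligned reads */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)  /* magic "\177ELF" */
        return 0;
    if (eh.e_ident[EI_CLASS] != ELFCLASS64 ||      /* the "elf64" part */
        eh.e_ident[EI_DATA] != ELFDATA2LSB ||      /* little-endian data */
        eh.e_machine != EM_X86_64)                 /* the "x86-64" part */
        return 0;

    *entry = eh.e_entry;                           /* objdump's "start address" */
    return 1;
}
```

To try it on crackme01, read the first sizeof(Elf64_Ehdr) bytes of the file (e.g. with fopen and fread) and pass them as buf; the value stored in *entry should match the start address reported by objdump.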

We can list the ELF sections present in the binary as follows:

objdump -h crackme01

crackme01:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       0000001c  0000000000000318  0000000000000318  00000318  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.property 00000020  0000000000000338  0000000000000338  00000338  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
# ...
 11 .init         00000017  0000000000001000  0000000000001000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt          00000040  0000000000001020  0000000000001020  00001020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .plt.got      00000008  0000000000001060  0000000000001060  00001060  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .text         00000159  0000000000001070  0000000000001070  00001070  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
# ...

Each section has an index (Idx) and a name (Name). By convention, section names start with a dot (.). The VMA (virtual memory address) field indicates where the loader maps the section in memory when the program executes. The mapped bytes are read from the binary file itself, starting at offset File off, for Size bytes. That loading process is illustrated below (from the point of view of a single section):

Sections have attributes, written in capital letters in objdump's output. The interesting ones here are:

  • DATA: indicates that this section contains data that will be mapped with read-write or read-only permissions in the address space at load time.
  • CODE: indicates that this section contains executable code that will be mapped with executable and read-only permissions in the address space at load time.
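The relationship between the Size, VMA and File off columns can be sketched in C as a lookup over the section table. This is a minimal sketch: the table is a hand-copied subset of the objdump -h rows shown above (it is not parsed from the file), and vma_to_file_offset is a hypothetical helper name. For these particular rows the VMA and the file offset happen to be equal, which is not guaranteed for every section.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of objdump -h, keeping only the columns the mapping needs. */
struct section {
    const char *name;
    uint64_t    vma;       /* VMA column */
    uint64_t    size;      /* Size column */
    uint64_t    file_off;  /* File off column */
};

/* Subset of the objdump -h output above, copied by hand. */
static const struct section sections[] = {
    { ".init",    0x1000, 0x17,  0x1000 },
    { ".plt",     0x1020, 0x40,  0x1020 },
    { ".plt.got", 0x1060, 0x08,  0x1060 },
    { ".text",    0x1070, 0x159, 0x1070 },
};

/* Finds the section whose range [vma, vma + size) contains addr and
 * stores the matching file offset in *off. Returns the section, or
 * NULL if no section in the table covers addr. */
const struct section *vma_to_file_offset(uint64_t addr, uint64_t *off)
{
    for (size_t i = 0; i < sizeof sections / sizeof sections[0]; i++) {
        const struct section *s = &sections[i];
        if (addr >= s->vma && addr - s->vma < s->size) {
            *off = s->file_off + (addr - s->vma);
            return s;
        }
    }
    return NULL;
}
```

For example, main's address 0x1159 falls inside .text (0x1070 to 0x11c8), so its bytes are at file offset 0x1159.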

Remember that we want to inspect the machine code of this program to try to reverse-engineer its behaviour and find the password, so we'll look at the CODE sections. They are as follows:

  • .init and .fini are the initialisation and finalisation sections. They contain code that runs before the main program starts (.init) and right before it exits (.fini), and can be used to implement things like C++ global constructors and destructors.
  • .plt and .plt.got contain stubs of the procedure linkage table (PLT). Together with the global offset table (GOT), which lives in data sections such as .got, the PLT lets the program call functions from dynamically linked libraries such as the C library. If you want to learn more about these (not needed for this exercise), read this.
  • .text: the main code section, which contains the program's own code (e.g. the main function). This is what we are interested in here.

Disassembling the Binary's Code with objdump

We can disassemble the main code of the program, i.e. the .text section, to see the machine code it contains as follows:

objdump --disassemble -j .text crackme01

objdump will proceed to print the disassembled machine code of .text.

This exercise requires a very basic understanding of x86-64 assembly. Please refer to this reference sheet for a short list of the main x86-64 instructions as well as general and special purpose registers.

Because this binary still contains its symbol table (note the HAS_SYMS flag reported by objdump -f), objdump can label the machine code corresponding to each function. We see the machine code for a bunch of functions added automatically by the compiler toolchain and the C runtime startup code (_start, register_tm_clones, deregister_tm_clones, __do_global_dtors_aux, and frame_dummy), as well as for the main function, which is what we are interested in here:

0000000000001159 <main>:
    1159:	55                   	push   %rbp
    115a:	48 89 e5             	mov    %rsp,%rbp
    115d:	48 83 c4 80          	add    $0xffffffffffffff80,%rsp
    1161:	48 8d 05 9c 0e 00 00 	lea    0xe9c(%rip),%rax        # 2004 <_IO_stdin_used+0x4>
    1168:	48 89 c7             	mov    %rax,%rdi
    116b:	e8 c0 fe ff ff       	call   1030 <puts@plt>
    1170:	48 8b 15 c9 2e 00 00 	mov    0x2ec9(%rip),%rdx        # 4040 <stdin@GLIBC_2.2.5>
    1177:	48 8d 45 80          	lea    -0x80(%rbp),%rax
    117b:	be 80 00 00 00       	mov    $0x80,%esi
    1180:	48 89 c7             	mov    %rax,%rdi
    1183:	e8 b8 fe ff ff       	call   1040 <fgets@plt>
    1188:	48 8d 45 80          	lea    -0x80(%rbp),%rax
    118c:	48 8d 15 95 2e 00 00 	lea    0x2e95(%rip),%rdx        # 4028 <__dso_handle+0x8>
    1193:	48 89 d6             	mov    %rdx,%rsi
    1196:	48 89 c7             	mov    %rax,%rdi
    1199:	e8 b2 fe ff ff       	call   1050 <strcmp@plt>
    119e:	85 c0                	test   %eax,%eax
    11a0:	75 11                	jne    11b3 <main+0x5a>
    11a2:	48 8d 05 78 0e 00 00 	lea    0xe78(%rip),%rax        # 2021 <_IO_stdin_used+0x21>
    11a9:	48 89 c7             	mov    %rax,%rdi
    11ac:	e8 7f fe ff ff       	call   1030 <puts@plt>
    11b1:	eb 0f                	jmp    11c2 <main+0x69>
    11b3:	48 8d 05 82 0e 00 00 	lea    0xe82(%rip),%rax        # 203c <_IO_stdin_used+0x3c>
    11ba:	48 89 c7             	mov    %rax,%rdi
    11bd:	e8 6e fe ff ff       	call   1030 <puts@plt>
    11c2:	b8 00 00 00 00       	mov    $0x0,%eax
    11c7:	c9                   	leave
    11c8:	c3                   	ret
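The addresses objdump prints in comments (e.g. # 2004) and as call targets (e.g. 1030 <puts@plt>) are computed from a signed 32-bit displacement encoded little-endian in the instruction bytes, added to the address of the next instruction. For instance, lea 0xe9c(%rip),%rax at 0x1161 is 7 bytes long, so it refers to 0x1168 + 0xe9c = 0x2004. A minimal C sketch of that computation (rel32_target and read_disp32 are hypothetical helper names; the caller must know where the displacement sits in the instruction):

```c
#include <stdint.h>

/* Reads a little-endian, signed 32-bit displacement starting at p. */
static int64_t read_disp32(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;
    /* Sign-extend without relying on implementation-defined casts. */
    return u >= 0x80000000u ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/* Target of a RIP-relative operand or of a rel32 call/jmp: the
 * displacement is relative to the address of the *next* instruction. */
uint64_t rel32_target(uint64_t insn_addr, const unsigned char *insn,
                      unsigned insn_len, unsigned disp_offset)
{
    int64_t disp = read_disp32(insn + disp_offset);
    return insn_addr + insn_len + (uint64_t)disp;  /* wraps for negative disp */
}
```

Applied to call puts@plt at 0x116b (bytes e8 c0 fe ff ff, displacement at offset 1), the displacement is -0x140 and the target is 0x1170 - 0x140 = 0x1030, matching objdump's annotation.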

Pay attention to the call instructions, which implement the function calls present in the original source code. They all target functions from the C standard library, reached through the PLT (hence the @plt suffix). We have, in order:

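  1. A call to puts, which writes a string followed by a newline to standard output. Here it prints Please input your password: when the program is executed. It may come from a puts in the original source code, or from a printf whose format string ends with a newline, which compilers such as GCC commonly replace with puts.
  2. A call to fgets, which reads a line from standard input (stdin) into a buffer: that is likely how the program gets the password attempt from the user. The size argument 0x80 (in $esi) limits the read to 127 characters plus the terminating null byte.
  3. A call to strcmp, which compares two strings and returns 0 when they are equal: that is probably how the password attempt is compared to the correct password.
  4. Two more calls to puts, only one of which executes: test %eax,%eax checks the value returned by strcmp, and jne jumps to the failure message (Wrong password.) when it is non-zero; otherwise the success message is printed.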
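Putting these observations together, main is consistent with a C program along the following lines. This is a hypothetical reconstruction, not the actual source: the password is a placeholder, the names auth_reply and crackme_main are made up so the logic can be exercised without defining main, and the two puts calls of the success and failure branches are folded into one. The lea at 0x118c passes the address 0x4028 directly to strcmp, and that address sits next to __dso_handle in a writable data section, which suggests a global char array rather than a pointer to a string literal.

```c
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL placeholder: in the binary the string lives at 0x4028
 * and differs for each student. */
static char password[] = "password-here";

/* Mirrors 0x1188-0x11c2: strcmp, then test %eax,%eax / jne. */
const char *auth_reply(const char *attempt)
{
    if (strcmp(attempt, password) == 0)    /* eax == 0: fall through to 0x11a2 */
        return "Authentication successful!";
    return "Wrong password.";              /* eax != 0: jne 11b3 */
}

/* Rough equivalent of main (hypothetical name). */
int crackme_main(void)
{
    char buf[0x80];                        /* add $-0x80,%rsp; buffer at -0x80(%rbp) */

    puts("Please input your password: ");  /* 0x1161-0x116b */
    /* 0x1170-0x1183: fgets(buf, 0x80, stdin). fgets keeps the trailing '\n',
     * and nothing in main strips it before the comparison. */
    if (fgets(buf, sizeof buf, stdin) == NULL)
        buf[0] = '\0';                     /* not in the disassembly: avoids reading
                                              an uninitialised buffer on EOF */
    puts(auth_reply(buf));                 /* 0x11a2-0x11bd, both branches folded */
    return 0;                              /* mov $0x0,%eax */
}
```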

strcmp takes two parameters, which are pointers to the strings to compare. We know that one of these strings is the password attempt and the other is the correct password. The next step is to run the program step by step in a debugger and observe the strings these pointers point to at runtime to find the password.
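Which register carries which parameter is fixed by the System V AMD64 calling convention used on Linux. The sketch below (with hypothetical helper names) encodes the register order for integer and pointer arguments. Read with it, the instructions at 0x1170-0x1183 load stdin into $rdx (third argument), 0x80 into $esi (second argument; writing a 32-bit register also zeroes the upper half of $rsi) and the buffer address into $rdi (first argument), i.e. fgets(buffer, 0x80, stdin). Likewise, strcmp receives the buffer in $rdi and the address 0x4028 in $rsi.

```c
#include <stddef.h>
#include <string.h>

/* System V AMD64 ABI (Linux): registers carrying the first six integer
 * or pointer arguments, in order. Further arguments go on the stack. */
static const char *const int_arg_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
#define N_INT_ARG_REGS (sizeof int_arg_regs / sizeof int_arg_regs[0])

/* Register holding argument number index (0-based), or NULL if that
 * argument is passed on the stack instead. */
const char *int_arg_register(size_t index)
{
    return index < N_INT_ARG_REGS ? int_arg_regs[index] : NULL;
}

/* Reverse lookup: which argument (0-based) a register carries at a
 * call, or -1 if it is not an argument register. */
int int_arg_index(const char *reg)
{
    for (size_t i = 0; i < N_INT_ARG_REGS; i++)
        if (strcmp(reg, int_arg_regs[i]) == 0)
            return (int)i;
    return -1;
}
```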

Debugging the Program with GDB

Here we will use GDB Enhanced Features (GEF), a set of extensions to GDB that significantly ease exploring a program's behaviour at runtime. To install it, type the following in a terminal:

bash -c "$(curl -fsSL https://gef.blah.cat/sh)"

Once GEF is installed, load the program with GDB:

gdb crackme01

A reference sheet with the main GDB commands is available here.

Within GDB, set a breakpoint on the function main:

gef> br main
Breakpoint 1 at 0x115d

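Note that GDB placed the breakpoint at 0x115d, just after the push %rbp; mov %rsp,%rbp instructions that set up main's stack frame, rather than at main's first instruction (0x1159). After the breakpoint is set, run the program until execution hits the breakpoint: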

run

Once the breakpoint is hit, GDB/GEF will display a bunch of information about the program's state of execution. Locate the following:

  • Register values: $rax, $rbx, etc.
  • Stack content, with the top of the stack pointed to by the stack pointer $rsp.
  • Disassembled machine code, with the next instruction to execute marked by the green arrow (its address is held in the instruction pointer $rip).

At that stage we need to let the program continue its execution step by step, i.e. one machine instruction at a time. This is done with the ni (nexti) command, which executes the next instruction and steps over call instructions rather than entering the called function. Use it repeatedly until the program prompts you for the password, and enter something easily recognisable, e.g. xxxxx.

Next, continue stepping until you reach the call to strcmp. GDB/GEF should detect that this is a function call. It knows the calling convention (first parameter in $rdi, second parameter in $rsi, etc.) and lists the values of the parameters:

strcmp@plt (
   $rdi = 0x00007fffffffd990 → 0x00000a7878787878 ("xxxxx\n"?),
   $rsi = 0x0000555555558028 → "password-here",
   ...
)
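The hexadecimal value GEF prints next to $rdi appears to be the memory the pointer refers to, read as a 64-bit integer. Because x86-64 is little-endian, the first byte in memory becomes the least significant byte: "xxxxx\n" (0x78 five times, then 0x0a) followed by the null byte written by fgets reads back as 0x00000a7878787878, the eighth byte being whatever was already in the buffer (here 0x00). A small sketch of that conversion, with load_le64 as a hypothetical helper name:

```c
#include <stdint.h>

/* Interprets 8 bytes as a little-endian 64-bit value: p[0] is the
 * least significant byte, p[7] the most significant one. */
uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}
```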

The first parameter, in $rdi, is the password attempt we just entered. The second one, in $rsi, is what we are looking for: the correct password! Note that it will have a different value for your version of the binary. Try it out outside GDB by running the program normally:

Please input your password: 
password-here
Authentication successful!
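The pointer in $rsi can be tied back to the static analysis. objdump -f reported the DYNAMIC flag and addresses close to 0, which is consistent with a position-independent executable: at runtime, every address is shifted by the load base chosen by the loader. GDB disables address randomisation by default, and on x86-64 Linux such binaries are then commonly loaded at 0x555555554000; treat that value as an assumption and check the actual base in your session (e.g. with GDB's info proc mappings). With that base, 0x555555554000 + 0x4028 = 0x555555558028, exactly the address GEF showed for the password. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Position-independent executable: runtime address = load base +
 * link-time address (the VMA/address shown by objdump). */
uint64_t runtime_address(uint64_t load_base, uint64_t vma)
{
    return load_base + vma;
}

/* Inverse: recovers the objdump address from an address seen in GDB. */
uint64_t link_time_address(uint64_t load_base, uint64_t runtime_addr)
{
    return runtime_addr - load_base;
}
```

The inverse direction is handy to locate, in the objdump disassembly, an address you encounter while debugging.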

Submission Instructions

Enter the extracted password on the corresponding line of the CSV file in the submission git repository, i.e.:

crackme01,password-here