class: center, middle

### COMP26020 Programming Languages and Paradigms -- Part 1

***

# Memory Safety

???
- Hello everyone
- In this video I will talk about memory safety, or rather memory unsafety

---
background-image: url(include/segfault.jpg)

---

# Memory Unsafety in C Programs

- C, and other languages such as C++, **are inherently memory unsafe**
- There is absolutely no check at runtime that the memory accessed is initialised, allocated, or mapped
- In most cases the compiler won't warn you either
--
- **A few examples of memory errors:**
- Out-of-bounds accesses on buffers/arrays
--
- Failure to initialise stack/heap-allocated variables before read access
--
- Leaks, double free, use-after-free
--
- Memory errors are **hard to detect, hard to debug**
- No warning/error at compile-time
- Undefined behaviour: at runtime sometimes it crashes, sometimes not...
--
> **These bugs have huge security implications!**

???
- Indeed it's one of the main characteristics of C and C++: they are by design not memory safe
- For several reasons, including performance goals, the C/C++ programmer has total control over the memory, and that means they can mess up the memory as much as they want
- There is no bounds check on buffer and array accesses
- Uninitialised variables generally contain random garbage
- Dynamic memory allocation can lead to a lot of mistakes that won't be caught by the compiler
- Such as memory leaks, double free, use-after-free, and so on
- All these issues are hard to detect and to debug
- By now I suspect you get that they are the worst kind, because the program compiles but crashes at runtime without much information
- Sometimes it even seems to run fine for a few executions, or on a particular platform
- And when you relaunch the program or change the platform a crash occurs
- But in addition to crashes, which in the end are not so bad, these memory issues can have dramatic implications in terms of security, and that's concerning
- In this video, I'll show you
4 examples of memory bugs that lead to security issues

---
.small[Sources: https://zd.net/3q8axgo, https://zd.net/3wfFHU7]

---
class: inverse, middle, center

# Example 1: Infoleak

???
- The first example concerns the leak of confidential information

---
name: infoleak

# Example 1: Infoleak

Original code:
```c
char *welcome_message = "Hi there! How is it going?\n"; // 27 chars
char *password = "secret";
char entered_password[128];

int main(int argc, char **argv) {
    for(int i=0; i<27; i++)  // Print welcome message character by character
        printf("%c", welcome_message[i]);

    printf("Please input the password:\n");
    scanf("%s", entered_password);

    if(!strcmp(entered_password, password)) {
        printf("Password ok!\n");
        /* ... */
    } else {
        printf("Wrong password! aborting\n");
    }
    return 0;
}
```

.codelink[
`23-memory-safety/infoleak-orig.c`
]

---

# Example 1: Infoleak

Updated code (`welcome_message` shortened):
```c
*char *welcome_message = "Hi there!\n"; // shortened welcome message, only 10 chars now
char *password = "secret";
char entered_password[128];

int main(int argc, char **argv) {
    for(int i=0; i<27; i++)
        printf("%c", welcome_message[i]);

    printf("Please input the password:\n");
    scanf("%s", entered_password);

    if(!strcmp(entered_password, password)) {
        printf("Password ok!\n");
        /* ... */
    } else {
        printf("Wrong password! aborting\n");
    }
    return 0;
}
```

.codelink[
`23-memory-safety/infoleak-updated.c`
]

---
name: infoleak

# Example 1: Infoleak

Updated code (`welcome_message` shortened):
```c
*char *welcome_message = "Hi there!\n"; // shortened welcome message, only 10 chars now
char *password = "secret";
char entered_password[128];

int main(int argc, char **argv) {
*   for(int i=0; i<27; i++)  // Oopsie! forgot to update that bit of the code
        printf("%c", welcome_message[i]);

    printf("Please input the password:\n");
    scanf("%s", entered_password);

    if(!strcmp(entered_password, password)) {
        printf("Password ok!\n");
        /* ... */
    } else {
        printf("Wrong password! aborting\n");
    }
    return 0;
}
```

.codelink[
`23-memory-safety/infoleak-updated.c`
]

???
- Let's assume the program is distributed in binary form: the user does not have access to the sources
- The program starts by printing a welcome message character by character
- It then asks the user for a password and compares the input to the real password
- If the password is correct, the program goes on to do something important
- Let's assume that after shortening the welcome message, the programmer forgets to update the loop
- The message now being much shorter, the printing loop runs past the end of the message string and prints whatever bytes come next in memory
- It may be garbage ... or not

---
template: infoleak
???
- Now the thing is, because of how the compiler lays out the variables in memory, there is a good chance that we get something like this
- The two strings are close together in memory, and password is located after the welcome message

---
template: infoleak
??? - So the printing loop overflows the welcome message and reads bytes from password --- template: infoleak
??? - And password is leaked to the standard output! - Let's check it out - Remember that the user has access only to the binary not the source code --- class: inverse, middle, center # Example 2: Sensitive Data Tampering ??? - Let's see a second example in which we tamper with a password to bypass it --- name: tampering # Example 2: Sensitive Data Tampering ```c char user_input[32] = "0000000000"; char password[32] = "secret"; int main(int argc, char **argv) { if(argc != 2) { printf("Usage: %s
\n", argv[0]); return 0; } strcpy(user_input, argv[1]); if(!strncmp(password, user_input, strlen(password))) { printf("login success!\n"); /* do important stuff ... */ } else { printf("wrong password!\n"); } return 0; } ``` .codelink[
`23-memory-safety/tampering.c`
] ??? - Let's have a look at this program - This one takes the password as parameter - Copies it in a static buffer user_input - Compares that buffer with the real password and if they match goes on to do some privileged stuff - Now the issue is that there are no checks done by strcpy regarding the sizes of the source and destination strings! - Let's assume once again that the user has access only to the binary and not the source code --- template: tampering
???
- So the memory layout looks like this
- And what happens when we pass a command line parameter that is too long

---
template: tampering
??? - is that strcpy is going to overflow the user_input buffer and start writing in the password! - so if we enter a command line parameter that is long enough, and we have it contain characters that will lead to user_input having the same content as password --- template: tampering
???
- So when strncmp is called

---
template: tampering
???
- The strings will match and we can log in without the password

---
class: inverse, center, middle

# Example 3: Stack Smashing

???
- Now let's talk a bit about an attack named stack smashing

---

# Example 3: Stack Smashing

- Classic control flow hijacking attack
- Originally proposed in 1996 here: http://www.phrack.org/archives/issues/49/14.txt
- First let's see how the CPU manages function calls/returns

???
- It's a classic attack
- Before showing an example, we first have to understand how the machine uses the stack to manage function call and return operations

---

# Example 3: Stack Smashing
--- # Example 3: Stack Smashing
??? - So we have the stack in memory on the left - A simple C program on the right, it's just a function f calling another function g - Let's assume we are in f right before it calls g - There is a bunch of things on the stack, including f's local variables --- # Example 3: Stack Smashing
???
- Now at the time f calls g, the CPU executes the x86-64 instruction callq
- This instruction does two things
- First, something called the return address is pushed on the stack
- It's the address of the instruction that the CPU will jump to when we return from g
- Notice that the stack grows down
- So it corresponds to the line of code in f right after the call to g
- And then the CPU jumps to g

---

# Example 3: Stack Smashing
??? - G now executes, its local variables are stored on the stack --- # Example 3: Stack Smashing
??? - And then when G executes the return C statement, what happens on the CPU is that the ret instruction is executed - What this instruction does is that the CPU pops the return address from the stack and jumps to it - Effectively we resume execution in f right after the call to g - So that's in a nutshell how function calls are managed: the return address is stored on the stack - Of course there is more to function call regarding parameters and return value but we don't need to talk about that for explaining this particular exploit --- # Example 3: Stack Smashing ```c char *password = "super-secret-password"; void security_critical_function() { printf("launching nukes!!\n"); } void preprocess_input(char *string) { char local_buffer[16]; strcpy(local_buffer, string); /* work on local buffer ... */ return; } int main(int argc, char **argv) { if (argc != 2) { printf("usage: %s
\n", argv[0]); return -1; } preprocess_input(argv[1]); if(!strncmp(password, argv[1], strlen(password))) security_critical_function(); else printf("Unauthorised user!\n"); return 0; } ``` .codelink[
`23-memory-safety/stack-smashing.c`
]

???
- Now let's have a look at this program
- This program takes a password as command line parameter
- The input is passed to a function that preprocesses it after copying it into a local buffer with strcpy
- And if the password is correct we execute a security critical function
- So what is the problem with this program?
- It lies in the strcpy, which does not check the size of the source string, and so allows overflowing the destination buffer, which is on the stack because it's a local variable

---
name: smashing

# Example 3: Stack Smashing

.leftcol[
```c
char *password = "super-secret-password";

void security_critical_function() {
    printf("launching nukes!!\n");
}

void preprocess_input(char *string) {
    char local_buffer[16];
    strcpy(local_buffer, string);
    /* work on local buffer ... */
}

int main(int argc, char **argv) {
    if (argc != 2) { /* ... */ }

*   preprocess_input(argv[1]);

    if(!strncmp(password, argv[1], strlen(password)))
        security_critical_function();
    else
        printf("Unauthorised user!\n");

    return 0;
}
```

.codelink[
`23-memory-safety/stack-smashing.c`
] ] ??? - Let's see what we can do with this overflow - The program is on the left, I just compressed it a bit to have it fit on the first half of a slide --- template: smashing .rightcol[
] ??? - At the time we execute the function preprocess_input, the stack looks like that - We have the calling context local variables first - Then the return address that was pushed when we called preprocess_input - And then preprocess_input local variables --- template: smashing .rightcol[
] ??? - local_buffer is in there --- template: smashing .rightcol[
]

???
- Remember that because the stack grows down, lower addresses are at the bottom of this diagram, so when we overflow local_buffer we are effectively writing towards the top of the stack, and that includes the return address
- By using a carefully crafted input string, we can overwrite the return address with the address of a function we would like to execute

---
template: smashing

.rightcol[
] ??? - For example the address of security_critical_function! --- template: smashing .rightcol[
] ??? - And when the execution returns from preprocess_input, the CPU will jump to security_critical_function rather than returning to main --- class: inverse, center, middle # Example 4: Use-After-Free ??? - A last example is the use-after-free - It happens when the programmer mistakenly uses an object that has been freed --- name: uaf # Example 4: Use-After-Free .leftcol[ ```c typedef struct { double member1; double member2; void (*member3)(int); } my_struct; void print_hello(int x) { printf("Hello, parameter: %d\n", x); } void security_critical_function() { printf("Launching nukes!\n"); /* ... */ } // ``` ] .rightcol[ ```c int main(int argc, char **argv) { /* allocate and init ms */ my_struct *ms = malloc(sizeof(my_struct)); ms->member1 = 42.0; ms->member2 = 42.0; ms->member3 = &print_hello; /* call the function pointer */ ms->member3(12); free(ms); char *buffer = malloc(12); strcpy(buffer, argv[1]); ms->member3(12); /* check a password, runs sec_crit_fn */ } ``` .codelink[
`23-memory-safety/stack-smashing.c`
]
]

???
- Such as in this example, where a data structure is allocated with malloc, then freed, then mistakenly used here
- This issue is not caught by the compiler, and in many scenarios will not manifest at runtime either
- If we look at the data structure declaration
- It has two double members and a third member that is a function pointer
- It's a variable that can store the address of a function, as is done in the main function
- We put the address of the print_hello function in the member
- And this function can be called through the variable, as is also done in the main function
- Between the free operation and the use-after-free, a buffer is allocated with malloc and input from the user is copied there

---

# Example 4: Use-After-Free

.leftcol[
```c
typedef struct {
    double member1;
    double member2;
*   void (*member3)(int);
} my_struct;

void print_hello(int x) {
    printf("Hello, parameter: %d\n", x);
}

void security_critical_function() {
    printf("Launching nukes!\n");
    /* ... */
}
//
```
]

.rightcol[
```c
int main(int argc, char **argv) {
    /* allocate and init ms */
    my_struct *ms = malloc(sizeof(my_struct));
    ms->member1 = 42.0;
    ms->member2 = 42.0;
*   ms->member3 = &print_hello;

    /* call the function pointer */
*   ms->member3(12);

    free(ms);

    char *buffer = malloc(12);
    strcpy(buffer, argv[1]);

    ms->member3(12);
    exit(0);
}
```

.codelink[
`23-memory-safety/stack-smashing.c`
]
]

- `member3` is a **function pointer**
- Can be dynamically set and called at runtime

---
template: uaf
???
- Let's see how we can exploit this program to force the execution of security_critical_function, which is not supposed to be called in this particular program
- So let's look at what happens in memory
- We have the code segment containing the executable instructions, in other words the functions
- And the heap, where our data structure object is allocated after the call to malloc
- Its third member points to print_hello after its initialisation

---
template: uaf
??? - Next the object is freed - Note that its content is not destroyed by free - That explains why a lot of use after free bugs just silently succeed --- template: uaf
??? - Then the buffer is allocated with malloc - As the user we have control over what is copied inside - Malloc will reuse space on the heap to save memory - So if we manage to overwrite the space corresponding to the function pointer in the old object that corresponded to the data structure - We can have it point to the security_critical_function --- template: uaf
???
- And when the use-after-free happens, it's not print_hello that is called, but rather security_critical_function

---

# Summary

- C is **not memory safe**
- Memory issues that look benign at first glance can have **huge security consequences**
- How to avoid these?

----

.center[Feedback form: https://bit.ly/3iybv0Y]
???
- C is not memory safe
- Memory safety issues: not only crashes, but **security vulnerabilities**
- Examples: read/write overflows, use-after-free, stack-smashing for infoleaks, sensitive data tampering, control flow hijacking
- And that's it
- C and C++ are not memory safe
- Memory issues do not only cause crashes and weird behaviour, but also have huge security consequences
- We saw a few examples of apparently mundane programming mistakes that can lead to very serious security consequences
- In the next video, we'll see a few guidelines to avoid these mistakes as much as possible