class: center, middle ### Secure Computer Architecture and Systems *** # Exploiting Vulnerabilities Part 2:
Trust Boundaries in Programs ??? - Hi everyone - In this video we are going to talk about the concept of trust boundaries in programs --- # Distrusting Command Line Parameters - For all the attacks we have seen so far, memory errors are exploited through a payload coming from the **command line** - **Payload**: piece of program input, can be maliciously malformed to trigger a vulnerability ??? - In most of the examples of attacks we covered previously, the payload comes from the command line - As a reminder, the payload is the malformed piece of program input that the attacker uses to trigger a vulnerability and perform an exploit - Not trusting the validity of the number and the values of command line parameters is a well-known security practice - In the past you have probably already been checking the validity of these things to a certain extent -- - What was our trust model? ??? - In fact, now that we are talking about distrusting some forms of input to the program, we can try to reason about what our trust model was for all the attack examples we saw --
??? - From the victim's program protection point of view, we trust the privileged layers like the OS, as well as the hardware, to work correctly - What we did not trust was other programs that could inject command line arguments into our victim program - If the program is invoked by the attacker on the command line, as we have seen in our examples, this untrusted other program could be the invoking shell --- # Distrusting Command Line Parameters - From the program's point of view, **never assume that anything coming from the command line is well-formed** - An attacker could try to invoke the program with: - Wrong number of parameters - Bad combination of parameters - Wrong parameter values: bad types, sizes, ranges, etc. ??? - You are probably at least partially aware that a program you develop should never assume that any data flowing into the program through command line arguments is well formed - To trigger vulnerabilities, an attacker can try to invoke the program with the wrong number of parameters, a bad combination of parameters, and wrong types, sizes, ranges for certain parameters -- - How does the implementation react when this happens? - Does the program exit gracefully, or does it go into undefined behaviour (security issues)? ??? - So as the developer of an application you need to reason about how your program reacts when something malformed is passed through the command line - Does your program crash or misbehave? If so, that is not good and you probably have security issues. - The intended way to deal with these errors is to handle them gracefully, for example by printing an error message and exiting the program -- - This is not just about the invoker making mistakes when crafting the command line - According to the threat model, the invoker may be malicious, and **actively input bad parameters to trigger bugs** ??? - Please also note that this is not just about the user who invokes the program mistakenly passing the wrong number of arguments or the wrong value for an argument - You need to think about your trust model and assume that untrusted actors will actively try absolutely anything possible to trigger bugs in your program and subvert it --- # Trust Boundaries in Programs - In our scenarios the command line is a **trust boundary** - An interface between an untrusted component and a trusted one, according to our threat model - That makes it a **vector of attack** ??? - The command line in our scenario is an interface between a trusted and an untrusted component - That makes it a vector of attack -- > **The validity of all data flowing through this interface needs to be checked before that data is used** ??? - and protection is required to ensure that the data flowing through this interface is valid before it can be used by the trusted component -- - Sanity checks: - Do we have the right number of parameters? - Do parameters make sense together (proper combinations)? - Do parameters have proper values in terms of types, ranges, format, etc.? ??? - This involves applying sanity checks on that untrusted data - Checks like: do we have the right number of command line parameters, does the combination of parameters passed on the command line make sense, and is the value of each parameter valid in terms of its type, size, range, format, etc. --- # Trust Boundaries in Programs (2) - **Beyond the command line, other common trust boundaries:** ???
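- To make the checks we just listed concrete, here is a minimal sketch of command line sanitisation (the `./resize` tool, its parameters, and the bounds are made up for illustration, this is not one of the course examples):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

// hypothetical tool: ./resize <width> <height>
int main(int argc, char **argv) {
    // 1. right number of parameters?
    if (argc != 3) {
        fprintf(stderr, "usage: %s <width> <height>\n", argv[0]);
        return 1;
    }

    // 2. right type and range? parse with strtol so errors are detectable
    long dim[2];
    for (int i = 0; i < 2; i++) {
        char *end = NULL;
        errno = 0;
        dim[i] = strtol(argv[i + 1], &end, 10);
        if (errno != 0 || end == argv[i + 1] || *end != '\0' ||
            dim[i] < 1 || dim[i] > 4096) {
            fprintf(stderr, "invalid dimension: %s\n", argv[i + 1]);
            return 1;
        }
    }

    // 3. do the parameters make sense together?
    if (dim[0] * dim[1] > 2048 * 2048) {
        fprintf(stderr, "requested image is too large\n");
        return 1;
    }

    printf("resizing to %ldx%ld\n", dim[0], dim[1]);
    return 0;
}
```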
- And it's not just about the command line: there are several other common sources of untrusted input in modern systems software --
- Standard input
- Environment variables
- Disk and network I/O (file contents, packet data and metadata)
- Inter-process communication (IPC)
??? - The standard input can be used by an attacker to feed bad data to your program - Environment variables can be manipulated - All the data flowing into the program through disk or network I/O could be invalid: think about malformed file formats or corrupted packet metadata, there are countless ways things can be malformed here - Finally, if your program is communicating with another process you don't trust through inter-process communication, that is also a vector of attack -- - Considering them depends on your threat model, but almost every production-ready program using these interfaces will need to sanitise what flows through them ??? - Overall, considering each of these attack vectors depends on your threat and trust models, but if you are building a production-ready application that uses some of these interfaces, then you need to reason about the possibility of attack through them and implement protections --- # Example: Command Line Arguments .leftcol[
```c
#include <stdio.h>
#include <string.h>

// usage: ./cmdline <username> <password>
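// NOTE: argc is never checked (argv[1]/argv[2] may be NULL), and strcpy
// copies until the terminating NUL byte: an argument longer than 31
// characters overflows the corresponding 32-byte buffer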
int main(int argc, char **argv) {
    char username[32];
    char password[32];

    strcpy(username, argv[1]);
    strcpy(password, argv[2]);

    // ...
}
```
.codelink[
`09-exploiting-vulnerabilities-2/cmdline.c`
] ] ??? - We've already seen plenty of examples of programs that can be subverted through malformed command line arguments - Here we have a vulnerable program with two buffers that can be overflown -- .rightcol[
```c
#define USERNAME_MAX_LEN 32
#define PASSWORD_MAX_LEN 32

int main(int argc, char **argv) {
    char username[USERNAME_MAX_LEN];
    char password[PASSWORD_MAX_LEN];

    // check the number of parameters
    if(argc != 3) {
        printf("usage: %s <username> <password>\n", argv[0]);
        return 0;
    }

    // don't copy past the buffer size
    strncpy(username, argv[1], USERNAME_MAX_LEN);
    strncpy(password, argv[2], PASSWORD_MAX_LEN);

    // make sure strings are properly terminated
    username[sizeof(username) - 1] = '\0';
    password[sizeof(password) - 1] = '\0';

    // ...
}
```
.codelink[
`09-exploiting-vulnerabilities-2/cmdline-fixed.c`
] ] ??? - A protected version of that program is on the right of the slide - As we can see, we first validate that we have the right number of command line arguments - Then with strncpy we make sure not to copy more bytes than the size of the receiving buffers - And finally we make sure that the strings are properly terminated, because the attacker could pass them in such a way that they are not --- # Example: Environment Variables
```c
#include <stdio.h>
#include <stdlib.h>
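
// NOTE: the value of USER_INPUT is attacker-controlled; if it contains
// format tokens (e.g. USER_INPUT='%p %p %p %p'), the snprintf call below
// interprets them and leaks stack/register contents into the output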
// usage: USER_INPUT=pierre ./environment-variable
int main(int argc, char *argv[]) {
    char *user = getenv("USER_INPUT");
    if (!user) {
        fprintf(stderr, "Please set the USER_INPUT environment variable.\n");
        return 1;
    }

    char buffer[100];

    // Vulnerable: format string comes from environment variable
    snprintf(buffer, 100, user);

    printf("Hello, ");
    puts(buffer);

    return 0;
}
```
.codelink[
`09-exploiting-vulnerabilities-2/environment-variables.c`
] ??? - Here is another example of bad data injection, this time through an environment variable - We first get a pointer to the value of this environment variable named USER_INPUT with the `getenv` libc function - Then we use snprintf to copy the value of the environment variable into `buffer` - There is no possibility of overflow here because we know that `snprintf` won't write more than 100 bytes, which is the size of the receiving buffer - However snprintf takes as third parameter a format string, and possibly as fourth and further parameters a list of variables whose values should be substituted for tokens in the format string, exactly like printf - So if we pass through the environment variable something that looks like a format string with tokens - We can leak part of the program's memory in the output when the format string is printed - Some of the leaked values look like pointers, and leaking pointers is an important step in many attacks as we will see in one of the next videos --- # Example: Environment Variables (2)
```c
#include <stdio.h>
#include <stdlib.h>
// usage: USER_INPUT=pierre ./environment-variable
int main(int argc, char *argv[]) {
    char *user = getenv("USER_INPUT");
    if (!user) {
        fprintf(stderr, "Please set the USER_INPUT environment variable.\n");
        return 1;
    }

    char buffer[100];

    // fixed: the format string is now constant, user data is only an argument
*    snprintf(buffer, sizeof(buffer), "%s", user);

    printf("Hello, ");
    puts(buffer);

    return 0;
}
```
.codelink[
`09-exploiting-vulnerabilities-2/environment-variables-fixed.c`
] ??? - The fix here is simple: make the format string simply `%s`, and have snprintf replace that token with a single argument, the value of the environment variable - Even simpler, for copying a string just use strncpy --- # Example: HeartBleed - HeartBleed (CVE-2014-0160): critical vulnerability in OpenSSL that allowed remote attackers to read memory from vulnerable servers ??? - Let's have a look at one last example, this time taken from the real world - You may have heard about the heartbleed vulnerability in the OpenSSL library that is used to encrypt most of the https traffic of the internet - It's a very severe issue that caused a big commotion in 2014 -- - Malformed heartbeat request: - Malicious client controls the size of the server's response - Sets it larger than the response's data - Buffer overflow in read mode, server memory content (e.g. crypto keys) sent back to the client
??? - With heartbleed the attacker's payload comes through the network - The attacker here controls a remote client and aims to leak sensitive data from the server - The client regularly sends a heartbeat request to the server to keep the connection alive - The client indicates within the request the size of the response the server should send back, and sets that number to a larger value than the actual response the server will write - This triggers an overflow in read mode on the heap of the server, and the memory read this way is sent back to the client - It could contain anything, including crypto keys that are commonly manipulated by that library -- --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - There is nothing better than this xkcd comic to explain the HeartBleed bug, so credits to the author of XKCD - Under normal operation the client asks the server to respond "POTATO" and also gives the server the size it should use to respond potato, which is 6 letters --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - The server answers potato in 6 letters and it's all good --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - Rinse and repeat with BIRD in 4 letters this time --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - All good --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - Now the exploit consists in the client asking the server for a relatively small reply but with a very large reply size: here the reply should be bird, but at the same time the client wants 500 letters --- # Example: HeartBleed
.center[https://xkcd.com/1354/] ??? - This leads to a read overflow in the server's memory, of a bit less than 500 bytes past the buffer holding bird - This memory is sent back to the client, and it may contain very sensitive data due to the security-critical nature of the OpenSSL library --- # Example: HeartBleed
```c
int main() {
    char secret[64] = "SECRET: This is private data that shouldn't leak!\n";

    int server = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(12345),
        .sin_addr.s_addr = INADDR_ANY
    };

    bind(server, (struct sockaddr*)&addr, sizeof(addr));
    listen(server, 1);

    int client = accept(server, NULL, NULL);

    unsigned char buf[32] = {0};
    recv(client, buf, sizeof(buf), 0);

    // Heartbleed-style vulnerability:
    // client sends: [type][len][data] -> respond with `len` bytes
    int len = buf[1]; // vulnerable: no bounds check
    send(client, buf + 2, len, 0);

    close(client);
    close(server);
}
```
.codelink[
`09-exploiting-vulnerabilities-2/heartbleed.c`
]
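--

- From the attacker's side, triggering the bug takes only a few lines; here is a hypothetical client sketch for this simplified server (the real Heartbleed exploit speaks TLS, this one just sends the `[type][len][data]` message with an oversized `len` to the hard-coded port 12345):

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(12345) // port used by the simplified server above
    };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    connect(fd, (struct sockaddr*)&addr, sizeof(addr));

    // [type][len][data]: 4 bytes of data, but len claims 200 bytes
    unsigned char req[] = { 0x01, 200, 'b', 'i', 'r', 'd' };
    send(fd, req, sizeof(req), 0);

    unsigned char reply[256];
    ssize_t n = recv(fd, reply, sizeof(reply), 0);
    printf("received %zd bytes back\n", n); // typically far more than the 4 data bytes sent

    close(fd);
    return 0;
}
```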
??? - Here is a simplified implementation of the heartbleed bug in the server's code - I will focus on the bug here, but feel free to pause the video and check out the code in more detail if you want to understand how these network-related system calls work - You can see that the server receives data from the client in a 32-byte buffer, according to a particular format: - the first byte indicates the request type, used to indicate a heartbeat request - the second byte indicates the size the server response should have - and the next 30 bytes contain the data that should be present in the server's response --- # Example: HeartBleed The fix:
```c
int client = accept(server, NULL, NULL);

unsigned char buf[32] = {0};
*// zero out buf
*memset(buf, 0x0, 32);
*recv(client, buf, sizeof(buf), 0);

int len = buf[1];
*// sanity check len
*if(len > (32-2))
*    len = (32-2);

send(client, buf + 2, len, 0);
```
.codelink[
`09-exploiting-vulnerabilities-2/heartbleed-fixed.c`
]
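--

- The length field can be distrusted even further; here is a sketch of a stricter variant (the `handle_heartbeat` helper is hypothetical, not taken from `heartbleed-fixed.c`) that also caps `len` to the number of bytes actually received from the client:

```c
#include <sys/socket.h>
#include <sys/types.h>

// handle one heartbeat on an already-accepted connection
static void handle_heartbeat(int client) {
    unsigned char buf[32] = {0};

    ssize_t n = recv(client, buf, sizeof(buf), 0);
    if (n < 2)          // need at least [type][len]
        return;

    int len = buf[1];
    if (len > n - 2)    // never echo more than the client actually sent
        len = n - 2;

    send(client, buf + 2, len, 0);
}
```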
??? - The fix is simple: put a cap on the size that can be indicated by the client; here we make sure it cannot be longer than the 30 bytes we have to hold the data --- # Handling Trust Boundaries - The programmer needs to reason about trust boundaries in the program - **Need a trust model**, example for a web server: ??? - So as we saw, it's very important that as a developer you secure the trust boundaries in your program - For that you need to reason about your trust model - Here is an example of a trust model for a web server -- .small[

| **Source** | **Example Use** | **Trust Level** | **Reasoning / Risk** |
|---|---|---|---|
| **Command-line args** | `./server --config=config.txt` | ❌ *Untrusted* | User-controlled; could point to malicious files or overflow buffer sizes |
| **Environment variables** | `export PORT=8080` | ❌ *Untrusted* | Inherited from shell; attacker can manipulate via scripts or misconfigurations |
| **Standard Input** | Admin enters `reload` via terminal | ❌ *Untrusted* | Human error or input injection if stdin is redirected |
| **Configuration file** | Parses `config.txt` for allowed IPs or auth keys | ⚠️ *Partially trusted* | Could be modified by external actors; needs file integrity checks and format validation |
| **Network input** | Receives `GET /index.html` requests via TCP socket | ❌ *Totally untrusted* | Malicious clients can send malformed, oversized, or malicious payloads |
| **Internal constants** | Default port `= 80`, buffer sizes | ✅ *Trusted* | Controlled by developer; no user influence |

] ??? - We do not trust the command line arguments or environment variables - If there is somehow an interactive command line, we do not trust whatever comes through the standard input either - We do not trust network input either: requests could be malformed, as we just saw - The server's configuration files on the filesystem are partially trusted: it may be possible for an attacker to alter them if the filesystem permissions are not set up correctly, so some sanity checking of the configuration coming from these files is probably a good idea - Finally, internal constants in the program's binary are assumed to be trusted --- # Handling Trust Boundaries (2) - Trust boundaries represent interfaces between untrusted and trusted components and their usage needs to be **sanitised**: - Validate, before use, the types, sizes, ranges, and consistency of **data** flowing through the interface - **Avoid leaking data/references** to untrusted components - Validate the **control flow**: enforce proper ordering in the use of trusted interfaces' primitives (see the sketch below)
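--

- For instance, here is a minimal sketch of control-flow validation for a made-up protocol in which a `HELLO` message must precede any `DATA` message (the message types and the `check_message` helper are hypothetical):

```c
enum session_state { WAIT_HELLO, READY };

// returns 0 if the message is acceptable in the current state, -1 otherwise
int check_message(enum session_state *state, unsigned char msg_type) {
    switch (msg_type) {
    case 0x01: // HELLO: only valid once, at the start of the session
        if (*state != WAIT_HELLO)
            return -1;
        *state = READY;
        return 0;
    case 0x02: // DATA: only valid after a successful HELLO
        return (*state == READY) ? 0 : -1;
    default:   // unknown message type: reject
        return -1;
    }
}
```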
??? - Based on your trust model, it is your responsibility as the developer to identify interfaces between untrusted and trusted components, and to sanity check all the data and control flow going through these interfaces - That means validating, before use, data types, sizes, and ranges, but also the consistency of pieces of data with each other - It also means avoiding leaking data and references to untrusted components, for example by zeroing out memory that is not initialised - It's not only about data; the control flow should be validated too. An example is enforcing ordering: when you have something like a network protocol with requests of type A that should always be sent before requests of type B, what are the implications of inverting that order? --- # Handling Trust Boundaries (3) - Properly securing trust boundaries becomes **very hard** when the target program and the considered trust boundaries are complex - For this reason, particular types of software are known to suffer from bugs: - **Parsers**: handle feature-rich and complex (e.g. XML) formats - **Web browsers**: handle large amounts of untrusted input (e.g. HTML, CSS, JS, etc.) - **Image/document processors**: complex file formats, sometimes embedding code - **Shell/command line parsers**: may support many features - **Network protocol stacks**: can be complex and support many features ??? - Securing these kinds of interfaces becomes very hard when the program and its trust boundaries are large and complex - This is why we have entire classes of software that are really prone to vulnerabilities, because it's impossible to guarantee that their trust boundaries have been 100% sanitised - Examples include parsers, web browsers, image and document processors, shell and command line parsers, and network stacks - All of these are complex pieces of software handling complex data formats, often exposing interfaces that are themselves proportionally complex --- # Summary - Many sources of untrusted input in a program - Can be exploited to subvert the program and break CIA - Programmer needs to reason about trust: 1. Define a threat model, identify trust boundaries 2. Secure these interfaces - Properly sanitising trust boundaries is challenging with complex programs and interfaces ??? - To conclude, there are many sources of untrusted input in a program - Without proper protection they can be exploited to subvert it and break confidentiality, integrity and availability - As the programmer, it is crucial that you reason about trust in the software you build: - That means defining a trust model, identifying trust boundaries, and securing these interfaces - That last step can be quite challenging in large programs with complex interfaces