class: center, middle

### Secure Computer Architecture and Systems

***

# Memory Safety

---

# Memory Safety

- Memory safety: **protection against certain types of bugs relating to memory accesses** in a program
- Protections can be enforced at compile-time and/or at runtime

???

- Memory safety is about protecting a program from a whole class of bugs that arise when the program accesses memory it shouldn't, for example when indexing an array out of bounds or overflowing a buffer.
- Some of the protections against these issues are applied at compile time, and others at runtime.

--

- **Some languages like C and C++ do not have such protections**
- Problem: these are the languages used for systems software development

???

- A major problem is that the programming languages used to write most systems software, C and C++, lack most of these protections: we say that C and C++ are not memory safe

--

- When they occur, memory safety bugs lead to crashes/strange program behaviour but also to **security vulnerabilities**
- They can be exploited by attackers to compromise:
    - Availability, e.g. crash a program/system
    - Confidentiality, e.g. leak sensitive data/secrets
    - Integrity, e.g. escalate privilege

???

- When memory safety violations occur, the program can crash or exhibit strange behaviour
- More concerning, these violations are security issues that attackers can exploit to compromise availability, confidentiality, and integrity

---

# Memory Safe Languages

- A language enforcing memory safety is said to be **memory safe**
- Examples: Java, Python, Haskell, etc.

???

- Contrary to C and C++, high level languages such as Java or Python are said to be memory safe

--

- They enforce a **series of rules** to avoid memory safety bugs, e.g.:
    - No out of bounds access
    - Memory that has been deallocated cannot be accessed anymore
    - No reference to memory that has been deallocated can exist
    - Memory can only be freed once
    - NULL/invalid references cannot be dereferenced
    - Memory should be accessed through variables of the correct type
    - Memory should be initialised before being read

???

- They enforce a series of rules to prevent memory safety violations
- These rules include checking for out of bounds accesses, and preventing deallocated memory from being accessed or even referenced
- Making sure that memory can be deallocated only once, and that invalid references such as NULL pointers cannot be dereferenced
- Making sure that memory is always initialised before being read, and is always accessed through variables of the proper type

--

```python
numbers = [1, 2, 3]
# Python will throw an out of bounds exception here:
print("The fourth number is:", numbers[3])
```
.codelink[
`07-memory-safety/out-of-bounds.py`
]

???

- Here is an example of buggy code in Python, where we access an array out of bounds
- If you try that code you'll get an exception, because Python performs a bounds check at runtime when the array is indexed

---

# Memory Unsafe Languages

- Languages that do not enforce memory safety are **memory unsafe**
- Examples: C and C++
- **Problem: these are the systems software programming languages**

???

- As we mentioned, contrary to higher-level languages, the most popular languages in systems software, C and C++, are not memory safe

--

- No out of bounds checks
- Memory deallocated can be pointed to and accessed
- Memory can be freed multiple times
- NULL/invalid references can be dereferenced
- Uninitialised memory can be read
- No type safety enforcement

???

- They unfortunately lack the majority of the memory safety checks mentioned in the previous slide
- They do not check for out of bounds array and buffer accesses
- They do not check for the presence of pointers to deallocated memory, or for such pointers being dereferenced
- They do not check for things like double frees on the same buffer, dereferencing invalid or NULL pointers, or reading uninitialised memory
- Finally their type checking system can easily be bypassed, for example with casts

--

```c
int numbers[3] = {1, 2, 3};
// In C, no compile-/runtime notification: program misbehaves and prints garbage:
printf("The fourth number is: %d\n", numbers[3]);
```
.codelink[
`07-memory-safety/out-of-bounds.c`
]

???

- Here is an equivalent C implementation of the Python program we saw on the previous slide
- This program is similarly incorrect, indexing the `numbers` array out of bounds, however it compiles without any warning or error
- It does not even crash at runtime, but rather prints garbage

---

# C/C++: Unsafety by Design

- Lack of safety checks in C/C++ is not an error in the languages' specs

???

- The lack of memory safety in C and C++ is a feature, not a bug

--

- **Unsafety required to achieve the benefits these languages are known for**
    - No runtime checks on bounds, dereferenced/freed references validity, type correctness, etc. to **maximise performance**
    - No garbage collection/runtime metadata memory overhead to have a **low/controllable memory footprint**
    - No latency due to GC when **predictable latency is needed** e.g. in real-time systems
    - Need to be able to **access arbitrary areas of the address space** e.g. when writing OS code

???

- The designers deliberately did not implement the aforementioned checks because they conflicted with the objectives of the languages
- Because C needs to be fast, we cannot afford runtime checks on bounds, pointer validity, or type correctness
- Because C needs to have a controllable memory footprint, we similarly cannot afford to hold a lot of metadata about bounds and reference validity
- Because C requires a predictable execution time in scenarios such as real-time systems, we cannot afford the nondeterministic latencies brought by techniques such as garbage collection
- Finally, because C is used to write low level software such as operating system code, it needs to be able to access arbitrary areas of memory, for tasks such as device communications

---

# Trading Off Safety for Speed

.leftcol[
```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 100000000 // 100 million

int main() {
    int *arr = malloc(N * sizeof(int));
    for (int i = 0; i < N; ++i) arr[i] = i;

    clock_t start = clock();
    long long sum = 0;
    for (int i = 0; i < N; ++i) {
        // bound check here would be: if (i < 0 || i >= N) abort();
        sum += arr[i];
    }
    clock_t end = clock();

    printf("Sum = %lld\n", sum);
    printf("Time = %.3f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
    free(arr);
    return 0;
}
```
.codelink[
`07-memory-safety/speed.c`
]
]

???

- To illustrate how the lack of memory safety makes C significantly faster than higher level languages, let's look at this example program
- It initialises an array with the numbers from 0 up to one hundred million minus one
- It then sums up all of these numbers and prints the result, as well as the time it took to perform the sum
- The time is computed by using `clock` to get a timestamp before and after the sum, and subtracting these

--

.rightcol[
```python
# Equivalent Python code:
import time

N = 100_000_000  # 100 million

# Create the list
arr = list(range(N))  # [0, 1, 2, ..., N-1]

# Start timer
start = time.time()

# Sum with bounds-checked access
sum = 0
for i in range(N):
    sum += arr[i]  # Bounds check every time

# End timer
end = time.time()

print("Sum =", sum)
print("Time = {:.3f} "
      "seconds".format(end - start))
#
```
.codelink[
`07-memory-safety/speed.py`
]
]

???

- This is the equivalent program in Python
- It performs exactly the same operations
- You can pause the video and study these programs if you want to understand their implementation in detail
- As you can see here when we run these two programs, the C version is about 70 times faster than the Python one
- This is because Python's runtime checks, including its memory safety checks, slow things down a lot

---

# Common Memory Safety Issues

**Buffer/array overflows:**

```c
int array[4] = {0, 1, 2, 3};
for(int i=0; i<=4; i++)
    array[i] *= 2; // when i == 4, overflows array
```
.codelink[
`07-memory-safety/buffer-overflow.c`
]

???

- Let's now have a look at a few common memory safety errors
- We have already seen the buffer overflow
- Here we have an out of bounds array index, which is a subclass of it
- When `i` equals 4 the array `array` will be indexed out of bounds: the memory past the array will be read, multiplied by two, and then written, which obviously is a bug

--

**Underflows** can also happen, e.g. reading/writing `array[-1]`.

???

- The issue can also happen the other way around, addressing an array or a buffer out of bounds before its location in memory: that's an underflow

--

**Use-after-free/dangling pointers:**

```c
int *buffer = malloc(1 * sizeof(int));
// do something with buffer here ...

// after that free, buffer points to unallocated memory:
// it's a dangling pointer (invalid reference)
free(buffer);
*buffer = 42; // use after free
```
.codelink[
`07-memory-safety/uaf.c`
]

???

- Another very common error is the use after free
- In such a scenario a buffer previously allocated with `malloc` is freed with `free`, and at some point later that buffer is dereferenced
- Starting from the `free` statement, the pointer `buffer` is invalid and references unallocated memory
- Dereferencing it is obviously a bug

---

# Common Memory Safety Issues (2)

**Double free:**

```c
int *buffer = malloc(1 * sizeof(int));
// do something with buffer here ...
free(buffer);
// more code here ...
free(buffer); // double free
```
.codelink[
`07-memory-safety/double-free.c`
]

???

- Freeing the same pointer twice or more is another programmer mistake that may happen
- This will generally cause the memory allocator to misbehave

--

**`NULL` pointer dereference:**

```c
int *ptr = NULL;
// forgot to call malloc ...
*ptr = 42; // dereference NULL (address 0)
free(ptr); // not a bug by itself: free(NULL) is defined as a no-op
```
.codelink[
`07-memory-safety/null-dereference.c`
]

???

- Another bug that you may already know is the dereferencing of a NULL pointer
- On most platforms NULL is represented as address 0, so dereferencing a NULL pointer will in effect access the memory at address 0
- Most operating systems do not map the first page of the address space, so in most cases we'll get a crash, but if something is mapped there the program will start to misbehave

---

# Common Memory Safety Issues (3)

**Accessing uninitialised memory:**

```c
int x;
int *ptr = malloc(sizeof(int));
// what ends up in y and z? we do not know!
int y = x;
int z = *ptr;
```
.codelink[
`07-memory-safety/uninitialised-mem.c`
]

???

- Last but not least: accessing uninitialised memory
- Here a local variable is declared and a buffer is allocated with `malloc`, and both are read before being written to
- In C you cannot assume that dynamically allocated memory or local variables are initialised to 0: most of the time they contain garbage and should never be read before being written to

--

- In most cases **the compiler will not warn about these issues**
- At runtime, **program misbehaves, sometimes silently**
    - Can be hard to detect
    - Can also be hard to debug: one needs to trace the symptom (e.g. program crash) back to the root cause

???

- A very important thing to note here is that most of these programming mistakes that lead to memory safety violations will not be detected by the compiler: you won't get any warning or error
- At runtime, they will lead to the program misbehaving, sometimes silently: the program will look to be running fine, although there is actually a bug under the hood
- For these reasons, memory errors can be hard to detect and sometimes live silently within production code bases for years
- They can also be quite difficult to reproduce, and to debug

---

# How Do These Errors Sneak In?

- Examples given so far are very small/simple code snippets

???

- Now how do these errors end up in codebases?
- Of course the examples I just gave you are overly simple and it's unlikely that someone would make these mistakes on such small pieces of code

--

- **However they are much more likely to happen on real world programs:**
    - High complexity, made of thousands to millions of lines of code
    - Sometimes hard (but important) to reason about what code/data can be trusted or not
    - Developed by different programmers
    - Codebase evolving over years

???
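
- A minimal sketch of the "what data can be trusted" point on the slide, assuming a hypothetical `copy_message` helper (not part of the course code) that receives a length claimed by an untrusted sender and validates it against the destination size before copying:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Copies claimed_len bytes from src into dst, but only if they fit.
 * Returns 0 on success, -1 if the claimed length cannot be trusted. */
int copy_message(char *dst, size_t dst_size, const char *src, size_t claimed_len)
{
    /* NULL buffers are a programming error on the caller's side. */
    assert(dst != NULL && src != NULL);

    /* claimed_len comes from outside: without this check, a large value
     * would make memcpy write past the end of dst (buffer overflow). */
    if (claimed_len > dst_size)
        return -1;

    memcpy(dst, src, claimed_len);
    return 0;
}
```

- In a large code base the check and the copy often live in different functions written by different people, which is how one of them can go missing without anyone noticing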

- What you need to understand is that these errors are much more likely to happen in real world code bases
- They are very large, made of up to tens of millions of lines of code
- This complexity makes it quite hard to reason about the code and its safety, to determine things like: can I free this object here, what is the proper size to give to a buffer, what is the proper number of iterations for a loop, and so on
- These codebases evolve significantly over time, and many programmers contribute to them
- All of this brings even more complexity and increases the chances of programming mistakes sneaking in, leading to memory safety violations

---

# Spatial & Temporal Memory Safety

- **Spatial memory safety**: enforcing accesses within the bounds of addressed objects and allocated memory
- Examples of violations: buffer over/underflows, indexing arrays out of bounds
- Exploited by an attacker, these violations lead to sensitive data tampering and leaks, malicious code execution, and denial of service: all aspects of CIA are broken

???

- There are two main classes of memory errors, the first relates to spatial memory safety
- Spatial errors happen when accessing memory outside the bounds of a target object
- Examples of such errors are indexing an array out of bounds and over or underflowing a buffer
- They are illustrated on the slide, with green arrows representing valid memory accesses, and red arrows representing spatial memory safety violations
- An attacker can exploit these errors to tamper with or leak sensitive data and code, execute malicious code, and disturb or crash the program: in other words, the attacker can break all aspects of the confidentiality/integrity/availability triad

---

# Temporal Memory Safety

- **Temporal memory safety:** preventing access to memory that is no longer valid
- Examples of violations: use-after-free, dangling pointers, reading uninitialised memory
- When exploited, these violations also allow breaking all aspects of CIA

???

- The second main class of memory errors relates to temporal memory safety
- Temporal errors happen when a program accesses memory that is no longer valid because it has been deallocated, or not yet valid because it has not been initialised
- The diagram on the slide represents a use after free, with a buffer allocated and initialised, then accessed properly until it is freed
- Any access past the moment `free` was called is a temporal safety violation
- Similarly to spatial errors, temporal ones allow an attacker to break all aspects of the confidentiality, integrity, and availability triad

---

## Beyond Memory Safety: Undefined Behaviour

- Spatial and temporal violations are a subset of the erroneous actions classified as **undefined behaviour**.

???

- Memory errors lead the program into what is called undefined behaviour

--

- From the [C FAQ](https://c-faq.com/ansi/undef.html):

> Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended.

???

- Undefined behaviour is aptly named: the C FAQ describes it as "Anything at all can happen; the Standard imposes no requirements. The program may fail to compile, or it may execute incorrectly (either crashing or silently generating incorrect results), or it may fortuitously do exactly what the programmer intended."
- That echoes well the "silent manifestation" aspect of memory errors I mentioned previously
- Once again the important thing to remember is that even if the program seems to run fine, if there is a memory error there is a problem, and it needs to be fixed

--

- The reason UB exists is once again to let the compiler generate very efficient code

???
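
- To make the efficiency argument concrete, here is a minimal sketch of the kind of check the language does not require the compiler to emit. Signed integer overflow is one instance of undefined behaviour, and the hypothetical `checked_add` helper below (an illustration, not part of the course code) shows the extra comparisons that every addition would need in order to be fully defined:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Adds a and b only if the mathematical result fits in an int.
 * Returns true and stores the sum in *out on success, false otherwise. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    /* Test against the limits BEFORE adding, so that the addition itself
     * can never overflow (which would be undefined behaviour). */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;

    *out = a + b;
    return true;
}
```

- With plain `a + b`, none of these comparisons exist in the generated code: the compiler is allowed to assume the overflow never happens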

- As mentioned previously, the reason why the compiler/runtime does not report these errors when they are present is that we want to have fast and efficient C programs

---

## Examples of Undefined Behaviour

- All the memory errors previously described
- Signed integers under/overflows
- Oversized shifts
- Passing a function as parameter to `sizeof`
- Casting an `int *` into a `float *` and dereferencing
- More: https://port70.net/~nsz/c/c99/n1256.html#J.2

???

- Beyond memory errors, there are other programming mistakes that lead to undefined behaviour
- Such as integer over and underflows, oversized shifts, passing a function as parameter to `sizeof`, casting an `int *` into a `float *` and dereferencing that pointer, amongst others
- All of these can lead to security problems

--

```c
#include <limits.h>
// integer overflow: INT_MAX is the largest number that can be stored in an int
printf("%d\n", (INT_MAX+1) < 0);
```
.codelink[
`07-memory-safety/integer-overflow.c`
]

???

- Here is an example of integer overflow
- We add one to the maximum integer that can be stored as an int
- And check if the result is less than zero
- Mathematically that should be false, but because we are overflowing the integer, the program misbehaves and can print out that it is true

--

- If a program goes into undefined behaviour, **the entirety of its execution is invalid**
    - It is buggy and needs to be fixed
    - Even if it seems to run fine

???

- So once again, if the program goes into undefined behaviour, the entirety of its execution is invalid
- The program is buggy, and it needs to be fixed, even if it seems like it runs fine

---

# Summary

- **C and C++ are not memory safe**, for various reasons e.g. performance
    - Lack compile-time and runtime enforcement of proper memory accesses
- **Spatial safety**: accessing memory at the right locations
- **Temporal safety**: accessing memory at the right time
- Memory safety violations are a subclass of errors leading the program into **undefined behaviour**
    - May have various or no visible consequences
    - One thing is certain: a program that can go into undefined behaviour is buggy and needs to be fixed

???

- To summarise, for various reasons such as performance, C and C++ are not memory safe
- There are no compile-time or runtime checks for a large range of programming mistakes
- They enforce neither spatial nor temporal safety, which respectively correspond to accessing memory at the right place and at the right time
- Memory safety errors are a subclass of issues leading the program into undefined behaviour
- When that happens, the program may crash, misbehave, or seem to be running fine
- Either way, it is buggy, and the errors it contains may lead to security vulnerabilities: they need to be fixed
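
- As a closing illustration of "the right place" and "the right time", here is a minimal sketch of what manual runtime enforcement can look like in C. The `int_span` type and its helpers are hypothetical names introduced for this example only: the pointer travels together with its length (spatial check), and freeing clears the pointer so that later accesses through the span are rejected (a partial temporal check)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type, for illustration only: a pointer plus the length
 * metadata that C does not keep for us at runtime. */
struct int_span {
    int *data;
    size_t len;
};

/* Reads element i into *out. Returns false instead of reading out of
 * bounds or reading through a span that has already been freed. */
bool span_get(const struct int_span *s, size_t i, int *out)
{
    assert(s != NULL && out != NULL);

    if (s->data == NULL || i >= s->len)
        return false;

    *out = s->data[i];
    return true;
}

/* Frees the storage and clears the span, so that a later span_get fails
 * cleanly and a second span_free is harmless (free(NULL) is a no-op). */
void span_free(struct int_span *s)
{
    assert(s != NULL);

    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

- This only protects accesses that go through the span: any copy of the raw `data` pointer made before `span_free` still dangles, which is why memory safe languages enforce these rules in the language itself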