class: center, middle

### COMP26020 Programming Languages and Paradigms Part 1: C Programming

***

# C and Memory Safety: Good Practices

???
- Hello everyone
- In the previous lecture we saw how memory errors can lead to serious security issues in memory-unsafe languages such as C and C++
- In this short video, we'll see how to avoid making such mistakes as much as possible

---

# The Problem

- C and C++ have **many benefits** and are still the default languages in several application domains
- C/C++ are also inherently **memory unsafe**

.center[How can we avoid these dangerous memory errors as much as possible?]

1. Some tools can help (won't fix everything!)
2. **Write good code**

???
- We saw all the benefits of C and C++, as well as multiple use cases demonstrating that these languages are still extensively used
- Another proof of their importance is the fact that you are learning them right now
- But we also saw that programs written in C or C++ are prone to memory issues, which is bad for security
- How can we avoid the programming mistakes that lead to these security issues?
- In the next slide I will list a few tools that can help
- And we will also see a few guidelines about how to write good code

---

# Tools

- **Enable extra warnings** with compiler flags:
    - `-Wall`
    - `-Wextra`
    - `-pedantic`

More info: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

???
- A first basic piece of advice is to enable the many compiler warnings that are disabled by default
- This can help detect bugs
- You can do that by using the following flags
- `-Wall` will enable a first set of warnings
- and `-Wextra` will activate additional warnings on top of that
- you also have `-pedantic`, which enables warnings for code that does not strictly conform to the ISO C standard
- All these options activate warnings that are particularly strict, and you'll need to reason a bit about the things they point to in your code
- Are these things bugs or not?

---

# Tools

- Dynamic Analysis:
    - [Valgrind](https://valgrind.org/)
    - [Address Sanitizer](https://github.com/google/sanitizers) (ASAN):
        - Add these compiler flags: `-fsanitize=address -fno-omit-frame-pointer`
    - BoundCheck, Purify, etc.
- Static Analysis:
    - [Clang Static Analyser](https://clang-analyzer.llvm.org/)
    - Lint, Coverity, cppcheck, etc.

???
- Tools can be classified as dynamic and static
- Dynamic tools work by instrumenting your program with additional checks and observing its behaviour at runtime
- We already spoke about Valgrind, which focuses on the heap and is very practical for checking for memory leaks, uninitialised memory reads, etc.
- There is another popular tool named Address Sanitizer or ASAN, which can detect overflows, use-after-frees, leaks, etc.
- To use ASAN, simply add the following compiler flags and run the program normally
- You'll get a crash when ASAN detects a bad memory access, with a log of exactly what went wrong
- Other dynamic tools include BoundCheck, Purify, and so on
- And then we have the static tools
- They analyse the code of your program without running it
- A famous tool is the Clang Static Analyser
- It can check for things like division by zero, null pointer dereferences, usage of uninitialised values, etc.
- You also have some other static tools listed on the slides
- An important thing to note is that static and dynamic tools have their pros and cons, and there is no silver bullet
- No tool will detect all the memory errors in all programs
- Moreover, even if you run all the possible tools on a given program, it is not guaranteed that all the errors have been detected

---

# Writing Good Code

- The compiler won't catch all mistakes
- The tools have their limitations
- **Writing good code** from the start is super important
    - It will save you a lot of debugging time
    - It will save you from introducing serious security issues

???
- So even if some tools can help, it's very important to write good code
- The compiler does not catch all mistakes and the tools are not perfect
- Writing good code will save you a lot of debugging time
- And of course will reduce the chances of introducing security vulnerabilities

---

# Undefined Behaviour

.center[Worst bugs in C/C++: **undefined behaviour**]

- "Renders the entire program meaningless if certain rules of the language are violated." (from cppreference.com)
    - program can crash (good!)
    - program can behave weirdly (pretty good!)
    - program can *seem to* behave normally (argh!)
- Common sources include:
    - Reading an uninitialised variable
    - Reading/writing out of the bounds of an array
    - Dereferencing a `NULL` (`0x0`) pointer
    - Overflow in signed integer arithmetic
    - Dereferencing a freed pointer
    - Freeing a pointer twice
    - etc.

???
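To make two of the common sources listed on this slide concrete, here is a minimal sketch of a defensive array read. The helper `checked_get` is hypothetical (it is not part of the lecture code or of the C standard library); it only illustrates refusing an access that would be a `NULL` dereference or an out-of-bounds read, instead of entering undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Reads arr[idx] only when the access is known to be valid.
 * Returns true and stores the element in *out on success;
 * returns false and leaves *out untouched otherwise. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    assert(out != NULL); /* passing no output location is a programmer error */

    /* Either condition would lead to undefined behaviour if ignored:
     * a NULL dereference or a read past the end of the array. */
    if (arr == NULL || idx >= len)
        return false;

    *out = arr[idx];
    return true;
}
```

The caller has to pass the array length explicitly, because a C array does not carry its own size.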
- So in C there is this notion of undefined behaviour
- When a program enters undefined behaviour, you basically cannot assume anything anymore
- The program can crash, which in some sense is good as it forces you to investigate and fix the bug
- The program can behave in a strange way, which is harder but still possible to detect, so it's not that bad
- But it can also seem to execute fine, for various reasons: for example, the particular conditions that trigger a memory error may not be encountered
- These difficult-to-detect bugs are the worst, because you could have a security vulnerability without knowing it
- And memory errors lead your program into undefined behaviour
- You have some examples on the slide
    - Reading an uninitialised variable
    - Reading/writing out of the bounds of an array
    - Dereferencing a `NULL` (`0x0`) pointer
    - Overflow in signed integer arithmetic
    - Dereferencing a freed pointer
    - Freeing a pointer twice
    - etc.

---

# Array/Buffer Sizes, Integer Overflows

- **Keep track of your arrays' sizes**
- **Be aware of type sizes on the architecture you target to avoid overflows**
    - `sizeof()`
    - Signed overflow: undefined behaviour

???
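As a minimal sketch of how signed overflow can be avoided, the addition below is done in a wider type, the result is range-checked, and it is converted back only if it fits. The function name `checked_add_i32` is hypothetical and chosen for illustration; the fixed-width types come from `<stdint.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Adds two 32-bit signed integers without ever overflowing.
 * Returns true and stores the sum in *result if it fits in an int32_t;
 * returns false and leaves *result untouched otherwise. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *result)
{
    assert(result != NULL); /* passing no output location is a programmer error */

    /* The sum of two 32-bit values always fits in 64 bits. */
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide < INT32_MIN || wide > INT32_MAX)
        return false; /* the 32-bit addition would have overflowed */

    *result = (int32_t)wide; /* safe: the value is within range */
    return true;
}
```

This approach relies on a wider type being available; for the widest integer type the operands would have to be checked before the operation instead.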
- So regarding standard arrays and buffers, in C/C++ they don't embed their sizes and it is your responsibility as the programmer to keep track of the sizes
- This is true for any array, including strings, as well as all types of buffers
- If a function takes an array or a buffer as parameter, its size should probably also be passed
- You should also be aware of the sizes of different types on the architecture you target to avoid overflows
- `sizeof()` gives you that information
- Unsigned overflow wraps around (modulo 2^N for an N-bit type), but signed overflow is undefined behaviour
- In case of doubt, use a wider type for the arithmetic, check for overflow, and convert back to the smaller type only if the result is within bounds

---

# The C Standard Library

- Check man pages
- **Never use these functions**, always unsafe:
    - `gets` (use `fgets`)
    - `getwd` (use `getcwd`)
    - `readdir_r` (use `readdir`)
- More here: [https://bit.ly/3ef4TBc](https://bit.ly/3ef4TBc)

???
- Concerning the C standard library
- It's good to check out the man pages for its functions
- Some allocate results with `malloc` and it may be your responsibility to free the corresponding space
- Also, never use these functions, they are almost always unsafe

---

# String Manipulation Functions

- Use the *n* functions:
    - `strcpy` -> `strncpy`
    - `sprintf` -> `snprintf`
    - etc.
- Even the *n* versions have some particularities
    - `strncpy` won't add `\0` at the end of the target buffer if the source is at least as long as the size limit

```c
char string1[] = "hello, world";
char string2[32] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

strncpy(string2, string1, strlen(string1)); // copies 12 chars, no '\0'
printf("%s\n", string2); // prints "hello, worldxxxxxxxxxxxxxxxxxxx"
```

.codelink[
`24-good-practices/strncpy-bug.c`
]

???
- Regarding string manipulation functions,
- Generally you should use the versions including *n*, which not only check for `\0` to find the end of a string but also allow a programmer-defined character limit
- Use `strncpy` rather than `strcpy`, `snprintf` rather than `sprintf`, and so on
- Even with the *n* functions, be careful about some particularities
- For example `strncpy` does not ensure that the target buffer is terminated by `\0`

---

# Dynamic Memory Allocation

- Check `malloc` return value
- After free, the pointer is invalid
    - Cannot be dereferenced
    - Cannot be **used** (ex: comparison)
- `realloc` returns null upon failure but does not free the old pointer
    - so this: `ptr = realloc(ptr, new_size)` leaks the old block on failure

???
- Finally, regarding dynamic memory allocation, a few things to keep in mind
- Don't forget to check the `malloc` return value to avoid NULL pointer usage
- Remember that after free, the pointer is invalid
- Not only can it not be dereferenced, which is relatively obvious
- But even its value itself, the previously pointed address, cannot be used anymore, for example for comparison with other addresses
- And a last thing: when it fails, `realloc` returns null but does not free the old pointer
- so this is wrong: `ptr = realloc(ptr, new_size)`, because on failure the only pointer to the old block is overwritten with null

---

# Summary & How to Learn More

- Compile-time errors, easily reproducible runtime crashes and easily detectable program behaviour divergence
    - **Nice bugs**, you can detect them
- Undefined behaviour:
    - **Nasty bugs**, you can fail to notice them and it can lead to serious vulnerabilities
- Solution: write good code + some tools can help
- How to learn more:
    - https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html
    - Secure Coding in C and C++, 2nd edition by Robert C. Seacord

???
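To make "write good code" concrete, here is a minimal sketch of two of the practices from the earlier slides: a `strncpy` call that always leaves a terminated string, and a `realloc` call that does not lose the old pointer on failure. The helper names `copy_string` and `resize_buffer` are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Copies src into dst, whose capacity is dst_size bytes, and always
 * terminates the result. The copy is truncated if src does not fit. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* Leave room for the terminator: strncpy adds none when src has
     * dst_size - 1 characters or more. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* Hypothetical helper, for illustration only.
 * Resizes *buf to new_size bytes. Returns 0 on success.
 * On failure returns -1 and leaves *buf valid and unchanged,
 * so the caller can still use it or free it. */
int resize_buffer(char **buf, size_t new_size)
{
    assert(buf != NULL && new_size > 0);

    /* Store the result in a temporary: assigning it directly to *buf
     * would overwrite the only pointer to the old block on failure. */
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;

    *buf = tmp;
    return 0;
}
```

In both helpers the size travels with the buffer, which is the same habit as keeping track of array sizes.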
- So let's recap
- In C and C++ we have nice bugs, such as compile-time errors, easily reproducible runtime crashes and easily detectable program behaviour divergence
- These bugs are nice because we can detect them
- And then we have the nasty bugs, i.e. undefined behaviour:
    - Many classes of memory errors
    - You can fail to notice them and it can lead to serious vulnerabilities
- Solution: write good code + some tools can help
- How to learn more:
    - https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html
    - Secure Coding in C and C++, 2nd edition by Robert C. Seacord

---

# Feedback form

.center[https://bit.ly/3CBOipk]