COMP26020p1 - C and Memory Safety: Good Practices

class: center, middle

### COMP26020 Programming Languages and Paradigms Part 1: C Programming
***
# C and Memory Safety: Good Practices

???

- Hello everyone
- In the previous lecture we saw how memory errors can lead to serious security
  issues in memory-unsafe languages such as C and C++
- In this short video, we'll see how to avoid making such mistakes as much as
  possible

---
# The Problem

- C and C++ have **many benefits** and are still the default languages in
several application domains
- C/C++ are also inherently **memory unsafe**

.center[How can we avoid these dangerous memory errors as much as possible?]

1. Some tools can help (won't fix everything!)
2. **Write good code**

???

- We saw all the benefits of C and C++, as well as multiple use cases demonstrating that these languages are still extensively used
- Another proof of their importance is the fact that you are learning them right now
- But we also saw that programs written in C or C++ are prone to memory issues which is bad for security
- How can we avoid the programming mistakes that lead to these security issues?
- In the next slide I will list a few tools that can help
- And we will also see a few guidelines about how to write good code

---
# Tools

- **Enable extra warnings** with compiler flags:
  - `-Wall`
  - `-Wextra`
  - `-pedantic`

More info: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

???

- A first basic piece of advice is to enable the many compiler warnings that are
  disabled by default
- This can help detect bugs
- You can do that by using the following flags
- `-Wall` will enable a first set of warnings
- and `-Wextra` will activate additional warnings on top of that
- you also have `-pedantic`, which enables warnings that push you to write
  C code that conforms strictly to the ISO C standard
- All these options activate warnings that are particularly strict, so you'll
  need to reason a bit about the things they point to in your code and decide
  whether they are actually bugs or not

---
# Tools

- Dynamic Analysis:
  - [Valgrind](https://valgrind.org/)
  - [Address Sanitizer](https://github.com/google/sanitizers) (ASAN):
      - Add these compiler flags: `-fsanitize=address -fno-omit-frame-pointer`
  - BoundsChecker, Purify, etc.
- Static Analysis:
   - [Clang Static Analyser](https://clang-analyzer.llvm.org/)
   - Lint, Coverity, cppcheck, etc.

???

- Tools can be classified as dynamic and static
- Dynamic tools work by instrumenting your program with additional checks and
  observing its behaviour at runtime
- We already spoke about Valgrind, which focuses on the heap and is very
  practical for finding memory leaks, uninitialised memory reads, etc.
- Another popular tool is Address Sanitizer, or ASAN, which can
  detect overflows, use-after-frees, leaks, etc.
- To use ASAN simply add the following compiler flags and run the program
  normally
- You'll get a crash when ASAN detects a bad memory access, with a log of
  exactly what went wrong
- Other dynamic tools include BoundsChecker, Purify, and so on
- And then we have the static tools
- They analyse the code of your program without running it
- A famous tool is Clang static analyser
- It can check for things like division by zero, null pointer dereferences,
  usage of uninitialised values, etc.
- You also have some other static tools listed on the slides
- An important thing to note is that static and dynamic tools have their pros
  and cons, and there is no silver bullet
- No tool will detect all the memory errors in all programs
- Moreover, even if you run all the possible tools on a given program, it is not
  guaranteed that all the errors have been detected

---
# Writing Good Code

- The compiler won't catch all mistakes
- The tools have their limitations
- **Writing good code** from the start is super important
  - It will save you a lot of debugging time
  - It will save you from introducing serious security issues

???
- So even if some tools can help, it's very important to write good code
- The compiler does not catch all mistakes and the tools are not perfect
- Writing good code will save you a lot of debugging time
- And of course will reduce the chances of introducing security vulnerabilities

---
# Undefined Behaviour

.center[Worst bugs in C/C++: **undefined behaviour**]

- "Renders the entire program meaningless if certain rules of the language are violated." (from cppreference.com)
  - program can crash (good!)
  - program can behave weirdly (pretty good!)
  - program can *seem to* behave normally (argh!)
- Common sources include:
  - Reading an uninitialised variable
  - Reading/writing out of the bounds of an array
  - Dereferencing a `NULL` (`0x0`) pointer
  - Overflow in signed integer arithmetic
  - Dereferencing a freed pointer
  - Freeing a pointer twice
  - etc.

???

- So in C there is this notion of undefined behaviour
- When a program enters undefined behaviour, you basically cannot assume
  anything anymore
- The program can crash, which in some sense is good as it forces you to
  investigate and fix the bug
- The program can behave in a strange way, which is harder but still possible to
  detect so it's not that bad
- But it can also seem to execute fine, for various reasons, for example maybe the particular conditions that trigger a memory error are not encountered
- These difficult-to-detect bugs are the worst, because you could have a security vulnerability without knowing it
- And memory errors lead your program into undefined behaviour
- You have some examples on the slide
- Reading an uninitialised variable
- Reading/writing out of the bounds of an array
- Dereferencing a `NULL` (`0x0`) pointer
- Overflow in signed integer arithmetic
- Dereferencing a freed pointer
- Freeing a pointer twice
- etc.

---
# Array/Buffers Sizes, Integer Overflows

- **Keep track of your arrays' sizes**
- **Be aware of type sizes on the architecture you target to avoid overflows**
  - `sizeof()`
  - Signed overflow: undefined behaviour

???

- So regarding standard arrays and buffers, in C/C++ they don't embed their sizes and it is your responsibility as the programmer to keep track of the sizes
- This is true for any array, including strings, as well as all types of buffers
- If a function takes an array or a buffer as parameter, its size should probably also be passed
- You should also be aware of the sizes of different types on the architecture you target to avoid overflows
- sizeof() gives you that information
- Unsigned overflow wraps around (modulo 2^N for an N-bit type), but signed overflow is undefined
- In case of doubt, use a wider type for the arithmetic, check for overflow, and only if the result is within bounds convert back to the smaller type

---

# The C Standard Library

- Check man pages

- **Never use these functions**, always unsafe:
  - gets (use fgets)
  - getwd (use getcwd)
  - readdir_r (use readdir)
  - More here: [https://bit.ly/3ef4TBc](https://bit.ly/3ef4TBc)

???

- Concerning the C standard library
- It's good to check out the man pages for its functions
- Some allocate results with malloc and it may be your responsibility to free the corresponding space
- Also, never use these functions, they are almost always unsafe

---
# String Manipulation Functions

- Use the *n* variants:
  - `strcpy` -> `strncpy`
  - `sprintf` -> `snprintf`
  - etc.
- Even the versions with *n* have some particularities
  - `strncpy` won't add `\0` to the target buffer if the source is at least *n* bytes long

```c
char string1[] = "hello, world";
char string2[32] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

strncpy(string2, string1, strlen(string1));
printf("%s\n", string2); // prints "hello, worldxxxxxxxxxxxxxxxxxxx"
```
.codelink[<a href="src/strncpy-bug.c" download>`24-good-practices/strncpy-bug.c`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a><a href="https://www.programiz.com/online-compiler/81UjclCtxRP3d" target="_blank" style="text-decoration: none"><img src="include/programiz-logo.png" style="height: 1em"></a>]

???

- Regarding string manipulation functions,
- Generally you should use the versions including *n*, that not only check for
  \0 to find the end of a string but also allow a programmer-defined character limit
- Use strncpy rather than strcpy, snprintf rather than sprintf, and so on
- Even with the *n* versions, be careful about some particularities
- For example strncpy does not ensure that the target buffer is terminated by
  `\0`: if the source is at least *n* characters long, no terminator is written

---
# Dynamic Memory Allocation

- Check `malloc` return value
- After free, the pointer is invalid
  - Cannot be dereferenced
  - Cannot be **used** (e.g. comparison)
- `realloc` returns `NULL` upon failure but does not free the old pointer
  - so this: `ptr = realloc(ptr, new_size)` leaks the old block when `realloc` fails

???

- Finally, regarding dynamic memory allocation, a few things to keep in mind
- Don't forget to check malloc's return value to avoid NULL pointer usage
- Remember that after free, the pointer is invalid
- Not only can it not be dereferenced, which is relatively obvious
- But even its value itself, the previously pointed-to address, cannot be used
  anymore, for example for comparison with other addresses
- And a last thing: when it fails, realloc returns NULL but does not free the old pointer
- so this is wrong: `ptr = realloc(ptr, new_size)`, because on failure the only pointer to the old block is overwritten and the block leaks

---
# Summary & How to Learn More

- Compile-time errors, easily reproducible runtime crashes and easily detectable program behaviour
  divergence
  - **Nice bugs**, you can detect them
- Undefined behaviour:
  - **Nasty bugs**, you can fail to notice them and it can lead to serious vulnerabilities
- Solution: write good code + some tools can help
- How to learn more:
  - https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html
  - Secure Coding in C and C++, 2nd edition by Robert C. Seacord

???
- So let's recap
- In C and C++ we have nice bugs, such as compile-time errors, easily
  reproducible runtime crashes and easily detectable program behaviour
  divergence
- These bugs are nice because we can detect them
- And then we have the nasty bugs, i.e. undefined behaviour:
- Many classes of memory errors
- You can fail to notice them and it can lead to serious vulnerabilities
- Solution: write good code + some tools can help
- How to learn more:
  - https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/index.html
  - Secure Coding in C and C++, 2nd edition by Robert C. Seacord

---
# Feedback form
.center[https://bit.ly/3CBOipk]
<div style="text-align:center"><img src="include/qr-code.png" height=150 /></div>