class: center, middle

### Secure Computer Architecture and Systems

***

# Secure Coding Practices, Detecting Bugs

???
- Hi everyone
- In this video we will review methods to try to minimise the number of bugs we introduce when writing systems software

---
# Context

- The vast majority of systems software is written in **memory unsafe languages**
- This opens up a wide range of **vulnerabilities** that can be exploited to **compromise their confidentiality, integrity, and availability**

???
- We've seen that memory safety and undefined behaviour issues are common in systems software, and that they lead to security vulnerabilities that can be exploited by attackers to do bad things

--

- What can we do about it?
  1. Adhere to **good coding practices** to avoid introducing bugs/vulnerabilities as much as possible
  2. Use tools and techniques to **detect bugs** before shipping systems software
  3. Use tools and techniques in production to **make exploitation harder/limit the damage stemming from exploits**

???
- What can we do about it?
- Three things
- First, when we develop we need to adhere to good coding practices to minimise as much as possible the chances of introducing such bugs
- Second, we have techniques that can help analyse our code during development and detect some bugs
- Third, we also have techniques that can help protect our programs in production, making exploitation more difficult and limiting the damage from successful exploits

--

This slide deck relates to points 1 and 2, the next slide deck to point 3.

???
- This video focuses on the first two points, and we'll see the third next

---
# Good Practices

- Memory errors and other undefined behaviour stem from programming mistakes
  - E.g. improper sanitisation of untrusted input
- These issues can have dire consequences in terms of security
- **How to avoid such errors when developing systems software in C/C++?**

???
- As we have seen, the memory errors and other sources of undefined behaviour that lead to security vulnerabilities come from programming mistakes
- How to avoid as much as possible introducing these programming mistakes when developing systems software?

--

- Here we'll cover 2 aspects of this problem:
  1. **How to avoid introducing mistakes when writing C/C++ code**
  2. **How to detect errors present in existing code**

???
- There are two main aspects to this problem
- First, how to avoid introducing these programming mistakes when writing code
- Second, how to detect these programming mistakes in existing code

---
class: inverse, middle, center

# Secure C/C++ Coding Practices

???
- Let's first cover secure coding practices

---
# Array/Buffer/Integer Overflows

- **Arrays/buffers**: keep track of the sizes of arrays and buffers
  - To know when to stop iterating, how many bytes max to copy, etc.

???
- To prevent array/buffer overflows: arrays and buffers in C do not embed their sizes, so make sure to keep track of the size of each array/buffer you use
- You need to know the size of an array to know when to stop iterating, and the size of a destination buffer to know how many bytes you can copy into it

--

- **Integer arithmetic**: be aware of type sizes on the architecture you target to avoid overflows
  - Use `sizeof()`
  - `signed` integer overflow is undefined behaviour
  - Functions available to detect overflows (see the sketch below): https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
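
A minimal sketch (added here for illustration, with made-up values) of how these overflow-detection builtins can be used to check an addition before relying on its result:

```c
#include <stdio.h>

int main(void)
{
    int a = 2000000000, b = 2000000000, result;

    /* __builtin_add_overflow() returns a non-zero value when a + b does not
       fit in result; similar builtins exist for subtraction/multiplication */
    if (__builtin_add_overflow(a, b, &result))
        printf("a + b would overflow an int (sizeof(int) = %zu bytes)\n",
               sizeof(int));
    else
        printf("a + b = %d\n", result);

    return 0;
}
```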

???
- When manipulating integers, make sure to be aware of the size reserved by the compiler to hold them in memory according to the architecture you are compiling for
- You can use sizeof to determine these sizes
- While an unsigned integer will never overflow but rather wrap around, overflowing a signed integer leads to undefined behaviour and must be avoided
- The compiler has some builtin functions that can tell you if an integer operation overflows, you can check these out at the URL present on the slide; these operations are available for integer addition, subtraction, and multiplication

---
# C Standard Library Functions

.small[

Examples of functions to (try to) avoid:

| Unsafe Function | Why It Is Unsafe | Safe Alternative(s) |
|-----------------|------------------|---------------------|
| `gets()` | No bounds checking; allows buffer overflows | `fgets()` |
| `strcpy()` | No bounds checking; can overflow destination buffer | `strncpy()`, `strlcpy()` (if available) |
| `sprintf()` | No bounds checking; leads to buffer overflows | `snprintf()` |
| `scanf()` | No bounds checking, e.g. `%s` with no width | `fgets()` + `sscanf()` with width specifiers |
| `memcpy()` | No bounds checking; can cause overflows | Use with care; consider `memmove()` for overlapping memory |
| `bcopy()` | Obsolete; unsafe due to no bounds checking | `memmove()` |
| `strlen()` | Not inherently unsafe, but must not be used on untrusted or unterminated buffers | Ensure the string is null-terminated before use |

More functions: https://docs.fedoraproject.org/en-US/defensive-coding/programming-languages/C/

]

???
- Here are a few functions of the libc whose use should be avoided as much as possible
- You can see the reason why they are unsafe, as well as safe alternatives
- We have seen strcpy and friends, which do not check for overflows on the buffers they write to
- You should use the safe versions as much as possible, which all have a way to indicate the size of the receiving buffer to avoid overflows
- To move memory you should not rely on bcopy, but rather use memcpy with care if the source and target areas do not overlap, and memmove if they do
- Finally, be careful with strlen: it can return a value larger than the size of the underlying buffer if the string is not properly terminated
- These are not the only functions to avoid, see the link on the slide for more

---
# Libc: String Manipulation Functions

- Use the *n* variants:
  - `strcpy` -> `strncpy`
  - `sprintf` -> `snprintf`
  - etc.

???
- So once again, regarding string manipulation functions, make sure to use the `n` versions that force you to indicate a maximum number of characters to process

--

- Even the versions with *n* have some particularities
  - `strncpy` won't add `\0` at the end of the target buffer (a possible fix is sketched below the example)

```c
char string1[] = "hello, world";
char string2[32] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

strncpy(string2, string1, strlen(string1));
printf("%s\n", string2); // prints "hello, worldxxxxxxxxxxxxxxxxxxx"
```

.codelink[
`10-secure-coding-practices-detecting-bugs/strncpy.c`
]
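
A possible fix, sketched here for illustration (it is not part of the original strncpy.c example): bound the copy by the size of the destination and null-terminate it explicitly, or use `snprintf`, which always null-terminates:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    char string1[] = "hello, world";
    char string2[32];

    /* copy at most sizeof(string2) - 1 characters, then terminate explicitly */
    strncpy(string2, string1, sizeof(string2) - 1);
    string2[sizeof(string2) - 1] = '\0';

    /* alternative: snprintf truncates and always null-terminates */
    snprintf(string2, sizeof(string2), "%s", string1);

    printf("%s\n", string2); // prints "hello, world"
    return 0;
}
```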

???
- These are not perfect though: for example strncpy won't add the termination character at the end of the target buffer
- Check out this code in which we wish to replace string2, filled with x's, with hello world
- Because strncpy does not add the termination character, we end up with a mix of both strings, which is probably not what the programmer intended

---
# Dynamic Memory Allocation

- Check `malloc`'s return value

???
- When using dynamic memory allocation, make sure to always check malloc's return value, for reasons we have previously discussed

--

- After free, the pointer is invalid
  - Cannot be dereferenced
  - Cannot be **used** (e.g. in a comparison)

???
- Remember that after free is called on a pointer, that pointer is invalid and should not be reused in any way
- It should obviously not be dereferenced, but its value should not be used for anything else either

--

- `realloc` returns NULL upon failure but does not free the old pointer
  - so this: `ptr = realloc(ptr, new_size)` is a leak (a safer idiom is sketched below)
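
An illustrative sketch of the safe pattern (added here for clarity; the allocation sizes are made up): check `malloc`'s return value, and only overwrite the pointer once `realloc` is known to have succeeded:

```c
#include <stdlib.h>

int main(void)
{
    int *ptr = malloc(100 * sizeof(*ptr));
    if (ptr == NULL)           /* always check malloc's return value */
        return 1;

    int *tmp = realloc(ptr, 200 * sizeof(*ptr));
    if (tmp == NULL) {         /* on failure, ptr is still valid: no leak */
        free(ptr);
        return 1;
    }
    ptr = tmp;                 /* overwrite ptr only after realloc succeeded */

    free(ptr);
    return 0;
}
```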

???
- As we have seen previously, `realloc` returns NULL upon failure, so make sure it does not overwrite the original pointer to the buffer whose size you want to increase, otherwise you'll get a leak

--

- Use `calloc` (like `malloc`, but the memory is zeroed out) if performance requirements allow it

???
- `malloc` does not zero out the memory returned to an allocation request, so if you only partially initialise a data structure located in a dynamically allocated buffer, and you pass that data structure to a context that you do not trust (for example by sending it over the network), you may be leaking memory content to that untrusted party
- So for buffers sent to untrusted contexts it is better to use `calloc`, which zeroes out the allocated memory, at the cost of a performance slowdown

---
# Secure Coding: Further Readings

There is a plethora of secure coding guides and standards:

- SEI CERT C Coding Standard: https://wiki.sei.cmu.edu/confluence/display/c
- Robert C. Seacord, *Secure Coding in C and C++* (book)
- ISO/IEC TS 17961 (C Secure Coding Rules): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1624.pdf
- NASA JPL C Coding Standard: https://yurichev.com/mirrors/C/JPL_Coding_Standard_C.pdf
- Fedora's Defensive Coding Guide: https://docs.fedoraproject.org/en-US/defensive-coding/

???
- What we saw is just a few examples of secure coding practices, and we do not have the time to cover them all exhaustively
- You can see on the slide a list of good resources, make sure to check them out if you want to learn more

---
class: inverse, middle, center

# Detecting Coding Mistakes in Development/Testing Phases

???
- Let's now see analysis tools that can help you detect programming mistakes in existing code

---
# Detecting Coding Mistakes

- Certain tools can help detect the coding errors leading to vulnerabilities
- They cannot run in production (e.g. due to high overhead), so they are executed in the build and testing phases
  - Often integrated within the CI/CD pipeline

???
- Here we will cover techniques that are slow to execute, or that make the application slow
- As a result they cannot run in production, and are rather used during development

--

- **2 main categories:**
  1. Static analysis approaches
  2. Dynamic analysis approaches

???
- These techniques fall into two main categories: static and dynamic analysis

---
# Static Analysis

- **Static analysis** searches for issues by **analysing the source code**, without running the program
  - Pros: good coverage, lends itself well to automation
  - Cons: false positives, limited amount of context available, scalability on large programs

???
- Static analysis tools scan the source code of the program for possible bugs without actually running the program
- The benefit of this approach is that it has good coverage: it goes over the entirety of the program's code, which lends itself well to automation
- In terms of downsides, static analysis generally suffers from false positives: it may identify issues in the code that do not actually represent programming mistakes or security vulnerabilities
- Because it does not run the program, the efficiency of static analysis suffers from a limited amount of context, for example most of the memory content is not determined until runtime
- Finally, some static analysis techniques are quite slow and do not scale well to the large code bases of certain systems software

--

- **Enable extra warnings** with compiler flags:
  - Additional warnings vs. default: `-Wall`
  - More warnings: `-Wextra`
  - Even more warnings (can be picky): `-pedantic`
  - More info: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

???
- A first thing you should do is enable high degrees of compiler warnings
- You can use, by increasing order of pickiness, `-Wall` to get additional warnings, `-Wextra` to get even more warnings, and `-pedantic` to add yet more warnings
- See the link on the slide for details about what warnings are added by each option

---
# Static Analysis (2)

- **Static analysis tools**:
  - Clang Static Analyser: https://clang-analyzer.llvm.org/
  - Lint: https://docs.oracle.com/cd/E19205-01/820-4180/man1/lint.1.html
  - Coverity: https://scan.coverity.com/
  - cppcheck: https://cppcheck.sourceforge.io/

???
- There are more advanced static analysis tools, you have a few examples here: the Clang Static Analyser, Lint, Coverity, or Cppcheck

--

Let's check out an example with the Clang Static Analyser.

???
- As an example let's see how we can use the Clang Static Analyser

---
# Clang Static Analyser

.leftcol[

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

int c;

int main() {
    int a = INT_MAX;
    int b = 1;
    c = a + b; // Integer overflow!

    char buffer[8];
    char str[] = "this string is too long";
    strcpy(buffer, str); // Buffer overflow!

    int *ptr = (int *)malloc(sizeof(int));
    *ptr = 42;
    free(ptr);
    *ptr = 99; // Use-after-free!

    return 0;
}
```

.codelink[
`10-secure-coding-practices-detecting-bugs/faulty.c`
]

]

???
- We have a faulty program here, it contains 3 bugs
- The first bug is an integer overflow: we add in c 1 to the largest value that can be stored in an int
- The second bug is a buffer overflow: we copy into buffer, whose size is 8 bytes, a string that is larger than 8 bytes
- And the last bug is a use-after-free, where we dereference the pointer ptr after having freed the buffer it points to

--

.rightcol[

```bash
$ gcc faulty.c -o faulty
$ ./faulty
```

- No warning/error at compile-time!
- No visible effect at runtime!

]

???
- Notice that with the default level of warnings, this program compiles fine, and it also runs without any visible error

---
# Clang Static Analyser (2)

```bash
$ clang --analyze faulty.c
faulty.c:22:10: warning: Use of memory after it is freed [unix.Malloc]
  *ptr = 99; // Use-after-free!
  ~~~~ ^
1 warning generated.
```

- The Clang Static Analyser detects the use-after-free...
- ... but not the integer and buffer overflows
- For that we need **dynamic analysis tools**

???
- If we launch Clang's static analyser on our source code, we can see that it is able to detect the use-after-free bug
- However it does not detect the two other bugs
- For that we need to use dynamic analysis

---
# Dynamic Analysis

- **Dynamic analysis**: tries to detect errors **while running the program**
  - Pros: runtime context available, easier if sources unavailable (black box testing)
  - Cons: input-dependent coverage, scalability to many programs, high runtime overheads

???
- Dynamic analysis tries to detect errors while running the program
- Doing so it gets access to more information than static analysis, that is, runtime information
- It's also useful when the sources of the program we wish to analyse are not available

--

- **Compiler-based instrumentation** is a highly popular dynamic analysis approach: **sanitisers**
  - **AddressSanitizer (ASan)**: detects heap/stack/globals memory issues
    - Buffer overflows, use after free, double free
    - Memory leaks
  - **UndefinedBehaviorSanitizer (UBSan)**: integer overflows, invalid casts, misaligned pointers, division by 0
  - A few more: https://en.wikipedia.org/wiki/Code_sanitizer

???
- A very popular type of dynamic analysis is achieved through compiler-based instrumentation
- These are called sanitisers
- The most widespread is AddressSanitizer, which will detect a wide range of memory errors that would not be caught at compile time or at runtime without the instrumentation
- You also have UndefinedBehaviorSanitizer, which detects things like integer overflows, invalid casts, and so on
- Check out the link on the slide for more information about sanitisers

---
# ASan and UBSan

- Enabling ASan and UBSan on the previously-seen faulty program

```bash
# Compile with ASan enabled:
$ clang -fsanitize=address faulty.c -o faulty
$ ./faulty
=================================================================
==21543==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcc881f268
# ...
```

???
- We can enable AddressSanitizer instrumentation on our faulty program as follows
- As you can see, it first catches the buffer overflow

--

```bash
# Enable ASan again, after having fixed the buffer overflow:
$ clang -fsanitize=address faulty.c -o faulty
$ ./faulty
=================================================================
==22504==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010
# ...
```

???
- Once we have fixed that overflow we can recompile the program, still with AddressSanitizer, and launch it again
- This time we can see that it detects the use-after-free

--

```bash
# Compile with UBSan enabled:
$ clang -fsanitize=undefined faulty.c -o faulty
$ ./faulty
faulty.c:12:11: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
```

???
- And finally, when we enable UndefinedBehaviorSanitizer and launch the program, the integer overflow is detected

---
# Valgrind

- Older dynamic analysis tool to detect memory errors and leaks, among other things
- Mostly **superseded by sanitisers** for this task
  - In addition to leaks, Valgrind can detect certain memory errors
  - Sanitisers detect leaks too
- Still, **no need to recompile** with Valgrind

???
- You have other dynamic analysis tools
- Most of what came before the sanitisers has been rendered more or less obsolete by them
- We have seen Valgrind previously; in addition to reporting memory leaks, it can also detect certain memory errors
- Given that sanitisers also detect memory leaks, that makes Valgrind quite redundant
- However, note that with Valgrind there is no need to recompile the program to insert instrumentation, as one does with the sanitisers
- So Valgrind is still useful in contexts where we only have access to the application's binary and not its sources

---
# Fuzzing AKA Fuzz Testing

- **Fuzzing: injecting malformed input through a trust boundary to trigger bugs**
  - E.g. command line arguments, input files, network
- A form of dynamic analysis
- Highly popular modern approach helping to secure interfaces

???
- One last dynamic analysis technique is fuzzing
- It consists in blasting a trust boundary with malformed inputs in the hope of triggering bugs
- Examples of trust boundaries that are good candidates for fuzzing include command line arguments, input files, network packets, and so on
- Fuzzing is highly popular these days, and has helped uncover a very large number of bugs in many projects

--

Let's see an example with the tool American Fuzzy Lop (AFL): https://github.com/google/AFL

???
- Let's briefly see an example with the AFL fuzzer

---
# Fuzzing (2)

Vulnerable program:

```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    char name[32]; // Vulnerable buffer (too small for unchecked input)

    if (argc < 2) {
        printf("Usage: %s <file>\n", argv[0]);
\n", argv[0]); return 1; } FILE *f = fopen(argv[1], "r"); if (!f) { printf("Error, can't open %s\n", argv[1]); return 1; } fread(name, 1, 512, f); // Reads up to 512 bytes into a 32-byte buffer! fclose(f); printf("hello %s\n", name); return 0; } ``` .codelink[
`10-secure-coding-practices-detecting-bugs/fuzzme.c`
]
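
For reference, a sketch (not part of the original fuzzme.c) of how the overflow could be removed: bound the read by the destination size and null-terminate the result, replacing the `fread` line above:

```c
// read at most sizeof(name) - 1 bytes and null-terminate explicitly
size_t n = fread(name, 1, sizeof(name) - 1, f);
name[n] = '\0';
```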

???
- This is a vulnerable program
- It opens a file and reads its content into a buffer
- It reads 512 bytes, however the destination buffer is only 32 bytes long, so there is a possibility of overflow here
- The name of the file to read comes from the command line

---
# Fuzzing (3)

```bash
# Install AFL
$ sudo apt install afl # or afl++ on very recent ubuntu/debian distributions

# Compile and instrument target program
$ afl-clang fuzzme.c -o fuzzme

# Create a "seed" input to kickstart the fuzzing process
$ mkdir input
$ echo "testname" > input/seed

# Start the fuzzing process (after a few seconds stop it with ctrl+c)
$ AFL_SKIP_CPUFREQ=1 afl-fuzz -i input -o output -- ./fuzzme @@

# Reproduce the bug found by AFL (payload file may have a different name on your machine)
$ clang -g -fsanitize=address fuzzme.c -o fuzzme
$ ./fuzzme output/crashes/id:000000,sig:11,src:000000,op:havoc,rep:128
```

???
- You have instructions on the slide for how to fuzz this program
- You can pause the video to study them in detail

--

.small[

There is a lot more to say about fuzzing. Further readings:

- Sutton et al., **Fuzzing: Brute Force Vulnerability Discovery** (book)
- **The Fuzzing Book**: https://www.fuzzingbook.org/
- Fuzzing 101: https://github.com/antonio-morales/Fuzzing101

]

???
- One last thing about fuzzing: it's a huge field and you will probably hear more about it in other units
- On the slide you can find a few pointers if you want to dig deeper

---
## Other Static and Dynamic Analysis Approaches

- Widespread approaches: manual code review, unit testing
- Linters/style checkers
- Taint analysis
- Symbolic execution, abstract interpretation
- Formal verification, model checking

???
- There are a few other static and dynamic analysis techniques that can be used
- You are probably familiar with unit testing and manual code reviews, as well as with tools to check that your code follows a certain style
- There are other techniques such as formal verification that will be covered in other units, so I won't talk about them here

---
# Summary

- To avoid introducing vulnerabilities in systems software:
  1. Adhere to good coding practices
  2. Use static/dynamic tools to detect bugs during the testing phase
- Unfortunately, **none of the existing practical approaches can guarantee the absence of bugs**
- We also need runtime defences in production
  - To detect bugs
  - To make exploits harder to achieve
  - To limit the damage an attacker can do when exploiting vulnerabilities

???
- To conclude, we covered 2 ways to avoid introducing vulnerabilities when developing systems software
- First, adopting good coding practices
- Second, using static and dynamic analysis tools to try to detect bugs in existing code
- Unfortunately, none of these approaches will allow you to get rid of 100% of the bugs and vulnerabilities
- We also need defences executing at runtime in production, to make exploits harder and limit their damage