COMP26020p1 - Secure Coding Practices, Detecting Bugs

class: center, middle

### COMP26020 Programming Languages and Paradigms Part 1: C Programming
***
# Secure Coding Practices, Detecting Bugs

???
- Hi everyone
- We previously saw how to adopt some good practices when writing code to avoid introducing bugs that could translate into security issues
- Here we are going to cover a complementary approach: the use of automated tools to detect programming mistakes and bugs in existing code bases

---
# Detecting Coding Mistakes

- Certain tools can help detect the coding errors leading to vulnerabilities
  - Generally cannot run in production (e.g. due to high overhead), so they are executed in the build and testing phases
  - Often integrated within the CI/CD pipeline

???
- Here we will cover techniques that are slow to execute, or that make the application slow
- As a result they cannot run in production, and are rather used during development

--
- **2 main categories:**
  1. Static analysis approaches
  2. Dynamic analysis approaches

???
- These techniques fall within 2 main categories: static and dynamic analysis

---
# Static Analysis

- **Static analysis** searches for issues by **analysing the source code**, without running the program
  - Pros: good coverage, lends itself well to automation
  - Cons: false positives, limited amount of context available, scalability on large programs

???
- Static analysis tools scan the source code of the program for possible bugs without actually running the program
- The benefit of this approach is that it has good coverage: it goes over the entirety of the program's code, and it lends itself well to automation
- In terms of downsides, static analysis generally suffers from false positives: it may identify issues in the code that do not actually represent programming mistakes or security vulnerabilities
- Because it does not run the program, the effectiveness of static analysis suffers from a limited amount of context: for example, most of the memory content is not determined until runtime
- Finally, some static analysis techniques are quite slow and do not scale well to the large code bases of certain systems software

--
- **Enable extra warnings** with compiler flags:
  - Additional warnings vs. default: `-Wall`
  - More warnings: `-Wextra`
  - Even more warnings (can be picky): `-pedantic`
  - More info: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

???

- A first thing you should do is enable high degrees of compiler warnings
- You can use, in increasing order of pickiness, `-Wall` to get additional warnings, `-Wextra` to get even more warnings, and `-pedantic` to add yet more warnings
- See the link on the slides for details about what warnings are added by each option
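
---
# Compiler Warnings: Example

A minimal sketch of the kind of mistake extra warnings catch. The function names are hypothetical: `len_is_valid_buggy()` tries to reject negative lengths, but `len` is an unsigned `size_t`, so the check can never be true. Assuming GCC, `-Wextra` enables `-Wtype-limits`, which reports this comparison, while the default warning level stays silent.

```c
#include <stddef.h>

/* Hypothetical validation helper, as it might be written by mistake.
   'len' is a size_t, i.e. unsigned, so 'len < 0' is always false:
   gcc reports it with -Wextra (-Wtype-limits), but not at the
   default warning level. */
int len_is_valid_buggy(size_t len) {
    if (len < 0)
        return 0;   /* never reached */
    return 1;
}

/* One possible fix: validate against the bound that actually matters,
   i.e. the capacity of the destination buffer. */
int len_is_valid(size_t len, size_t max) {
    return len > 0 && len <= max;
}
```

Compiling with `gcc -Wall -Wextra -c` should flag the comparison in `len_is_valid_buggy()`, while `len_is_valid()` compiles cleanly.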

---
# Static Analysis (2)

- **Static analysis tools**:
   - Clang Static Analyser: https://clang-analyzer.llvm.org/
   - Lint: https://docs.oracle.com/cd/E19205-01/820-4180/man1/lint.1.html
   - Coverity: https://scan.coverity.com/
   - cppcheck: https://cppcheck.sourceforge.io/

???
- There are more advanced static analysis tools; you have a few examples here: the Clang static analyser, Lint, Coverity, or cppcheck

--
Let's check out an example with the Clang static analyser.

???
- As an example let's see how we can use the clang static analyser

---
# Clang Static Analyser

.leftcol[
```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int c;

int main() {

    int a = INT_MAX;
    int b = 1;
    c = a + b; // Integer overflow!

    char buffer[8];
    char str[] = "this string is too long";
    strcpy(buffer, str); // Buffer overflow!

    int *ptr = (int *)malloc(sizeof(int));
    *ptr = 42;
    free(ptr);
    *ptr = 99; // Use-after-free!

    return 0;
}
```
.codelink[<a href="src/faulty.c" download>`25-detecting-bugs/faulty.c`</a>  <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em"></a>]
]

???

- We have a faulty program here, it contains 3 bugs
- The first bug is an integer overflow: we store in c the result of adding 1 to the largest value an int can hold
- The second bug is a buffer overflow: we copy into buffer, whose size is 8 bytes, a string that is longer than 8 bytes
- And the last bug is a use after free, where we dereference the pointer ptr after having freed the buffer it points to


--
.rightcol[
```bash
$ gcc faulty.c -o faulty
$ ./faulty
```

- No warning/error at compile-time!
- No visible effect at runtime!
]

???

- Notice that with the default level of warnings, this program compiles fine, and also it runs without any visible error

---
# Clang Static Analyser (2)

```bash
$ clang --analyze faulty.c
faulty.c:22:10: warning: Use of memory after it is freed [unix.Malloc]
    *ptr = 99; // Use-after-free!
    ~~~~ ^
1 warning generated.
```

- Clang static analyser detects the use-after-free...
- ... but not the integer and buffer overflows
  - For that we need **dynamic analysis tools**

???

- If we launch clang's static analyser on our source code, we can see that it detects the use-after-free bug
- However it does not detect the two other bugs
- For that we need to use dynamic analysis

---
# Dynamic Analysis

- **Dynamic analysis**: tries to detect errors **while running the program**
  - Pros: runtime context available, easier if sources unavailable (black box testing)
  - Cons: input-dependent coverage, scalability to many programs, high runtime overheads

???
- Dynamic analysis tries to detect errors while running the program
- Doing so it gets access to more information than static analysis, that is runtime information
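- It's also useful when the sources of the program we wish to analyse are not available
- On the downside, it only finds bugs on the code paths exercised by the inputs it is given, and the instrumentation can slow the program down significantly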

--
- **Compiler-based instrumentation** is a highly popular dynamic analysis approach: **sanitisers**
  - **AddressSanitizer (ASan)**: detects heap/stack/globals memory issues
      - Buffer overflows, use after free, double free
      - Memory leaks
  - **UndefinedBehaviorSanitizer (UBSan)**: signed integer overflows, invalid casts, misaligned pointers, division by zero
  - A few more: https://en.wikipedia.org/wiki/Code_sanitizer

???
- A very popular type of dynamic analysis is achieved through compiler based instrumentation
- These are called sanitisers
- The most widespread is AddressSanitizer, which detects a wide range of memory errors that would otherwise go unnoticed, both at compile time and at runtime
- You also have the undefined behaviour sanitiser that detects things like integer overflows, invalid casts, and so on
- Check out the link on the slide for more information about sanitisers
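
---
# Dynamic Analysis (2)

Because dynamic analysis only observes the paths that are actually executed, its coverage depends on the inputs. A minimal sketch with a hypothetical lookup table: the off-by-one below is triggered by a single index value, so a sanitiser-enabled test run that never uses that index reports nothing.

```c
#define TABLE_SIZE 4

static int table[TABLE_SIZE] = {10, 20, 30, 40};

/* Hypothetical lookup with an off-by-one bounds check ('<=' instead of
   '<'): idx == TABLE_SIZE reads one element past the end of 'table'.
   Under ASan this is reported as a global-buffer-overflow, but only on
   a run that actually passes that exact index. */
int lookup_buggy(int idx) {
    if (idx >= 0 && idx <= TABLE_SIZE)
        return table[idx];
    return -1;
}

/* Fixed version: valid indices are 0 .. TABLE_SIZE - 1. */
int lookup(int idx) {
    if (idx >= 0 && idx < TABLE_SIZE)
        return table[idx];
    return -1;
}
```

Built with `clang -fsanitize=address`, calls to `lookup_buggy()` with indices 0 to 3 run cleanly; only a call with index 4 should make ASan report the overflow.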

---

# ASan and UBSan

- Enabling ASan and UBSan on the previously-seen faulty program

```bash
# Compile with ASan enabled:
$ clang -fsanitize=address faulty.c -o faulty
$ ./faulty
=================================================================
==21543==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffcc881f268 # ...
```

???

- We can enable address sanitiser instrumentation on our faulty program as follows
- As you can see, it first catches the buffer overflow

--
```bash
# Enable ASan again, after having fixed the buffer overflow:
$ clang -fsanitize=address faulty.c -o faulty
$ ./faulty
=================================================================
==22504==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 # ...
```

???
- Once we have fixed that overflow we can recompile the program, still with address sanitiser, and launch it again
- This time we can see that it detects the use after free

--
```bash
# Compile with UBSan enabled:
$ clang -fsanitize=undefined faulty.c -o faulty
$ ./faulty
faulty.c:12:11: runtime error: signed integer overflow:
    2147483647 + 1 cannot be represented in type 'int'
```

???
- And finally, when we enable undefined behaviour sanitiser, and launch the program, the integer overflow is detected
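
---
# ASan and UBSan (2)

One possible way to remove the three bugs from `faulty.c` once the tools have pointed them out, sketched as small helper functions. The names `checked_add()`, `bounded_copy()` and `free_int_and_null()` are hypothetical, and this is a sketch rather than the course's reference fix.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Integer overflow: check *before* adding, because signed overflow is
   undefined behaviour and cannot be reliably detected afterwards.
   Returns 0 and stores a + b in *out, or returns -1 on overflow. */
int checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return -1;
    *out = a + b;
    return 0;
}

/* Buffer overflow: snprintf() writes at most 'size' bytes, including
   the terminating NUL. Returns 1 if 'src' did not fit (or on error),
   0 otherwise. */
int bounded_copy(char *dst, size_t size, const char *src) {
    int n = snprintf(dst, size, "%s", src);
    return n < 0 || (size_t)n >= size;
}

/* Use-after-free: free the object and null the caller's pointer, so a
   later mistaken dereference hits NULL (which typically crashes right
   away) rather than reused heap memory, and a second call is harmless
   since free(NULL) does nothing. */
void free_int_and_null(int **pp) {
    free(*pp);
    *pp = NULL;
}
```

In `faulty.c`, `c = a + b;` would become a checked call to `checked_add(a, b, &c)`, `strcpy()` would become `bounded_copy(buffer, sizeof buffer, str)`, and `free(ptr)` would become `free_int_and_null(&ptr)`, with the `*ptr = 99;` write moved before it. Rebuilt with `-fsanitize=address,undefined`, the program should then run without sanitiser reports.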

---
# Valgrind

- Older dynamic analysis tool that detects memory leaks and certain memory errors, among other things
  - Mostly **superseded by sanitisers** for this task: sanitisers detect leaks too
- Still useful: **no need to recompile** the program to use Valgrind

???

- You have other dynamic analysis tools
- Most of what came before the sanitisers has been rendered more or less obsolete by them
- We saw Valgrind previously: in addition to reporting memory leaks, it can also detect certain memory errors
- Given that sanitisers also detect memory leaks, that makes Valgrind quite redundant
- However, note that with Valgrind there is no need to recompile the program to insert instrumentation, as one does with the sanitisers
- So Valgrind is still useful in contexts where we only have access to the application's binary and not to its sources
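
---
# Valgrind (2)

A minimal leak example to try both tools on. `copy_string()` and the two `name_length` variants are hypothetical helpers; the leaky variant drops the only pointer to its heap copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of 's' (or NULL
   if malloc fails); the caller owns the copy and must free() it. */
char *copy_string(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Leaky version: the copy is never freed. At exit, Valgrind (with
   --leak-check=full) and ASan's LeakSanitizer typically report this
   block as "definitely lost" / a direct leak. */
size_t name_length_leaky(const char *s) {
    char *copy = copy_string(s);
    return copy != NULL ? strlen(copy) : 0;
}

/* Fixed version: release the copy before returning. */
size_t name_length(const char *s) {
    char *copy = copy_string(s);
    if (copy == NULL)
        return 0;
    size_t len = strlen(copy);
    free(copy);
    return len;
}
```

With a hypothetical `leak.c` whose `main()` calls `name_length_leaky()`, the unmodified binary from `gcc -g leak.c -o leak` can be checked with `valgrind --leak-check=full ./leak`, whereas LeakSanitizer (enabled by default with ASan on Linux) needs a rebuild with `clang -g -fsanitize=address leak.c -o leak`.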

---
# Fuzzing AKA Fuzz Testing

- **Fuzzing: injecting malformed input through a trust boundary to trigger bugs**
  - E.g. command line arguments, input files, network
  - A form of dynamic analysis
  - Highly popular modern approach helping to secure interfaces

???
- One last dynamic analysis technique is fuzzing
- It consists in blasting a trust boundary with malformed inputs in the hope of triggering bugs
- Examples of trust boundaries that are good candidates for fuzzing include the command line arguments, input files, network packets, and so on
- Fuzzing is highly popular these days, and has helped uncover a very large number of bugs in many projects

--
Let's see an example with the tool American Fuzzy Lop (AFL): https://github.com/google/AFL

???

- Let's briefly see an example with the AFL fuzzer
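
---
# Fuzzing: A Naive Fuzzer

The core idea of fuzzing can be sketched as a naive mutation-based fuzzer: start from a seed input, randomly mutate it, run the target, and check that nothing went wrong. This is a simplification, not how AFL is implemented; `parse_name()`, `naive_fuzz()` and the size constants are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_SIZE 32
#define MAX_INPUT 512

/* Hypothetical target: copies the name at the start of 'data' (up to a
   newline, a NUL byte or the end of the input) into 'out', truncating
   it to NAME_SIZE - 1 bytes. Returns the length of the copied name. */
size_t parse_name(const unsigned char *data, size_t len, char out[NAME_SIZE]) {
    size_t n = 0;
    while (n < len && n < NAME_SIZE - 1 && data[n] != '\n' && data[n] != '\0') {
        out[n] = (char)data[n];
        n++;
    }
    out[n] = '\0';
    return n;
}

/* Naive mutation-based fuzzer (no coverage feedback, unlike AFL): each
   iteration starts from the seed, applies a few random mutations, runs
   the target and checks its post-conditions. Returns the number of
   inputs that broke them; under ASan, an out-of-bounds access in the
   target would instead abort the run with a report. */
int naive_fuzz(const char *seed, int iterations, unsigned int rng_seed) {
    unsigned char input[MAX_INPUT];
    char out[NAME_SIZE];
    size_t seed_len = strlen(seed);
    int failures = 0;

    if (seed_len > MAX_INPUT)
        seed_len = MAX_INPUT;
    srand(rng_seed);

    for (int i = 0; i < iterations; i++) {
        size_t len = seed_len;
        memcpy(input, seed, len);

        int mutations = 1 + rand() % 8;
        for (int m = 0; m < mutations; m++) {
            int op = rand() % 3;
            if (op == 0 && len > 0) {
                /* Overwrite a random byte with a random value */
                input[(size_t)rand() % len] = (unsigned char)(rand() % 256);
            } else if (op == 1) {
                /* Append up to 63 random bytes, staying within MAX_INPUT */
                size_t extra = (size_t)(rand() % 64);
                if (extra > MAX_INPUT - len)
                    extra = MAX_INPUT - len;
                for (size_t k = 0; k < extra; k++)
                    input[len++] = (unsigned char)(rand() % 256);
            } else if (op == 2 && len > 0) {
                /* Truncate the input to a random shorter length */
                len = (size_t)rand() % len;
            }
        }

        size_t n = parse_name(input, len, out);
        if (n >= NAME_SIZE || out[n] != '\0' || n > len)
            failures++;
    }
    return failures;
}
```

AFL improves on this loop mainly through coverage feedback: mutated inputs that reach new code paths are kept and mutated further, which lets it explore much deeper than blind random mutation.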

---
# Fuzzing (2)

Vulnerable program:

```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    char name[32];  // Vulnerable buffer (too small for unchecked input)

    if (argc < 2) {
        printf("Usage: %s <input file>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "r");
    if (!f) {
        printf("Error, can't open %s\n", argv[1]);
        return 1;
    }

    fread(name, 1, 512, f);  // Reads up to 512 bytes into a 32-byte buffer!
    fclose(f);

    printf("hello %s\n", name);  // fread() does not NUL-terminate name either
    return 0;
}
```
.codelink[<a href="src/fuzzme.c" download>`25-detecting-bugs/fuzzme.c`</a>  <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em"></a>]

???

- This is a vulnerable program
- It opens a file and reads its content into a buffer
- It reads 512 bytes, however the destination buffer is only 32 bytes long, so there is a possibility of overflow here
- The name of the file to read comes from the command line

---
# Fuzzing (3)

```bash
# Install AFL
$ sudo apt install afl # or afl++ on very recent ubuntu/debian distributions

# Compile and instrument target program
$ afl-clang fuzzme.c -o fuzzme

# Create a "seed" input to kickstart the fuzzing process
$ mkdir input
$ echo "testname" > input/seed

# Start the fuzzing process (after a few seconds stop it with ctrl+c)
$ AFL_SKIP_CPUFREQ=1 afl-fuzz -i input -o output -- ./fuzzme @@

# Reproduce the bug found by AFL (the payload file may have a different name on your machine; AFL++ stores it under output/default/crashes/)
$ clang -g -fsanitize=address fuzzme.c -o fuzzme
$ ./fuzzme output/crashes/id:000000,sig:11,src:000000,op:havoc,rep:128
```

???

- You have instructions on the slide for how to fuzz this program
- You can pause the video to study them in detail

--
.small[
There is a lot more to say about fuzzing. Further readings:

- Sutton et al., **Fuzzing: Brute Force Vulnerability Discovery**
- **The Fuzzing Book**: https://www.fuzzingbook.org/
- Fuzzing 101: https://github.com/antonio-morales/Fuzzing101

]

???

- One last thing about fuzzing: it's a huge field and you will probably hear more about it in other units
- On the slide you can find a few pointers if you want to dig deeper.
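
---
# Fuzzing (4)

Once AFL and ASan have pinpointed the overflow in `fuzzme.c`, the fix is to bound the read by the size of the destination buffer. A minimal sketch; `read_name()` and `NAME_SIZE` are hypothetical names, not part of the original program.

```c
#include <stdio.h>

#define NAME_SIZE 32

/* Hypothetical replacement for the fread() call in fuzzme.c: read at
   most size - 1 bytes, leaving room for the terminating NUL that
   fread() does not add itself. Returns the number of bytes stored
   before the NUL, or -1 if 'size' is 0. */
long read_name(FILE *f, char *name, size_t size) {
    if (size == 0)
        return -1;
    size_t n = fread(name, 1, size - 1, f);
    name[n] = '\0';
    return (long)n;
}
```

In `main()`, `fread(name, 1, 512, f);` would become `read_name(f, name, sizeof name);`. Replaying the crashing input found by AFL under ASan should then only print a truncated greeting.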

---

## Other Static and Dynamic Analysis Approaches

- Widespread approaches: manual code review, unit testing
- Linters/style checkers
- Taint analysis
- Symbolic execution, abstract interpretation
- Formal verification, model checking

???

- There are a few other static and dynamic analysis techniques that can be used
- You are probably familiar with unit testing and manual code reviews, as well as with tools to check that your code follows a certain style
- Other techniques, such as formal verification, will be covered in other units, so I won't talk about them here

---

# Summary

- We covered various static/dynamic tools to detect bugs during the testing phase
- Unfortunately, **none of these approaches can guarantee the absence of bugs**. 
- We also need runtime defences in production
  - To detect bugs
  - To make exploits harder to achieve and limit the damage an attacker can do when exploiting vulnerabilities

----

.leftcol[
.center[Feedback form: https://bit.ly/4oDLf4S]
]
.rightcol[
<div style="text-align:center"><img src="include/qr-code.png" height=150 /></div>
]

???

- To conclude, we covered the use of various automated tools to try to detect bugs in existing code
- They belong in two main categories, static and dynamic analysis approaches
- Unfortunately, even if you combine these with the secure coding practices we saw previously, none of these approaches will allow you to get rid of 100% of the bugs and vulnerabilities
- We also need defences executing at runtime in production, to make exploits harder and limit their damage