COMP26020p1 - Secure Coding Practices

class: center, middle

### COMP26020 Programming Languages and Paradigms Part 1: C Programming
***
# Secure Coding Practices

???
- Hi everyone
- In this video we will review methods to try to minimise the number of bugs we introduce when writing systems software

---
# Context

- The vast majority of systems software is written in **memory unsafe languages**
- This opens the door to a wide range of **vulnerabilities** that can be exploited to **compromise its confidentiality, integrity, and availability**

???
- We've seen that memory safety and undefined behaviour issues are common in systems software, and that they lead to security vulnerabilities that can be exploited by attackers to do bad things

--
- What can we do about it?
  1. Adhere to **good coding practices** to avoid introducing bugs/vulnerabilities as much as possible
  2. Use tools and techniques to **detect bugs** before shipping systems software
  3. Use tools and techniques in production to **make exploitation harder/limit the damage stemming from exploits**

???
- What can we do about it?
- Three things
- First, when we develop, we need to adhere to good coding practices to minimise the chances of introducing such bugs
- Second, we have techniques that can help analyse our code during development and detect some bugs
- Third, we also have techniques that can help protect our programs in production, making exploitation more difficult and limiting the damage from successful exploits

Here we'll give a brief overview of 1, and the next 2 slide decks will cover 2 and 3

???

- This video focuses on the first point, and we'll see the second and third in the next two videos

---

# Good Practices

- Memory errors and other undefined behaviour stem from programming mistakes
  - E.g. improper sanitisation of untrusted input
- These issues can have dire consequences in terms of security
- **How to avoid such errors when developing systems software in C/C++?**

???
- As we have seen, the memory errors and other sources of undefined behaviour that lead to security vulnerabilities come from programming mistakes
- How can we avoid introducing these programming mistakes as much as possible when developing systems software?
- Let's see a few good coding practices to achieve that
  
---
# Array/Buffer/Integer Overflows

- **Array/buffers**: keep track of the sizes of arrays and buffers
  - To know when to stop iterating, how many bytes max to copy, etc.

???
- Arrays and buffers in C do not embed their sizes, so to prevent overflows make sure to keep track of the size of each array/buffer you use
- You need to know the size of an array to know when to stop iterating, and the size of a destination buffer to know how many bytes you can copy in there

--
- **Integer arithmetic**: be aware of type sizes on the architecture you target to avoid overflows
  - Use `sizeof()`
  - `signed` integer overflow is undefined behaviour
  - Functions available to detect overflows: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html

???
- When manipulating integers, make sure to be aware of the size reserved by the compiler to hold them in memory according to the architecture you are compiling for
- You can use sizeof to determine these sizes
- While an unsigned integer will never overflow but rather wrap around, overflowing a signed integer leads to undefined behaviour and must be avoided
- The compiler has some builtin functions that can tell you if an integer operation overflows; you can check these out at the URL on the slide
- These builtins are available for integer addition, subtraction, and multiplication
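
---
# Integer Overflows: Example Sketch

A minimal sketch of how the GCC/Clang overflow builtins could guard a buffer size computation. The helper name `checked_array_bytes` is hypothetical, and the code assumes a compiler that provides `__builtin_mul_overflow`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes count * elem_size into *out_bytes.
 * Returns false and leaves *out_bytes untouched if the product
 * does not fit in a size_t. */
bool checked_array_bytes(size_t count, size_t elem_size, size_t *out_bytes) {
    size_t total;
    /* GCC/Clang builtin: returns true if the multiplication overflowed */
    if (__builtin_mul_overflow(count, elem_size, &total)) {
        return false;
    }
    *out_bytes = total;
    return true;
}
```

A caller would typically check the result before passing the size to `malloc`, e.g. `if (checked_array_bytes(n, sizeof(int), &bytes)) { ... }`.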

---
# C Standard Library Functions

.small[

Example of functions to (try to) avoid:

| Unsafe Function | Why It Is Unsafe | Safe Alternative(s) |
|------------------|------------------|----------------------|
| `gets()`         | No bounds checking; allows buffer overflows | `fgets()` |
| `strcpy()`       | No bounds checking; can overflow destination buffer | `strncpy()`, `strlcpy()` (if available) |
| `sprintf()`      | No bounds checking; leads to buffer overflows | `snprintf()` |
| `scanf()`        | No bounds checking e.g., `%s` with no width | `fgets()` + `sscanf()` with width specifiers |
| `memcpy()`       | No bounds checking; can cause overflows | Use with care; consider `memmove()` for overlapping memory |
| `bcopy()`        | Obsolete; unsafe due to no bounds checking | `memmove()` |
| `strlen()`       | Not inherently unsafe, but must not be used on untrusted or unterminated buffers | Ensure string is null-terminated before use |

More functions: https://docs.fedoraproject.org/en-US/defensive-coding/programming-languages/C/
]

???

- Here are a few libc functions whose use should be avoided as much as possible
- You can see the reason why they are unsafe, as well as safe alternatives
- We have seen `strcpy` and friends, which do not check for overflow on the buffers they write
- You should use the safe versions as much as possible, which all have a way to indicate the size of the receiving buffer to avoid overflows
- To move memory you should not rely on `bcopy` but rather use `memcpy` with care if the source and target areas do not overlap, and memmove if they do
- Finally, be careful with `strlen`: if a string is not properly terminated, it keeps reading past the end of the buffer and can return a number larger than the size of that buffer
- These are not the only functions to avoid, see the link on the slide for more
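
---
# Libc: Bounded Formatting Sketch

A minimal sketch of replacing `sprintf` with `snprintf` and detecting truncation. The helper name `format_greeting` and the message format are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: writes "Hello, <name>!" into dst without ever
 * writing more than dst_size bytes. Returns true only if the whole
 * message fit; on truncation dst holds a terminated prefix. */
bool format_greeting(char *dst, size_t dst_size, const char *name) {
    int needed = snprintf(dst, dst_size, "Hello, %s!", name);
    /* snprintf returns the length the full output would have had */
    return needed >= 0 && (size_t)needed < dst_size;
}
```

Passing `sizeof buffer` as `dst_size` at the call site keeps the bound tied to the actual destination array.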

---
# Libc: String Manipulation Functions

- Use the *n* functions:
  - `strcpy` -> `strncpy`
  - `sprintf` -> `snprintf`
  - etc.

???
- So once again, regarding string manipulation functions, make sure to use the `n` versions that force you to indicate a maximum number of characters to process

--
- Even the *n* versions have some particularities
  - `strncpy` won't add `\0` at the end of the target buffer if the source is *n* characters or longer

```c
char string1[] = "hello, world";
char string2[32] = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";

strncpy(string2, string1, strlen(string1));
printf("%s\n", string2); // prints "hello, worldxxxxxxxxxxxxxxxxxxx"
//
```
.codelink[<a href="src/strncpy.c" download>`24-secure-coding-practices/strncpy.c`</a>  <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em"></a>]

???
- These are not perfect though: for example, `strncpy` won't add the termination character at the end of the target buffer when the source is at least as long as the size we pass
- Check out this code, in which we wish to replace the content of `string2`, a buffer filled with x's, with hello world
- Because we pass `strlen(string1)` as the size, `strncpy` copies the characters but not the termination character, so we end up with a mix of both strings, which is probably not what the programmer intended
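
---
# Libc: Terminating After `strncpy`

One possible way to avoid the missing terminator is to bound the copy by the destination size and terminate explicitly. This is a sketch only; the helper name `copy_string` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes),
 * truncating if needed, and always leaves dst null-terminated. */
void copy_string(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return; /* no room even for the terminator */
    }
    strncpy(dst, src, dst_size - 1); /* leave one byte for '\0' */
    dst[dst_size - 1] = '\0';        /* strncpy may not have added it */
}
```

Note that the bound is the size of the destination, not `strlen` of the source.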

---
# Dynamic Memory Allocation

- Check `malloc` return value

???
- When using dynamic memory allocation, make sure to always check malloc's return value, for reasons we have previously discussed

--
- After free, the pointer is invalid
  - Cannot be dereferenced
  - Cannot be **used** (e.g. comparison)

???
- Remember that after free is called upon a pointer, that pointer is invalid and should not be reused in any way
- It should obviously not be dereferenced, but its value should not be used for anything else either

--
- `realloc` returns `NULL` upon failure but does not free the old buffer
  - so this: `ptr = realloc(ptr, new_size)` leaks the old buffer if `realloc` fails

???
- Be careful with `realloc`: this function returns `NULL` upon failure, so make sure the return value does not overwrite the original pointer to the buffer you want to resize, otherwise you'll get a leak
- The example you see on the slide is an instance of such a leak: if `realloc` fails, which is always a possibility, `ptr` will be overwritten with `NULL`, and if you don't have another pointer referencing the buffer `ptr` was pointing to, you will never be able to call `free` on that buffer

--
- Use `calloc` (like `malloc`, but the memory is zeroed out) if performance requirements allow it

???

- `malloc` does not zero out the memory it returns, so if you only partially initialise a data structure located in a dynamically allocated buffer, and you pass that data structure to a context that you do not trust (for example by sending it over the network), you may be leaking memory content to that untrusted party
- So for buffers sent to untrusted contexts it is better to use `calloc`, which will zero out the allocated memory, at the cost of a performance slowdown
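
---
# Dynamic Memory Allocation: `realloc` Sketch

A minimal sketch of a leak-free `realloc` pattern that keeps the old pointer until the call succeeds. The helper name `resize_buffer` is hypothetical, and the sketch assumes `new_size` is greater than zero.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *buf to new_size bytes (new_size > 0).
 * On failure it returns false and *buf still points to the old,
 * still-allocated buffer, so the caller can keep using or free it. */
bool resize_buffer(char **buf, size_t new_size) {
    char *tmp = realloc(*buf, new_size); /* do not overwrite *buf yet */
    if (tmp == NULL) {
        return false;
    }
    *buf = tmp; /* only overwrite once realloc has succeeded */
    return true;
}
```

On failure the caller remains responsible for calling `free` on the original buffer.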

---
# Secure Coding: Further Readings

There is a plethora of secure coding guides and standards:

- SEI CERT C Coding Standard: https://wiki.sei.cmu.edu/confluence/display/c
- Robert C. Seacord, *Secure coding in C and C++* (book)
- ISO/IEC TS 17961 (C Secure Coding Rules): https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1624.pdf
- NASA JPL C Coding Standard: https://yurichev.com/mirrors/C/JPL_Coding_Standard_C.pdf
- Fedora's Defensive Coding Guide: https://docs.fedoraproject.org/en-US/defensive-coding/

???

- What we saw are just a few examples of secure coding practices, and we do not have the time to cover them all exhaustively
- You can see on the slide a list of good resources; make sure to check them out if you want to learn more

---

# Summary

- We covered a few good coding practices to avoid introducing vulnerabilities in systems software
  - Taking care to avoid overflows
  - Using the more secure versions of some libc functions
  - Avoiding common mistakes with dynamic memory allocation
- These do not eliminate human error: mistakes cannot be ruled out
  - Need automated tools to detect programming errors

----
.leftcol[
.center[Feedback form: https://bit.ly/3WkoRkU]
]
.rightcol[
<div style="text-align:center"><img src="include/qr-code.png" height=150 /></div>
]

???
- To conclude, we covered a few good coding practices to avoid as much as possible introducing bugs that may lead to security issues in C programs
- They relate to being careful when manipulating buffers, arrays, and integers to avoid overflows
- Using the secure versions of some libc functions, like the string manipulation routines
- And trying to avoid some common mistakes when working with dynamic memory allocation
- Of course applying these practices is not really sufficient for preventing 100% of the bugs we may introduce
- After all as programmers we are human and we can't assume every line of code we write is bug free
- So we need more automated tools to detect programming mistakes
- That is what we'll cover next