class: center, middle

### Secure Computer Architecture and Systems

***

# Runtime Defences

???
- Hi everyone, here we are going to talk about defences running at runtime in production

---

# Non-Executable Memory

- Large parts of (if not all) the address space (e.g. the stack) used to be accessible with execution rights
- Made code injection attacks very easy, e.g. through a buffer overflow on the stack

???
- Back in the day, a large part of the address space used to be accessible with execution rights
- That was really awful from the security point of view: indeed it meant that an attacker could write malicious machine code in there, for example through an overflow, and then have the CPU jump to it
- That is called a code injection attack

--

- Hardware support for **setting parts of the address space as non-executable** appeared in the early 2000s (NX bit)
- Set everything that is not code (stack, heap, etc.) as non-executable

.leftcol[
]

???
- In the early 2000s hardware support appeared for setting parts of the address space as non-executable
- It was used to set everything that should not be accessed in executable mode as non-executable: the stack, heap, static data sections, and so on
- This is an application of the principle of least privilege, and it made code injection attacks much less likely

--

.rightcol[
- Today modern programs aim to maintain **W⊕X**:
  - No part of memory can be writable and executable at the same time
]

???
- Today modern systems software aims to enforce the write xor execute principle for each memory area
- That principle states that you cannot have an area of memory be both writable and executable at the same time

---

# Address Space Layout Randomisation

- ASLR randomises the layout of the address space
- **Makes it hard for an attacker to determine target locations in memory**
???
- Another defence that is present in almost every system today is address space layout randomisation
- With ASLR each invocation of the program will have a different layout for the address space; in other words, code and variables won't be at the same locations in memory for subsequent invocations
- The goal is to make it harder for the attacker to determine where things are in the address space
- Recall from the attacks we have seen that many of them require us to know exactly where a buffer to overflow is located on the stack, or exactly where we need to jump in the code segment
- This is achieved by observing one invocation of the program, for example with a debugger
- And then starting the attack upon a second invocation of the program
- With ASLR that won't work anymore, because the locations of the data and code we determined with the first invocation are not the same for the second

--

- **Granularity** on modern systems: randomises the start address of segments when they are loaded
  - At program initialisation time for the main program
  - When they are loaded for libraries/shared objects

???
- Please note that the granularity is coarse-grained: for performance reasons we cannot really randomise the location of each variable independently
- So randomisation is realised at the level of the program's segments, as illustrated here
- It is realised at load time for the main program, and for dynamic libraries it's also realised when they are loaded

---

# ASLR (2)

.leftcol[
```c
#include <stdio.h>
#include <stdlib.h>

int global1 = 42;
int global2 = 43;

int main() {
    int local1 = 24;
    int local2 = 25;
    int *heap_ptr1 = malloc(sizeof(int));
    int *heap_ptr2 = malloc(sizeof(int));

    printf("data addr 1: %p\n", &global1);
    printf("data addr 2: %p\n", &global2);
    printf("stack addr 1: %p\n", &local1);
    printf("stack addr 2: %p\n", &local2);
    printf("heap addr 1: %p\n", heap_ptr1);
    printf("heap addr 2: %p\n", heap_ptr2);

    free(heap_ptr1);
    free(heap_ptr2);
}
```
.codelink[
`11-runtime-defences/aslr.c`
]
]

.rightcol[
- Location of data changes among executions
- Relative difference between the locations of 2 pieces of data in different segments changes
- **It is not the case for 2 pieces of data in the same segment!**
  - A single pointer leak can allow the attacker to break ASLR for the entire segment
]

???
- To understand the security implications of coarse-grained ASLR check out this example program
- It simply prints the addresses of 2 local variables, 2 global variables, and 2 heap pointers
- As you can see the relative distance between 2 variables located in different segments is randomised across different executions of the same program
- However, the relative distance between 2 variables belonging to the same segment stays the same across executions
- That is because only the base address of each segment is randomised at load time
- What that means is that if the attacker can leak the value of a single pointer, it is easy for them to compute the addresses of all other data or code within the containing segment
- Hence the coarse-grained nature of ASLR on modern systems makes it easy to break

---

name: canary

# Stack Canaries

- **Return address protection on the stack**: place a magic value (canary) before the return address upon function call and check it upon return

???
- The stack canary is a technique to protect the return address on the stack

---

template: canary

.leftcol[
]

???
- The key idea is to place a magic value named the canary right before the return address in a callee's stack frame
- And to compare the canary's value to a ground truth when the callee returns

---

template: canary

.leftcol[
]

.rightcol[
- Compiler inserts code to place the canary upon function call
]

???
- The compiler generates code to push the canary on the stack upon function call
- Here the canary's value is 0x1234

---

template: canary

.leftcol[
]

.rightcol[
- Compiler inserts code to place the canary upon function call
- And code to check its value upon return
]

???
- When the callee returns, the canary's value is checked to make sure it is still 0x1234

---

template: canary

.leftcol[
]

.rightcol[
- Compiler inserts code to place the canary upon function call
- And code to check its value upon return
- An overflow overwrites the canary and the check would fail: overflow detected
]

???
- The key idea is that an overflow aiming at rewriting the return address would also overwrite the canary
- And the check of its value against the ground truth will fail, which in effect detects the attack and stops the program

---

# Stack Canaries (2)

- By default with modern compilers only certain functions (declaring a `char` array > 8 bytes) are protected with canaries
- `-fstack-protector-strong` applies it to more functions (no size limit)
- `-fstack-protector-all` applies it to all functions
- Trade-off: security vs. code size increase/performance impact

???
- By default canaries will be applied only to certain functions
- You can use `-fstack-protector-strong` to apply it to a larger subset of your program's functions, and `-fstack-protector-all` to apply it to every function in your program
- More canaries will increase the security of your program, but will also increase performance and code size overheads

--

- **Canaries are not perfect**: the same canary is generally used for all function calls
  - If the canary's value leaks to the attacker, e.g. through a read overflow, the protection is broken

???
- Canaries are not a perfect protection
- With current implementations, the same canary value is used to protect all function calls
- This means that if the attacker can leak the canary value, for example if there is a read overflow on the stack, then the protection is broken for the entire program

---

## Other Common Hardening Techniques

- **Don't embed debug information, strip symbols**
  - These are very helpful for an attacker to reverse engineer a binary
  - `strip <binary>`

???
- Other common protection techniques include stripping symbols and debug information from your program
- This makes reverse engineering your code, which is a crucial step in most attacks as you will see in the labs, much more difficult

--

- **Read-only relocations (RELRO)** protect against attacks using the shared library relocation system (Global Offset Table) to hijack a program's control flow
  - Partial RELRO (default): only part of the GOT is read-only
  - Full RELRO relocates everything at program load time, the GOT is read-only
    - Significantly impacts program load time
  - Linker option, add these compiler flags: `-Wl,-z,relro,-z,now`

???
- Read-only relocation is a protection technique against attacks aiming at overwriting the relocations, which are the method used to resolve calls to shared libraries at runtime
- You can see these as a form of function pointers
- Partial and full read-only relocation set part or all of the corresponding area of the address space read-only
- Once again here you have a trade-off to choose between security and performance overhead, as full RELRO will make load time much longer

---

## Other Common Hardening Techniques (2)

- **`_FORTIFY_SOURCE`** macro:
  - Enables some lightweight compile-time/runtime buffer overflow protection checks before sensitive functions, e.g. `strcpy`, `strcat`, etc.
  - Can be enabled through the compiler invocation: `-D_FORTIFY_SOURCE=1`
  - Second level (`2`) available, more checks but may break the program

???
- The `_FORTIFY_SOURCE` macro enables some buffer overflow protection checks before sensitive functions such as the string manipulation ones and `memcpy`
- There are 2 levels for it, the second one adding more checks but coming with the risk of breaking your program, so only use it if you can fully test that things run OK

--

- Check the level of hardening of a binary with `checksec`:

```bash
$ gcc -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2 -Wl,-z,relro -Wl,-z,now myapp.c -o myapp
$ strip myapp
$ checksec --file myapp
RELRO           STACK CANARY      NX            PIE             Symbols      FORTIFY
Full RELRO      Canary found      NX enabled    PIE enabled     No Symbols   Yes
```

???
- You can analyse a binary to check for the presence or absence of all the hardening techniques I mentioned with the `checksec` tool
- You have an example here
- We have seen all of these; PIE means position independent executable, it is a prerequisite for the binary to be compatible with ASLR

---

# Control Flow Integrity

- CFI ensures that the program's control flow follows legitimate paths only
- Protects against control flow hijacking attacks (e.g. stack smashing)
- In practice, focus on protecting jumps whose target address is writable
  - **Forward edge CFI**: protect calls related to function pointers/C++ virtual tables
  - **Backward edge CFI**: protect function returns

???
- Let's now talk in more detail about one last, more advanced technique, named control flow integrity
- We have seen previously how control flow hijacking attacks force the program to take code paths that were not intended by the programmer
- CFI enforces that the code paths executed by the program conform to the CFG originally intended by the programmer
- CFI generally involves two protections
- Forward edge CFI, checking that function pointers and C++ virtual tables always have a legitimate target
- And backward edge CFI, checking that return addresses also always have a legitimate target

---

# CFI: Forward Edge Protection

- When a function pointer/C++ virtual table entry is called, restrict the target to valid functions
  - **Coarse-grain**: can be the beginning of any function
  - **Fine-grain**: restrict to only the legitimate targets in the CFG
    - E.g. only the function addresses assigned to a function pointer

???
- Regarding forward edge protection, CFI enforces that when a function pointer or an entry in a C++ virtual table is called, the target should be a valid function
- What valid means depends on the implementation
- With coarse-grain CFI, the protection will just check that the target is the beginning of a function
- With fine-grain CFI, the protection will make sure that only the functions whose addresses are assigned to the function pointer or virtual table in the code can be called

--

- Implementations:
  - In **software** with LLVM/clang:
```bash
clang -g -fsanitize=cfi -flto -fvisibility=hidden prog.c -o prog
```
  - In **hardware** with Intel CET: mark valid targets with an `endbr64` instruction

???
- Clang has a software implementation of CFI; to enable it, see the compiler flags you need to add here
- This will instruct the compiler to insert the necessary instrumentation for CFI checks
- Recent Intel processors also have CFI support in hardware: there is a special instruction, `endbr64`, that indicates valid targets for function pointer or virtual table member invocations

---

name: shadowstack

# CFI: Backward Edge Protection

- This is implemented with a **shadow stack**
  - Separate location used to store a copy of the return address upon function call
  - Return address on the stack checked against the copy upon return

???
- Regarding backward edge protection, which protects the return address on the stack, this is achieved for CFI with what is called a shadow stack
- It's a separate stack that stores a copy of the return address upon function call
- When the callee returns, the return address to jump to is checked against the copy in the shadow stack: if they don't match, something fishy is going on

---

template: shadowstack

.leftlargecol[
]

???
- Here is an illustration of the shadow stack
- We are running the code of a function f1; we have its frame on the stack, and the shadow stack, which is empty for now

---

template: shadowstack

.leftlargecol[
]

???
- When f1 calls f2, it pushes the return address on the stack normally, but also a copy of it on the shadow stack

---

template: shadowstack

.leftlargecol[
]

???
- When f2 calls f3, the process is repeated

---

template: shadowstack

.leftlargecol[
]

???
- Same thing when f3 calls f4

---

template: shadowstack

.leftlargecol[
]

???
- When f4 returns into f3, the return address on the stack is checked against the corresponding entry in the shadow stack; if they match, it's all good
- If there was an overflow, they would be different

---

template: shadowstack

.leftlargecol[
]

???
- f3 returns into f2, we get another check

---

template: shadowstack

.leftlargecol[
]

???
- And f2 returns into f1, another check
- Of course the shadow stack needs to be placed by the compiler at a location in memory that is very hard for an attacker to read or write

---

# Summary

- **Runtime defences**
  - Help detect bugs in production
  - Render the exploitation of certain vulnerabilities more difficult
  - Aim to limit the damage an attacker can do when an exploit succeeds
- These countermeasures run in production, so their performance overhead must be very low

???
- To conclude, we have seen defences that can be applied at runtime and that are commonly used in production to detect exploits, make them much harder to achieve, and limit the damage that an attacker can do with exploits that succeed
- Because these things run in production, their performance overhead needs to be low, generally just a few percent