Secure Architectures and Systems - OS Security Concepts Part 1

class: center, middle

### Secure Computer Architecture and Systems
***
# OS Security Concepts

---

background-image: url(include/survey.png)
background-size: 100%

---

# Introduction
- OS = foundation of computer systems  
- Manages hardware, applications, users  
- Security is critical for reliability and trust

--
- **Security goals:**
  - Enforce which **subjects** (e.g. applications, users) can perform which **operations** (e.g. read, write) on which **objects** (e.g. files, system resources)
  - Apply the principle of least privilege and maintain confidentiality, integrity, and availability
  - Often at odds with other OS goals: performance, ease of use

---
# Basic Process-level Security Invariants

1. **Inter-process isolation: processes cannot access (R/W/E) each other's state (mostly memory) directly**
  - Enforced with page tables, by giving each process its own address space
--

2. **User/kernel isolation:**
  - **Processes cannot access (R/W/E) the kernel's memory directly**
        - Enforced with the user/supervisor execution mode and associated bits in page table entries
  - **Processes can only invoke the kernel at a safe entry point**
        - Enabled with system calls, security enforced with user/supervisor protection
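
The page-table checks behind both invariants can be sketched as a small user-space model. The flag positions follow the x86-64 page table entry layout (bit 0 present, bit 1 writable, bit 2 user-accessible). The function name `access_allowed` is hypothetical, and real hardware performs this check in the MMU rather than in software.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low flag bits of an x86-64 page table entry. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_USER     (1ULL << 2)  /* clear = supervisor-only page */

/*
 * Hypothetical model of the MMU decision for a single access.
 * Assumption: supervisor writes honour the writable bit (CR0.WP set).
 */
bool access_allowed(uint64_t pte, bool user_mode, bool is_write)
{
    if (!(pte & PTE_PRESENT))
        return false;   /* page not mapped in this address space */
    if (user_mode && !(pte & PTE_USER))
        return false;   /* user code touching a kernel-only page */
    if (is_write && !(pte & PTE_WRITABLE))
        return false;   /* write to a read-only page */
    return true;
}
```

Inter-process isolation corresponds to the first check (another process's pages are simply not present in this address space), user/kernel isolation to the second.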

---
# Basic User-level Security Invariants

- **User authentication: only authorised users can access the system**
    - Enforced with some form of authentication mechanism, e.g. password, fingerprint, face ID
--
- **Users can configure how to share/not to share the resources they own with other users**
  - Implemented with file permissions
  - Files abstract many types of system resources in UNIX/Linux
--

- **Only privileged users (administrator) can accomplish security critical tasks**
  - Such as loading kernel code, shutting down the computer, mounting filesystems, etc.
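
The file-permission mechanism mentioned above can be sketched as a check on the classic UNIX mode bits. This is a simplified model: the name `may_access` is hypothetical, and it ignores administrator overrides, supplementary groups, and ACLs that a real kernel also considers.

```c
#include <stdbool.h>

#define WANT_READ  4u
#define WANT_WRITE 2u
#define WANT_EXEC  1u

/*
 * Hypothetical simplified check on a UNIX mode (e.g. 0640).
 * Exactly one class applies: owner, else group, else other.
 */
bool may_access(unsigned mode, unsigned file_uid, unsigned file_gid,
                unsigned uid, unsigned gid, unsigned want)
{
    unsigned bits;

    if (uid == file_uid)
        bits = (mode >> 6) & 7u;   /* owner class */
    else if (gid == file_gid)
        bits = (mode >> 3) & 7u;   /* group class */
    else
        bits = mode & 7u;          /* other class */

    return (bits & want) == want;
}
```

Note that only the first matching class is consulted, so an owner with no read bit is refused even if the group and other classes could read.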

---
# Basic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-1.svg" width=600 /></div>
]

.rightsmallcol[
- The entirety of the kernel is trusted
- Local and remote applications and users are not trusted
]

---
# Basic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-2.svg" width=600 /></div>
]

.rightsmallcol[
- Hardware is also assumed to behave correctly
]

---
# Basic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-3.svg" width=600 /></div>
]

.rightsmallcol[
- System administrator has ambient authority
- Certain apps (e.g. to log a user in or change passwords) are privileged
]

---
# Basic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-4.svg" width=600 /></div>
]

.rightsmallcol[
- BIOS, bootloader, boot process are also trusted
]

---
# Basic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-4.svg" width=600 /></div>
]

.rightsmallcol[
.center[
  **Does this model reflect the reality?**
]
]

---
# Realistic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-5.svg" width=600 /></div>
]

.rightsmallcol[
- BIOS/bootloader have bugs, could be corrupted
- Local attacker could switch the on-disk kernel image to a malicious one
]

---
# Realistic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-6.svg" width=600 /></div>
]

.rightsmallcol[
- The kernel has bugs
- Particularly problematic in third-party software e.g. drivers
]

---
# Realistic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-7.svg" width=600 /></div>
]

.rightsmallcol[
- The hardware has vulnerabilities
- Cf. side channels such as Spectre and Meltdown
]

---
# Realistic Trust Model

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-8.svg" width=600 /></div>
]

.rightsmallcol[
The sysadmin/computer owner may not even be trusted in certain scenarios!
]

---
# Apps/Kernel Isolation

.leftlargecol[
<div style="text-align:center"><img src="include/threat-landscape-9.svg" width=600 /></div>
]

.rightsmallcol[
For now let's assume a basic trust model and zoom in on process-level isolation
]

---
# Process-level Isolation

1. **Processes should be isolated from each other**
  - No memory access, among other interference
--

2. Enforced by the kernel, as a result **the kernel should be isolated from applications**
  - Same thing: no memory access/no interference
  - But applications still need to be able to invoke the kernel

.leftcol[
<div style="text-align:center"><img src="include/process-level-1.svg" width=250 /></div>
]

.rightcol[
- Processes can't access each other's memory directly
- But a process could exploit a bug in the kernel to interfere with another process
]

---
# User -> Kernel Attacks

- Kernel is written in a memory-unsafe language
- It is subject to all the memory safety/UB vulnerabilities we saw previously
- General purpose kernels are huge, millions of lines of code
  - Cannot guarantee the absence of bugs (if anything, the kernel's sheer size almost guarantees their presence!)

.leftcol[
<div style="text-align:center"><img src="include/process-level-2.svg" width=250 /></div>
]

.rightcol[
- Main attack surface from user space applications: **system call interface**
- **Very large** trust interface, hard to secure
]

---
# Hardening The System Call Interface

- Linux considers **every piece of data flowing from user space through system calls as untrusted**
- System call parameters and, for pointer parameters, the data they point to
--

- Must assume user space may send:
  - Corrupted data structures
  - Bad indexing information
  - NULL pointers
  - References to resources (e.g. files) the process does not have permission to access
  - Sequences of system call invocations in the wrong order
  - Etc.

--
- Requires that the kernel **checks** for the validity of this data/control flow
  - Hard to get all checks right due to the complexity of the interface
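
A minimal sketch of the kind of checks listed above, for a toy `read`-like call in a user-space model. All names (`toy_process`, `toy_validate_read`, `MAX_IO`) are hypothetical and the limits are arbitrary. A real kernel performs many more checks.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FDS 16
#define MAX_IO  4096   /* arbitrary per-call size limit for this sketch */

struct toy_file { int readable; };

struct toy_process {
    struct toy_file *fds[MAX_FDS];   /* NULL = descriptor not open */
};

/* Returns 0 if the arguments are acceptable, a negative errno otherwise. */
int toy_validate_read(const struct toy_process *p, int fd,
                      const void *buf, size_t len)
{
    if (fd < 0 || fd >= MAX_FDS)
        return -EBADF;    /* bad indexing information */
    if (p->fds[fd] == NULL)
        return -EBADF;    /* resource not open in this process */
    if (!p->fds[fd]->readable)
        return -EBADF;    /* not opened for reading */
    if (buf == NULL)
        return -EFAULT;   /* NULL pointer */
    if (len > MAX_IO)
        return -EINVAL;   /* implausible size */
    return 0;
}
```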

---
# System Calls: Pointer Parameters

- A process often passes the kernel a pointer to some of its own memory
  - E.g. `readv`/`writev` system calls
- Checking the validity of the data they point to necessitates special treatment

.leftcol[
<div style="text-align:center"><img src="include/tocttou-1.svg" width=400 /></div>
]

---
# System Calls: Pointer Parameters

- A process often passes the kernel a pointer to some of its own memory
  - E.g. `readv`/`writev` system calls
- Checking the validity of the data they point to necessitates special treatment

.leftcol[
<div style="text-align:center"><img src="include/tocttou-2.svg" width=400 /></div>
]

.rightcol[
- User space can use another thread to corrupt pointed data after the check
- **Time of check to time of use** (TOCTTOU) AKA **double fetch** vulnerability
]

---
# System Calls: Pointer Parameters

- Only solution to protect against TOCTTOU: kernel must **copy** into kernel space all user space data passed by reference
  - Validity checks are performed on these copies

```c
unsigned long __copy_from_user(void * to, const void __user * from, unsigned long n);
unsigned long __copy_to_user(void __user * to, const void * from, unsigned long n);
```
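
The copy-then-check pattern can be sketched in user-space C. Here `fetch_from_user` is a hypothetical stand-in for the kernel's copy routine (a plain `memcpy` in this model), and `struct request` is an assumed layout chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PAYLOAD 64

/* Hypothetical request layout living in untrusted "user" memory. */
struct request {
    size_t len;
    char   payload[MAX_PAYLOAD];
};

/* Stand-in for the kernel copy routine: a plain memcpy in this model. */
static void fetch_from_user(void *dst, const void *user_src, size_t n)
{
    memcpy(dst, user_src, n);
}

/* Returns the number of bytes copied to out, or -1 if rejected. */
long handle_request(const struct request *user_req, char *out, size_t out_size)
{
    struct request local;

    fetch_from_user(&local, user_req, sizeof(local));   /* single fetch */

    if (local.len > MAX_PAYLOAD || local.len > out_size)
        return -1;                                      /* check the copy */

    memcpy(out, local.payload, local.len);              /* use the copy */
    return (long)local.len;
}
```

Because `len` is read from `local` both when checked and when used, a concurrent write to `user_req->len` after the fetch cannot change the outcome.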

---
# User -> Kernel Attacks

Exploiting a kernel vulnerability an attacker can aim to:
- **Leak/tamper with kernel memory**
  - E.g. to read kernel pointers and break kernel ASLR, or to escalate privilege by gaining administrator rights
--

- **Access other processes' memory**
  - The kernel has access to the entirety of the computer's memory
--

- **Execute code**, possibly arbitrarily, in the context of the kernel (with full privileges)
  - E.g. to install and hide malicious programs (rootkits)

--
- **DoS** the system or other applications
- Etc.

---
# Kernel Vulnerabilities

- A paper* studying one year of kernel vulnerabilities (141, in 2010) defines the following classes of vulnerabilities:
  - Missing pointer checks
  - Missing permission checks
  - Buffer overflow
  - Integer overflow
  - Uninitialised data
  - Memory mismanagement (leaks, use-after-free, double free)
  - Misc: NULL dereference, divide by zero, infinite loop, race/deadlock

.small[*Chen et al., **Linux kernel vulnerabilities: State-of-the-art defenses and open problems***]

---
# Kernel Vulnerabilities

.center[Vulnerabilities vs. their impact]

---
# Kernel Vulnerabilities

.center[Vulnerabilities locations in the kernel code base]

---
class: middle, center, inverse

# Linux: Runtime Defences

---
# Attack Surface Reduction

- **Strict kernel memory permissions**
  - Kernel executable code and read-only data must not be writable
  - Kernel function pointers and sensitive variables must not be writable
--

- Segregate kernel and user space memory:
      - **Supervisor Mode Execution Prevention (SMEP)** prevents kernel from executing code located in user-space memory
      - **Supervisor Mode Access Prevention (SMAP)** prevents kernel from reading/writing user-space memory (temporarily disabled by the kernel during `copy_to_user`/`copy_from_user`)
      - To protect against injection and dereference of user space pointers in the kernel (ret2usr attack)
      - User space can still partially control what's in the physmap (ret2dir attack)
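
The two x86 features above can be modelled as a small decision function. The CR4 bit positions (20 for SMEP, 21 for SMAP) and the EFLAGS.AC override are architectural; the function and enum names are hypothetical, and the real check happens in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP (1ULL << 20)   /* Supervisor Mode Execution Prevention */
#define CR4_SMAP (1ULL << 21)   /* Supervisor Mode Access Prevention */

enum access_kind { ACCESS_DATA, ACCESS_EXEC };

/* Hypothetical model: does a supervisor-mode access to this page fault? */
bool supervisor_access_faults(uint64_t cr4, bool eflags_ac,
                              bool page_is_user, enum access_kind kind)
{
    if (!page_is_user)
        return false;                       /* kernel page: not covered */
    if (kind == ACCESS_EXEC)
        return (cr4 & CR4_SMEP) != 0;       /* no override for execution */
    /* Data access: SMAP applies unless EFLAGS.AC is set (stac/clac window). */
    return (cr4 & CR4_SMAP) != 0 && !eflags_ac;
}
```

The `eflags_ac` parameter corresponds to the temporary window the kernel opens around its user-copy routines.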

---
# Attack Surface Reduction

.leftlargecol[
- **Reduce application's access to system calls: system call filtering**
  - Blacklisting the types of syscalls that are not supposed to be called legitimately by an application
  - Widely used in production to harden multi-tenant and sensitive environments: Docker containers, Android, Flatpak/AppImage, etc.
- Achieved with **seccomp** under Linux
- Problem: **how to come up with precise, per-application system call black/whitelists?**
]

.rightsmallcol[
<div style="text-align:center"><img src="include/syscall-filtering.svg" width=200 /></div>
]
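
The idea of system call filtering can be sketched as a default-deny allowlist keyed by syscall number. This is a plain C model, not the seccomp API: the `toy_` names and the 512-entry limit are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SYSCALL 512   /* arbitrary bound for this sketch */

enum toy_verdict { TOY_ALLOW, TOY_KILL };

struct toy_filter {
    uint64_t allowed[TOY_MAX_SYSCALL / 64];   /* one bit per syscall number */
};

/* Adds a syscall number to the allowlist; out-of-range numbers are ignored. */
void toy_filter_allow(struct toy_filter *f, unsigned nr)
{
    if (nr < TOY_MAX_SYSCALL)
        f->allowed[nr / 64] |= 1ULL << (nr % 64);
}

/* Default-deny: anything not explicitly allowed kills the caller. */
enum toy_verdict toy_filter_check(const struct toy_filter *f, unsigned nr)
{
    if (nr < TOY_MAX_SYSCALL && ((f->allowed[nr / 64] >> (nr % 64)) & 1ULL))
        return TOY_ALLOW;
    return TOY_KILL;
}
```

An allowlist fails closed when an application uses an unexpected syscall, which is why building a precise list per application is the hard part.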

---
# Probabilistic Defences

- **Kernel stack canaries**
- **Kernel Address Space Layout Randomisation (KASLR)**
--

- These are not a panacea
  - A single canary value for all stack frames on each CPU, can be leaked with e.g. a stack buffer overread
  - KASLR is coarse grained, random offset applied on the main kernel memory area
      - Kernel and modules executable code, kernel stacks, vmalloc, physmap, etc.
      - A single pointer leak may allow an attacker to break KASLR for the entire area
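
Why one leak suffices with a coarse-grained offset can be shown with simple arithmetic. This sketch assumes the attacker knows the link-time addresses of kernel symbols (e.g. from a distribution's kernel image); the function names are hypothetical.

```c
#include <stdint.h>

/* Slide = runtime address of a known symbol minus its link-time address. */
uint64_t kaslr_slide(uint64_t leaked_runtime_addr, uint64_t link_time_addr)
{
    return leaked_runtime_addr - link_time_addr;
}

/* With the slide known, any other symbol in the same region can be located. */
uint64_t rebase(uint64_t link_time_addr, uint64_t slide)
{
    return link_time_addr + slide;
}
```

Since every address in the region shares the same slide, leaking one pointer to a known symbol is enough to compute the runtime address of every other symbol there.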

---
# Memory Integrity

.leftlargecol[
- Use **shadow stacks** rather than canaries
- Prevent over/underflows outside the stack with **guard pages**
  - Unmapped memory pages that will fault when hit
- **Sanity-check heap free list for corruption** upon allocation/free
- **Trap on integer overflows** (e.g. counters, size variables)
]

.rightsmallcol[
<div style="text-align:center"><img src="include/guard-page.svg" width=200 /></div>
]
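
Trapping on integer overflow for size variables can be sketched in user space with the GCC/Clang builtin `__builtin_mul_overflow`. The helper names `array_bytes` and `alloc_array` are hypothetical, and `malloc` stands in for a kernel allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Computes count * elem_size; returns false if the product overflows. */
bool array_bytes(size_t count, size_t elem_size, size_t *out)
{
    return !__builtin_mul_overflow(count, elem_size, out);
}

/* Refuses the allocation rather than returning an undersized buffer. */
void *alloc_array(size_t count, size_t elem_size)
{
    size_t total;

    if (!array_bytes(count, elem_size, &total))
        return NULL;
    return malloc(total);
}
```

Without the check, a wrapped product would yield a small allocation that later writes, sized by `count`, would overflow.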

---
# Prevent Kernel InfoLeaks

- **Avoid exposing kernel pointers to user space**
  - A single pointer leak can break KASLR
  - Take care not to send partially initialised or uninitialised data structures or buffers to user space
  - Kernel logs and files containing pointers should be readable only by the administrator
- **Don't use addresses as resource identifiers** (e.g. file descriptors), use atomic counters
- **Poison/zero out released memory** to counter reuse attacks (reading uninitialised memory, use after free, etc.)
  - Can impact performance
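
Poisoning on release can be sketched in user space as a wrapper around `free`. The helper names are hypothetical. The byte value 0x6b matches the one Linux uses for freed slab objects, and the `volatile` pointer is an assumption made here so the compiler does not drop the writes as dead stores.

```c
#include <stddef.h>
#include <stdlib.h>

#define POISON_BYTE 0x6b

/* Overwrites n bytes at p with the poison pattern. */
void poison(void *p, size_t n)
{
    volatile unsigned char *b = p;

    while (n--)
        *b++ = POISON_BYTE;
}

/* Scrubs the buffer before handing it back to the allocator. */
void poisoned_free(void *p, size_t n)
{
    if (p == NULL)
        return;
    poison(p, n);
    free(p);
}
```

The extra pass over every released buffer is where the performance cost comes from.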

---
class: middle, center, inverse

# Linux: Bug Detection

---
# Dynamic Analysis Techniques

- **Sanitisers:** KASAN, UBSAN, leak and concurrency (race condition) sanitisers, etc.
- **Lockdep:** tracks the state of locks to detect deadlocks, double locking, lock order inversion
- **Dynamic tracing/instrumentation tools:** ftrace, perf, eBPF
- **Fuzzing:** Syzkaller

---
# Syzkaller

.leftcol[
- Widely used kernel fuzzer
- Issues malformed system calls to the kernel in the hope of triggering bugs
- Kernel under test runs in a VM, fuzzing controlled from the host
- Works best with KASAN and coverage enabled to guide the fuzzing process
- For each round of fuzzing a program (a few syscalls) is generated and executed
]

.rightcol[
<div style="text-align:center"><img src="include/syzkaller.svg" width=340 /></div>
]

---
# Syzkaller: Syzlang

- Syzkaller: grammar-based, well aware of the interface it fuzzes
- **Syzlang** grammar used to describe system calls and the data flow
  - Syscall arguments and their types
  - Values passed between system calls (e.g. `fd` created by `open()` can be used by `read()`)
  - Length parameter specifying the size of other parameters

.leftcol[
- Good knowledge of the syscall interface lets Syzkaller optimise fuzzing program generation/mutation
]

.rightcol[
```xml
resource fd[int32]: 0xffffffff, AT_FDCWD
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```
]

---
# Syzkaller: Syzbot

- **Syzbot:** continuous fuzzing of Linux, Android, FreeBSD, NetBSD, OpenBSD, and gVisor on 25 Syzkaller instances in total (~150-200 VMs)
- Reported thousands of bugs to the Linux kernel mailing lists
- https://syzkaller.appspot.com
- Problem: a large share (two thirds) of reported bugs are invalid

---
# Linux Test Project

.leftcol[
- Repository containing thousands of test cases for the kernel
- https://github.com/linux-test-project/ltp/tree/master
- Examples of categories:
  - Syscalls
  - POSIX conformance
  - Filesystem
  - Networking
  - Memory management
  - Reproducing existing CVEs
  - Etc.
]

.rightcol[
<div style="text-align:center"><img src="include/ltp.png" width=300 /></div>
]

---
# Static Analysis Techniques

- **Pattern-based analysis**
  - Coccinelle, Smatch: describe patterns of programming mistakes
      - E.g. for each call to `kmalloc` there should be a `kfree`
--
- **Control and data flow analysis tools**
  - Sparse uses compiler attributes to indicate certain properties on objects and memory
      - E.g. `__user` to mark pointers to user space, `__acquires` for a lock held on function exit but not entry, etc.
--
- Others: verification techniques, symbolic execution, compiler techniques
--

- Some downsides (scalability/state explosion) of static analysis approaches are exacerbated on OS kernels due to the size of their code base
  - E.g. Linux with 20M+ LoC
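
The allocate/free pairing that pattern-based tools look for can be illustrated with a user-space sketch. `toy_alloc`/`toy_free` are hypothetical stand-ins for `kmalloc`/`kfree`, and the counter `live_allocs` exists only to make the leak observable here.

```c
#include <stdlib.h>
#include <string.h>

int live_allocs;   /* sketch-only counter of outstanding allocations */

void *toy_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

void toy_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Buggy: the early return on the error path skips the free. */
int parse_leaky(const char *input)
{
    char *tmp = toy_alloc(64);
    if (!tmp)
        return -1;
    if (strlen(input) >= 64)
        return -1;             /* BUG: tmp is never released */
    strcpy(tmp, input);
    toy_free(tmp);
    return 0;
}

/* Fixed: every path after the allocation goes through one release point. */
int parse_fixed(const char *input)
{
    int ret = -1;
    char *tmp = toy_alloc(64);
    if (!tmp)
        return -1;
    if (strlen(input) >= 64)
        goto out;
    strcpy(tmp, input);
    ret = 0;
out:
    toy_free(tmp);
    return ret;
}
```

A static checker flags `parse_leaky` because one path from the allocation to a return never reaches the matching free.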

---
# Summary

- **Basic OS security invariants:**
  - Processes cannot access each other's memory
  - Processes cannot access the memory of the kernel
- Old trust model: the entire kernel, hardware, sysadmin and privileged apps, BIOS/bootloader/boot process are fully trusted
  - **Does not reflect modern environments**
- System call interface is the main vector of user -> kernel attacks
- We saw some kernel-level runtime defences: attack surface reduction, ASLR, etc.
- And we saw some bug detection techniques: fuzzing, testing, static analysis