class: center, middle

### Secure Computer Architecture and Systems

***

# OS Security Concepts

---

background-image: url(include/survey.png)
background-size: 100%

---

# Introduction

- OS = foundation of computer systems
- Manages hardware, applications, users
- Security is critical for reliability and trust

--

- **Security goals:**
    - Enforce what **subjects** (e.g. applications, users) can perform what **operations** (e.g. read, write) on what **objects** (e.g. files, system resources)
    - Try to apply the principle of least privilege and maintain confidentiality, integrity, and availability
    - Often at odds with other goals of OSes: performance, convenience of usage

---

# Basic Process-level Security Invariants

1. **Inter-process isolation: processes cannot access (R/W/E) each other's state (mostly memory) directly**
    - Enforced with page tables, by giving each process its own address space

--

2. **User/kernel isolation:**
    - **Processes cannot access (R/W/E) the kernel's memory directly**
        - Enforced with the user/supervisor execution mode and associated bits in page table entries
    - **Processes can only invoke the kernel at a safe entry point**
        - Enabled with system calls, security enforced with user/supervisor protection

---

# Basic User-level Security Invariants

- **User authentication: only authorised users can access the system**
    - Enforced with some form of authentication mechanism, e.g. password, fingerprint, face ID

--

- **Users can configure how to share/not to share the resources they own with other users**
    - Implemented with file permissions
    - Files abstract many types of system resources in UNIX/Linux

--

- **Only privileged users (administrator) can accomplish security critical tasks**
    - Such as loading kernel code, shutting down the computer, mounting filesystems, etc.

---

# Basic Trust Model

.leftlargecol[
]
.rightsmallcol[
- The entirety of the kernel is trusted
- Local and remote applications and users are not trusted
]

---

# Basic Trust Model

.leftlargecol[
]
.rightsmallcol[
- Hardware is also assumed to behave correctly
]

---

# Basic Trust Model

.leftlargecol[
]
.rightsmallcol[
- System administrator has ambient authority
- Certain apps (e.g. to log a user in or change passwords) are privileged
]

---

# Basic Trust Model

.leftlargecol[
]
.rightsmallcol[
- BIOS, bootloader, boot process are also trusted
]

---

# Basic Trust Model

.leftlargecol[
]
.rightsmallcol[
.center[
**Does this model reflect the reality?**
]
]

---

# Realistic Trust Model

.leftlargecol[
]
.rightsmallcol[
- BIOS/bootloader have bugs, could be corrupted
- Local attacker could switch the on-disk kernel image to a malicious one
]

---

# Realistic Trust Model

.leftlargecol[
]
.rightsmallcol[
- The kernel has bugs
- Particularly problematic in third-party software e.g. drivers
]

---

# Realistic Trust Model

.leftlargecol[
]
.rightsmallcol[
- The hardware has vulnerabilities
- Cf. side channels such as Spectre and Meltdown
]

---

# Realistic Trust Model

.leftlargecol[
]
.rightsmallcol[
The sysadmin/computer owner may not even be trusted in certain scenarios!
]

---

# Apps/Kernel Isolation

.leftlargecol[
]
.rightsmallcol[
For now let's assume a basic trust model and zoom in on process-level isolation
]

---

# Process-level Isolation

1. **Processes should be isolated from each other**
    - No memory access, among other interference

--

2. Isolation is enforced by the kernel; as a result, **the kernel should be isolated from applications**
    - Same thing: no memory access/no interference
    - But applications still need to be able to invoke the kernel

--

.leftcol[
]
.rightcol[
- Processes can't access each other's memory directly
- But a process could exploit a bug in the kernel to interfere with another process
]

---

# User -> Kernel Attacks

- Kernel is written in a memory-unsafe language
    - It is subject to all the memory safety/UB vulnerabilities we saw previously
- General purpose kernels are huge, millions of lines of code
    - Cannot guarantee the absence of bugs (the kernel's sheer size all but guarantees their presence!)

.leftcol[
]
.rightcol[
- Main attack surface from user space applications: **system call interface**
- **Very large** trust interface, hard to secure
]

---

# Hardening The System Call Interface

- Linux considers **every piece of data flowing from user space through system calls as untrusted**
    - System call parameters and, for pointer parameters, what they point to

--

- Must assume user space may send:
    - Corrupted data structures
    - Bad indexing information
    - NULL pointers
    - References to resources (e.g. files) the process does not have permission to access
    - Sequences of system call invocations in the wrong order
    - Etc.

--

- Requires that the kernel **checks** the validity of this data/control flow
    - Hard to get all checks right due to the complexity of the interface

---

# System Calls: Pointer Parameters

- A process often passes a pointer to some of its memory to the kernel
    - E.g. `readv`/`writev` system calls
- Checking the validity of the data such pointers reference necessitates special treatment

.leftcol[
]

---

# System Calls: Pointer Parameters

- A process often passes a pointer to some of its memory to the kernel
    - E.g. `readv`/`writev` system calls
- Checking the validity of the data such pointers reference necessitates special treatment

.leftcol[
]
.rightcol[
- User space can use another thread to corrupt the pointed-to data after the check
- **Time of check to time of use** (TOCTTOU) AKA **double fetch** vulnerability
]

---

# System Calls: Pointer Parameters

- Only solution to protect against TOCTTOU: kernel must **copy** into kernel space all user space data passed by reference
- Validity checks are performed on these copies
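
As a rough illustration of the double-fetch problem and the copy-first fix, here is a minimal user-space C sketch. It is not kernel code: `struct request`, `race_hook`, `handle_double_fetch`, `handle_copy_first`, and `inflate_len` are hypothetical names, `memcpy` merely stands in for copying user data into kernel space, and the racing thread is modelled deterministically as a callback invoked between the check and the use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KBUF_SIZE 16

/* Hypothetical request living in "user" memory: the attacker controls it
 * and may rewrite it at any time from another thread. */
struct request {
    size_t len;
    char data[64];
};

/* Stand-in for a racing user thread; called between check and use. */
typedef void (*race_hook)(struct request *);

/* Vulnerable: reads req->len twice (double fetch). Returns the length that
 * would be used for a copy into a KBUF_SIZE buffer, or -1 if rejected. */
long handle_double_fetch(struct request *req, race_hook hook)
{
    if (req->len > KBUF_SIZE)        /* first fetch: the check */
        return -1;
    if (hook)
        hook(req);                   /* attacker modifies len here */
    return (long)req->len;           /* second fetch: the use */
}

/* Safer: snapshot the request once, then check and use only the copy. */
long handle_copy_first(struct request *req, race_hook hook)
{
    struct request snap;

    assert(req != NULL);
    memcpy(&snap, req, sizeof snap); /* plays the role of copy_from_user */
    if (snap.len > KBUF_SIZE)
        return -1;
    if (hook)
        hook(req);                   /* attacker can only alter the original */
    return (long)snap.len;
}

/* Example attacker: inflate the length after the check has passed. */
void inflate_len(struct request *req)
{
    req->len = 64;
}
```

With `len = 8` and `inflate_len` as the hook, the double-fetch handler ends up using a length of 64 even though the check passed on 8, while the copy-first handler keeps using 8.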

```c
/* Copy n bytes from user space into a kernel buffer.
 * Returns the number of bytes that could NOT be copied (0 on success). */
unsigned long __copy_from_user(void *to, const void __user *from, unsigned long n);

/* Copy n bytes from a kernel buffer out to user space; same return convention. */
unsigned long __copy_to_user(void __user *to, const void *from, unsigned long n);
```

---

# User -> Kernel Attacks

By exploiting a kernel vulnerability, an attacker can aim to:

- **Leak/tamper with kernel memory**
    - E.g. to read kernel pointers and break kernel ASLR, or to escalate privilege by gaining administrator rights

--

- **Access other processes' memory**
    - The kernel has access to the entirety of the computer's memory

--

- **Execute code**, possibly arbitrarily, in the context of the kernel (with full privileges)
    - E.g. to install and hide malicious programs (rootkits)

--

- **DoS** the system or other applications
- Etc.

---

# Kernel Vulnerabilities

- A paper* studying one year of kernel vulnerabilities (141) in 2010 defines classes of vulnerabilities:
    - Missing pointer checks
    - Missing permission checks
    - Buffer overflow
    - Integer overflow
    - Uninitialised data
    - Memory mismanagement (leaks, use-after-free, double free)
    - Misc: NULL dereference, divide by zero, infinite loop, race/deadlock

.small[*Chen et al., **Linux kernel vulnerabilities: State-of-the-art defences and open problems***]

---

# Kernel Vulnerabilities

.center[Vulnerabilities vs. their impact]

---

# Kernel Vulnerabilities

.center[Vulnerability locations in the kernel code base]

---

class: middle, center, inverse

# Linux: Runtime Defences

---

# Attack Surface Reduction

- **Strict kernel memory permissions**
    - Kernel executable code and read-only data must not be writable
    - Kernel function pointers and sensitive variables must not be writable

--

- Segregate kernel and user space memory:
    - **Supervisor Mode Execution Prevention (SMEP)** prevents the kernel from executing code located in user-space memory
    - **Supervisor Mode Access Prevention (SMAP)** prevents the kernel from reading/writing user-space memory (temporarily disabled by the kernel during `copy_to_user`/`copy_from_user`)
    - To protect against injection and dereference of user space pointers in the kernel (ret2usr attack)
    - User space can still partially control what's in the physmap (ret2dir attack)

---

# Attack Surface Reduction

.leftlargecol[
- **Reduce application's access to system calls: system call filtering**
    - Blacklisting the types of syscalls that are not supposed to be called legitimately by an application
    - Widely used in production to harden multi-tenant and sensitive environments: Docker containers, Android, Flatpak/AppImage, etc.
    - Achieved with **seccomp** under Linux
    - Problem: **how to come up with precise, per-application system call black/whitelists?**
]
.rightsmallcol[
]

---

# Probabilistic Defences

- **Kernel stack canaries**
- **Kernel Address Space Layout Randomisation (KASLR)**

--

- These are not a panacea
    - 1 canary value for all stack frames on each CPU, can be leaked with e.g. a stack buffer overread
    - KASLR is coarse grained, random offset applied on the main kernel memory area
        - Kernel and modules executable code, kernel stacks, vmalloc, physmap, etc.
    - A single pointer leak may allow an attacker to break ASLR for the entire area

---

# Memory Integrity

.leftlargecol[
- Use **shadow stacks** rather than canaries
- Prevent over/underflows outside the stack with **guard pages**
    - Unmapped memory pages that will fault when hit
- **Sanity-check heap free list for corruption** upon allocation/free
- **Trap on integer overflows** (e.g. counters, size variables)
]
.rightsmallcol[
]

---

# Prevent Kernel InfoLeaks

- **Avoid exposing kernel pointers to user space**
    - A single pointer leak can break KASLR
    - Care taken not to send partially/un-initialised data structures or buffers to user space
    - Kernel log or files containing pointers should be readable only by the administrator
- **Don't use addresses as resource identifiers** (e.g. file descriptors), use atomic counters
- **Poison/zero out memory released** to counter reuse attacks (reading uninitialised memory, use after free, etc.)
    - Can impact performance

---

class: middle, center, inverse

# Linux: Bug Detection

---

# Dynamic Analysis Techniques

- **Sanitisers:** KASAN, UBSAN, leak and concurrency (race condition) sanitisers, etc.
- **Lockdep:** tracks the state of locks to detect deadlocks, double locking, lock order inversion
- **Dynamic tracing/instrumentation tools:** ftrace, perf, eBPF
- **Fuzzing:** Syzkaller

---

# Syzkaller

.leftcol[
- Widely used kernel fuzzer
- Injects badly-formed system calls to kernel space with the hope of triggering bugs
- Kernel under test runs in a VM, fuzzing controlled from the host
- Works best with KASAN and coverage enabled to guide the fuzzing process
- For each round of fuzzing a program (a few syscalls) is generated and executed
]
.rightcol[
]

---

# Syzkaller: Syzlang

- Syzkaller: grammar-based, well aware of the interface it fuzzes
- **Syzlang** grammar used to describe system calls and the data flow
    - Syscall arguments and their types
    - Values passed between system calls (e.g. `fd` created by `open()` can be used by `read()`)
    - Length parameter specifying the size of other parameters

--

.leftcol[
- Good knowledge of the syscall interface lets Syzkaller optimise fuzzing program generation/mutation
]
.rightcol[
```xml
resource fd[int32]: 0xffffffff, AT_FDCWD
resource sock[fd]
resource sock_unix[sock]

socket(...) sock
accept(fd sock, ...) sock
listen(fd sock, backlog int32)
```
]

---

# Syzkaller: Syzbot

- **Syzbot:** continuous fuzzing of Linux, Android, FreeBSD, NetBSD, OpenBSD, and gVisor on 25 Syzkaller instances (~150 - 200 VMs) total
- Reported thousands of bugs to the Linux kernel mailing lists
    - https://syzkaller.appspot.com
- Problem: a large share (two thirds) of the reported bugs are invalid

---

# Linux Test Project

.leftcol[
- Repository containing thousands of test cases for the kernel
    - https://github.com/linux-test-project/ltp/tree/master
- Examples of categories:
    - Syscalls
    - POSIX conformance
    - Filesystem
    - Networking
    - Memory management
    - Reproducing existing CVEs
    - Etc.
]
.rightcol[
]

---

# Static Analysis Techniques

- **Pattern-based analysis**
    - Coccinelle, Smatch: describe patterns of programming mistakes
    - E.g. for each call to `kmalloc` there should be a `kfree`

--

- **Control and data flow analysis tools**
    - Sparse uses compiler attributes to indicate certain properties on objects and memory
    - E.g. `__user` to mark pointers to user space, `__acquires` for a lock held on function exit but not entry, etc.

--

- Others: verification techniques, symbolic execution, compiler techniques

--

- Some downsides (scalability/state explosion) of static analysis approaches are exacerbated on OS kernels due to the size of their code base
    - E.g. Linux with 20M+ LoC

---

# Summary

- **Basic OS security invariants:**
    - Processes cannot access each other's memory
    - Processes cannot access the memory of the kernel
- Old trust model: the entire kernel, hardware, sysadmin and privileged apps, BIOS/bootloader/boot process are fully trusted
    - **Does not reflect modern environments**
- System call interface is the main vector of user -> kernel attacks
- We saw some kernel-level runtime defences: attack surface reduction, ASLR, etc.
- And we saw some bug detection techniques: fuzzing, testing, static analysis