class: center, middle ### Secure Computer Architecture and Systems *** # Memory Management ??? - Hi everyone, here we are going to cover another key responsibility of operating systems - Memory management --- # Memory Management - **Memory management** is the set of OS features managing the allocation of and accesses to memory - Memory allocation/deallocation for applications and for the kernel - Setting up and maintaining address spaces for processes and for the kernel - Enforcing memory protection (isolation) between processes and the kernel, and between processes themselves - Swapping - Etc. ??? - It corresponds to the set of features implemented by the operating system to manage memory allocation, protection, and accesses - Things like allocation and deallocation of memory for applications and for the kernel - Address space management - Isolating the memory belonging to different processes, and isolating the kernel's memory from access by processes - Swapping, which is the use of secondary storage to extend main memory - And more --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions ] .rightcol[
] ??? - You are probably already aware of the concept of virtual memory, so just a quick refresher here - The CPU accesses memory with load and store instructions --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions - At boot time the OS enables virtual memory: every load/store now hits a virtual address - MMU translates **transparently** to physical addresses ] .rightcol[
] ??? - In the vast majority of modern systems the OS enables virtual memory very early during the boot process - From that moment all the addresses targeted by loads and stores will be virtual; the CPU won't address physical memory directly anymore - The translation between the virtual addresses the CPU requests to access and the actual physical memory they correspond to is done transparently by the MMU --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions - At boot time the OS enables virtual memory: every load/store now hits a virtual address - MMU translates **transparently** to physical addresses - Old implementation: **segmentation** ] .rightcol[
] ??? - An old implementation of virtual memory is segmentation - An app gets access to a relatively small virtual address space, a segment, whose size is smaller than the total RAM size - That address space is then mapped to physical memory contiguously: in effect the address translation just consists of adding an offset to a virtual address to obtain the corresponding physical one --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions - At boot time the OS enables virtual memory: every load/store now hits a virtual address - MMU translates **transparently** to physical addresses - Old implementation: **segmentation** ] .rightcol[
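To make the "translation is just a bounds check plus an offset" idea concrete, here is a minimal user-space sketch; the segment base and limit values are made up for illustration:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* A segment is described by its start in physical memory and its size */
struct segment {
    uint64_t base;   /* physical address where the segment starts */
    uint64_t limit;  /* segment size in bytes */
};

/* Segmentation-style translation: fault if out of bounds, else add the base */
static uint64_t translate(const struct segment *seg, uint64_t vaddr)
{
    if (vaddr >= seg->limit) {
        fprintf(stderr, "fault: 0x%" PRIx64 " outside the segment\n", vaddr);
        exit(1);
    }
    return seg->base + vaddr;
}

int main(void)
{
    struct segment s = { .base = 0x200000, .limit = 0x10000 };
    printf("virtual 0x1234 -> physical 0x%" PRIx64 "\n", translate(&s, 0x1234));
    return 0;
}
```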
] ??? - Each process would get its own segment, mapped at different locations in physical memory to ensure isolation between processes - Overall segmentation was not very flexible and brought issues such as fragmentation --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions - At boot time the OS enables virtual memory: every load/store now hits a virtual address - MMU translates **transparently** to physical addresses - Old implementation: **segmentation** - Today: **paging** ] .rightcol[
] ??? - Rather than segmentation, the vast majority of modern systems use paging to implement virtual memory - With paging, almost the entire range addressable given the width of virtual addresses makes up each process's address space - For example, on most 64-bit Intel CPUs virtual addresses are 48 bits wide, which gives a virtual address space of 256 TB for each process - Of course most of that address space is not mapped to physical memory - With paging the mapping is achieved at the granularity of 4KB pages - A data structure called the page table defines what virtual pages are mapped to physical memory, and where --- # Virtual Memory .leftcol[ - CPU accesses memory with load and store instructions - At boot time the OS enables virtual memory: every load/store now hits a virtual address - MMU translates **transparently** to physical addresses - Old implementation: **segmentation** - Today: **paging** ] .rightcol[
] ??? - Each process has a different page table and, unless shared memory has been explicitly established, processes do not share physical pages - This way they are fully isolated from each other --- # Paging - Virtual memory mapped to physical memory at the granularity of a **page** (4KB) - Address translation for 1 process defined by its **page table** - Indicates what virtual pages are mapped to what physical pages - 1 page table == 1 address space, so there is 1 per process ??? - As I was saying, with paging virtual memory is mapped to physical memory at the granularity of 4KB pages - The page table indicates what virtual pages are mapped to what physical pages - There is one page table, defining one address space, per process in the system -- - Page tables are 1. **Set up/controlled by the OS** 2. **Walked transparently by the MMU** when the CPU performs loads and stores ??? - Concretely, the way a page table handling the address space of a process works is as follows 1. The OS sets up the page table when the process is created. The OS also maintains the page table when new mappings need to be added/removed, for example when the process loads a shared library or allocates memory. 2. Once installed, a page table is walked transparently by the MMU to achieve the translation when the CPU runs the process in question and accesses memory. --- # Page Tables - Using a linear array with one entry per virtual page would be highly inefficient - A 64-bit virtual address space is very sparse, most pages are not mapped ??? - Intuitively you may think that the page table is a large linear array with a slot per virtual page and, in each slot, translation information regarding that page - That would be a huge waste of memory because modern 64-bit address spaces are very large: there are many pages, but most of them are not mapped -- - Instead, use a **tree** of pages holding page table data - 4 levels on most modern 64-bit CPUs ??? - Instead, the page table is a tree - The tree is made of pointers linking together special pages in physical memory used for address translation - Using a tree means that the system can avoid storing a lot of translation information corresponding to the large areas of the address space that are not mapped to physical memory - On modern CPUs the page table generally has 4 levels of pointers, although we are starting to see some CPUs with 5 -- - Root address of the tree held in a register - To change address space during context switch: simply switch that register to the root of another tree ??? - The root address of the tree corresponding to the page table for the process currently executing is held in a control register - So changing address space during a context switch is easy: simply write in that register the address of the root of the target page table --- # Page Tables (2)
??? - This illustrates a typical 4-level page table - The address of the root page is held in a control register on the CPU; on Intel x86-64 it's %cr3 - The root node represents the 4th level of the page table, and the entries it contains reference pages of the 3rd level - Entries at the 3rd level reference pages of the 2nd level, and their entries reference pages of the 1st level - Finally, entries at the 1st level reference the physical pages holding the data accessed by the CPU
??? - As I mentioned, each translation page contains pointers to the next level - The size of a page is 4KB, so there is enough space for 512 pointers - Each pointer may be either present, meaning it corresponds to a range of virtual address space that is mapped, and its value refers to a page at the lower level - Or absent, meaning it corresponds to a range of the virtual address space that is not mapped - Note that all pointers in translation pages refer to physical addresses --- # Page Tables (2)
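As a rough mental model of this structure (an illustrative sketch, not actual kernel code; the names `pt_page` and `PT_PRESENT` are made up), each translation page can be seen as an array of 512 64-bit entries whose present entries point one level down:

```c
#include <stdint.h>

#define ENTRIES_PER_PAGE 512          /* 4096-byte page / 8 bytes per entry */
#define PT_PRESENT       (1ULL << 0)  /* this entry references a mapped range */

/* One 4 KB translation page: 512 entries, each holding the physical page
 * index of the next-level page (or of the data page at level 1) plus flags */
struct pt_page {
    uint64_t entry[ENTRIES_PER_PAGE];
};

/* Four levels of such pages form the tree; on x86-64 the physical address of
 * the level-4 page is what gets loaded into %cr3, so switching address space
 * on a context switch only requires rewriting that single register. */
```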
??? - When a page table is installed and the CPU issues loads and stores, the page table is walked transparently by the MMU to perform the translation - For example if the CPU issues a load at address x, the MMU follows the path of pointers indexed by x, and the data read by this load operation will be the bytes hit in the data page at the end of the walk
??? - Let's see how the entries in the page table are indexed during a page table walk - When the CPU issues a load or a store, it targets a particular virtual address - We have an example 64-bit virtual address depicted in binary on the slide
??? - As mentioned, with x86-64 the %cr3 control register holds the address of the root of the page table, which is the 4th level
??? - The bits 39 to 47 of the target virtual address are used to index an entry in the root page, which gives us the pointer to the 3rd level translation page --- # Page Tables Walk
??? - Then the bits 30 to 38 of the target virtual address index an entry in the 3rd level translation page, which gives us the pointer to the 2nd level page --- # Page Tables Walk
??? - The bits 21 to 29 of the virtual address index an entry in the 2nd level translation page, giving the pointer to the 1st level page --- # Page Tables Walk
??? - The bits 12 to 20 of the virtual address index an entry in the 1st level translation page, giving the pointer to the physical page the CPU wants to access --- # Page Tables Walk
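To make the indexing concrete, here is a small user-space sketch that slices a virtual address into the same five fields the MMU uses during the walk (the example address is made up):

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t vaddr = 0x00007f1234567abcULL;  /* made-up example address */

    unsigned l4  = (vaddr >> 39) & 0x1ff;    /* bits 47-39: 4th level index */
    unsigned l3  = (vaddr >> 30) & 0x1ff;    /* bits 38-30: 3rd level index */
    unsigned l2  = (vaddr >> 21) & 0x1ff;    /* bits 29-21: 2nd level index */
    unsigned l1  = (vaddr >> 12) & 0x1ff;    /* bits 20-12: 1st level index */
    unsigned off = vaddr & 0xfff;            /* bits 11-0: byte within page */

    printf("L4=%u L3=%u L2=%u L1=%u offset=0x%x\n", l4, l3, l2, l1, off);
    return 0;
}
```

Each index is 9 bits wide (0-511), matching the 512 entries held by each translation page.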
??? - And finally the last bits of the virtual address, bits 0 to 11, index a byte within that physical page --- # Page Table Entries (x86-64) - Each page of the page table holds 512 64-bit entries - Lower levels referenced by their page index within the supported 48-bit physical address space
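The sketch below illustrates this entry layout; real x86-64 entries carry more flag bits than shown, and the helper names are made up for this example:

```c
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)             /* the range is mapped */
#define PTE_WRITE   (1ULL << 1)             /* writable (otherwise read-only) */
#define PTE_USER    (1ULL << 2)             /* user-accessible (else supervisor only) */
#define PTE_FRAME   0x0000fffffffff000ULL   /* physical page index, bits 47-12 */

/* Physical address of the page referenced by an entry */
static inline uint64_t pte_to_phys(uint64_t pte)
{
    return pte & PTE_FRAME;
}

/* An access faults if the entry is absent, if it is a write through a
 * read-only entry, or a user-mode access to a supervisor-only entry */
static inline int pte_allows(uint64_t pte, int is_write, int is_user)
{
    if (!(pte & PTE_PRESENT)) return 0;
    if (is_write && !(pte & PTE_WRITE)) return 0;
    if (is_user && !(pte & PTE_USER)) return 0;
    return 1;
}
```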
??? - As I mentioned, each page contains 512 entries, each 64 bits in size - We don't need the full 64 bits of an entry to index the lower levels though - First, addresses on most 64-bit processors are actually only 48 bits wide - Second, the pointers in the translation pages do not hold full physical addresses, but rather physical page indexes - Obviously there are fewer physical pages than there are physical addresses, so we need fewer bits to index pages - Overall we only need 36 bits of each entry for the index, meaning that we can use the additional bits to hold metadata about the range of the virtual address space referenced by each entry in translation pages -- - Also contain **metadata** regarding the memory referenced: - Present, read/write, user/supervisor - Control whether accesses to the referenced memory succeed or fault (exception) - Used to implement **memory protection**, but also CoW address space transfer, swap, etc. ??? - This metadata is used to indicate if the range of address space concerned is actually mapped, if it is accessible in read and/or write mode, and if it is only accessible in supervisor mode or in user mode - This makes it possible to control memory accesses: if an address in the virtual address space is not present or if the access in question is denied, for example because it's read only and accessed in write mode, the CPU will trigger a page fault exception - This is crucial to the security of the system, and it is used to implement memory protection, but also things like swap and the on-demand duplication of the address space upon fork --- # The OS and the Address Space - User/supervisor memory protection allows the kernel to be mapped in the address space of each process - **No need to switch page table upon system calls**
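As a small illustration (assuming the usual Linux x86-64 split with 48-bit canonical addresses), the kernel half of every address space starts at the same fixed virtual address, so a one-line check tells which half an address belongs to:

```c
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit canonical addresses, user mappings live in the lower half and
 * the kernel occupies the upper half, starting at 0xffff800000000000 */
static inline bool is_kernel_address(uint64_t vaddr)
{
    return vaddr >= 0xffff800000000000ULL;
}
```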
??? - With the page table entry bit that allows part of the address space to be made accessible in supervisor mode only, we can actually have the kernel live in the same address space as processes - So with Linux, the kernel is mapped in the top part of the address space of each process as illustrated on the slide - Every page table is configured so that this area is accessible in supervisor mode only, i.e. by the kernel only - This has an important advantage: there is no need to switch page tables upon system calls - Switching page tables is very costly because it involves a flush of the translation cache, the translation lookaside buffer -- - Mechanisms enforcing the main memory protection security invariants: - Applications isolated from each other by having **different page tables** - Kernel isolated from apps with **supervisor protection in each address space** ??? - To sum up, the mechanisms enforcing the main memory security invariants in the system are twofold - Processes are isolated from each other by having different page tables defining different, non-overlapping address spaces - And the kernel is isolated from processes through the supervisor-only access bit in each page table --- # Kernel Address Space
- Some important areas: - **dirmap**: Direct linear mapping of all physical memory - **vmalloc area**: serves certain kernel dynamic memory allocation (`vmalloc`) requests - **Kernel code and static memory** (`.data`, `.bss`, etc.): similar to a traditional program - **Modules**: pieces of kernel code that can be loaded/unloaded dynamically without rebooting ??? - If we zoom in on the part of the address space reserved for the kernel, it is made of many different areas - I don't have time to go over each area here, but here are the main ones - You have the dirmap, which is a direct mapping of all physical memory - It is useful when the kernel wants to access physical memory directly, for example when setting up page tables, or when allocating memory that needs to be contiguous in physical memory - The vmalloc area is basically the kernel heap, containing memory allocated dynamically - Like a standard program, the kernel also has a static memory part with its code and static data; these were mapped from the kernel's binary at boot time - Finally, Linux supports the dynamic loading and unloading of kernel code at runtime, in the form of kernel modules - These are loaded in a specific area of the kernel part of the address space --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: ] .rightsmallcol[
] ??? - When the kernel needs to allocate memory for itself or for an application, the following needs to happen --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: - Reserve some free physical memory ] .rightsmallcol[
] ??? - The kernel first reserves some physical memory, enough space to satisfy the allocation request; let's assume here it's 2 pages, and they don't need to be contiguous in physical memory --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: - Reserve some free physical memory - If it's not already mapped: - Find a free range of virtual memory ] .rightsmallcol[
] ??? - The kernel also needs to find a free range of virtual memory, either within its own area of the address space if the allocation request originates from the kernel, or within the process-accessible part of the address space if it's the process requesting memory --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: - Reserve some free physical memory - If it's not already mapped: - Find a free range of virtual memory - Create page table entries corresponding to that range of virtual memory ] .rightsmallcol[
] ??? - The kernel then creates the page table entries corresponding to the newly created virtual pages --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: - Reserve some free physical memory - If it's not already mapped: - Find a free range of virtual memory - Create page table entries corresponding to that range of virtual memory - Map the virtual memory to the physical pages (possibly later on demand) ] .rightsmallcol[
] ??? - And the page table entries are set up to point to the physical pages that were previously reserved - Note that this mapping will in most cases not be done at allocation time, but rather later when the CPU accesses the virtual pages for the first time - This is achieved by unsetting the present bit in the page table entries: the first access will trigger a page fault and at that time the kernel can perform the mapping and restart the memory access --- # Memory Allocation in the Kernel .leftlargecol[ When the kernel needs some memory the following needs to happen: - Reserve some free physical memory - If it's not already mapped: - Find a free range of virtual memory - Create page table entries corresponding to that range of virtual memory - Map the virtual memory to the physical pages (possibly later on demand) - Return a pointer to the virtual area ] .rightsmallcol[
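An illustrative kernel-side sketch of these steps using `alloc_page()` and `vmap()`; it is a simplified view under the assumption that the pages are mapped eagerly rather than on demand:

```c
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *alloc_two_pages_mapped(struct page *pages[2])
{
    void *va;

    /* 1. Reserve free physical memory: two pages, not necessarily contiguous */
    pages[0] = alloc_page(GFP_KERNEL);
    pages[1] = alloc_page(GFP_KERNEL);
    if (!pages[0] || !pages[1])
        goto err;

    /* 2-3. Find a free range of kernel virtual memory, create the page table
     * entries, and map them to the two physical pages: vmap() does all this */
    va = vmap(pages, 2, VM_MAP, PAGE_KERNEL);
    if (!va)
        goto err;

    /* 4. Return a pointer to the virtual area */
    return va;

err:
    if (pages[0])
        __free_page(pages[0]);
    if (pages[1])
        __free_page(pages[1]);
    return NULL;
}
```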
] ??? - The memory allocation request finally returns a pointer to the newly allocated virtual area --- # Memory Allocation in the Kernel .rightcol[
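From user space, the equivalent request is an `mmap()` call; the short sketch below shows roughly what a `malloc` implementation does under the hood for large requests (sizes and usage are made up for illustration):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 2 * 4096;  /* two pages */

    /* Anonymous private mapping: the kernel reserves virtual memory now and
     * maps physical pages lazily, on the first access to each page */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    buf[0] = 'x';  /* first touch triggers the page fault that maps the page */
    printf("mapped %zu bytes at %p\n", len, (void *)buf);

    munmap(buf, len);
    return 0;
}
```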
] ??? - All memory allocations are served by the kernel - If they come from user space, they go through the mmap() system call - Note that malloc is implemented in user space by the libc - It calls mmap under the hood to get a large area of virtual memory, then splits it into smaller buffers to serve allocation requests --- # `kmalloc` and `vmalloc` .leftcol[ **2 main interfaces:** - `kmalloc`: small size, fast and customised allocations - Usable in contexts where code cannot sleep (interrupt) - Memory returned already mapped (dirmap) - `vmalloc`: large allocations (page granularity) - Page table modified, memory mapped on demand ] .rightcol[
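A hedged sketch of how the two interfaces are typically used from kernel code (error handling kept minimal):

```c
#include <linux/errno.h>
#include <linux/slab.h>     /* kmalloc, kfree */
#include <linux/vmalloc.h>  /* vmalloc, vfree */

static int example_allocs(void)
{
    /* Small, fast allocation; GFP_KERNEL may sleep, so code running in
     * interrupt context would use GFP_ATOMIC instead */
    char *small = kmalloc(128, GFP_KERNEL);

    /* Large allocation at page granularity, only virtually contiguous */
    char *large = vmalloc(4 * 1024 * 1024);

    if (!small || !large) {
        kfree(small);   /* both helpers accept NULL */
        vfree(large);
        return -ENOMEM;
    }

    /* ... use the buffers ... */

    kfree(small);
    vfree(large);
    return 0;
}
```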
] ??? - The kernel itself has two main dynamic memory allocation interfaces - The first is kmalloc, which is used for fast, small-sized allocations - It is usable in contexts where kernel code cannot sleep, for example when handling an interrupt - It also returns memory that is already mapped and always physically contiguous - The other interface is vmalloc, which is used for larger allocations at page granularity - It is slower as it requires updating the page table --- # SLAB Allocator .leftcol[ **SLAB layer:** - System of caches trying to reuse same-size allocations as much as possible - Good for performance and to reduce fragmentation - Useful when a high number of data structures of the same type are allocated frequently ] .rightcol[
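Here is a sketch of kernel code creating its own dedicated SLAB cache for a frequently allocated structure; the structure and names are made up for this example:

```c
#include <linux/errno.h>
#include <linux/slab.h>

struct my_object {
    int  id;
    char payload[120];
};

static struct kmem_cache *my_cache;

static int my_cache_init(void)
{
    /* A cache dedicated to objects of this exact size */
    my_cache = kmem_cache_create("my_object_cache", sizeof(struct my_object),
                                 0, SLAB_HWCACHE_ALIGN, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static void my_cache_use(void)
{
    struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

    if (obj) {
        obj->id = 42;
        kmem_cache_free(my_cache, obj);  /* returned to the cache for reuse */
    }
}

static void my_cache_exit(void)
{
    kmem_cache_destroy(my_cache);
}
```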
] ??? - kmalloc relies on the SLAB layer, which is a system of caches trying to reuse same-size allocations as much as possible - This is good for speed, but it also reduces fragmentation - It is useful when many data structures of the same type are allocated frequently - Kernel code can also directly create its own SLAB caches without going through kmalloc --- # Physical Page Allocator .leftcol[ **Page allocator: buddy system** - Allocates physical memory at the granularity of pages - Maintains lists of blocks of same-size power-of-two contiguous free pages to limit fragmentation - Blocks can be split (into buddies) and merged as needed ] .rightcol[
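A minimal sketch of asking the buddy allocator directly for a power-of-two block of physically contiguous pages (the order value is arbitrary here):

```c
#include <linux/gfp.h>

static void buddy_example(void)
{
    /* Order 2 means 2^2 = 4 contiguous physical pages, i.e. 16 KB */
    struct page *block = alloc_pages(GFP_KERNEL, 2);

    if (block)
        __free_pages(block, 2);  /* freed block can merge back with its buddy */
}
```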
] ??? - To reserve physical memory, all allocation methods rely on the buddy system, also called the frame allocator - Here a frame means a physical page - This is the granularity at which the buddy system allocates physical memory - The buddy system maintains lists of blocks of same-size sets of contiguous free physical pages, with the goal of limiting fragmentation - Large blocks can be split and merged as needed --- # Physical Page Allocator .leftcol[ **Page allocator: buddy system** - Allocates physical memory at the granularity of pages - Maintains lists of blocks of same-size power-of-two contiguous free pages to limit fragmentation - Blocks can be split (into buddies) and merged as needed ] .rightcol[
] ??? - You can see the way the buddy system works here, with lists linking blocks of 1, 2, 4, 8, etc. contiguous physical pages --- # Summary - Memory management is a key feature provided by OSes - Tied to key OS security invariants: - A process's memory must not be accessible from other processes - Achieved by having **disjoint virtual address spaces** - The kernel memory must not be accessible from processes - Achieved with **user/supervisor memory protection** - Kernel also handles **memory allocation** for itself and applications ??? - And that's it for memory management - It's a key feature of every operating system - The relevant security invariants here are first that a process's memory should not be accessible from other processes - This is achieved by establishing disjoint address spaces for each process, with 1 page table per process - Second, the kernel memory that lives in the processes' address spaces should not be directly accessible by processes - This is achieved with supervisor memory protection, using the supervisor bit in the page table entries - We also saw how the kernel handles memory allocation both for itself and for processes