class: center, middle

### Secure Computer Architecture and Systems

***

# Anatomy of a Program in Memory

---
name: vaspace

# The Virtual Address Space

- Each program sees the memory it can access as a very large array of bytes, **the address space**
- Each slot in this array has an **address**, from 0 to ~128 TB on modern 64-bit CPUs
- The program accesses memory with load and store instructions at target addresses
- With virtual memory the address space is **private** to each program
- Each program thinks it can read/write over the entire address space

---
template:vaspace
---

# The Virtual Address Space

- The size of the virtual address space is very large, and independent of the amount of RAM the machine has
- It is also **sparsely populated**
---

# Virtual Address Space Life Cycle

- How does the address space evolve during the execution of a program?
- Part of it is set up at load time, and part of it changes at runtime

---

# Virtual Address Space Life Cycle

- Before the program is executed its binary sits somewhere on disk

---

# Virtual Address Space Life Cycle

- When the program is invoked, e.g. `./my-binary`, the operating system first creates a virtual address space

---

# Virtual Address Space Life Cycle

- The program binary is in a particular format (ELF) that indicates what tool should be used to bootstrap the address space
- This is the **loader**, e.g. `ld-linux-x86-64.so.2`
- It's a separate binary
- The OS first loads the loader by mapping parts of its binary in the address space
- Check out ELF metadata e.g. with `readelf -l /bin/ls`

---

# Virtual Address Space Life Cycle
- The loader then starts to execute

---

# Virtual Address Space Life Cycle

- The loader loads the program's binary

---

# Virtual Address Space Life Cycle

- Most programs are **dynamically** linked: they require additional libraries
  - These are called shared libraries because they can be loaded into several programs' address spaces
  - E.g. the C standard library (`libc.so`)
- The loader fetches the list of required libraries from the program's ELF binary and loads them into the address space
- Check a program's library dependencies with e.g. `ldd /bin/ls`

---

# Virtual Address Space Life Cycle
- All of this is **static memory**: the size of these areas and the data they contain is fixed at compile time
  - These sizes never change throughout the program's execution
  - Static memory includes **executable code** and **global variables**
- At that stage the program's code has not started to run yet: when static memory is set up, the program can start to execute

---

# Virtual Address Space Life Cycle

- As it runs the program will also need **dynamic memory**, i.e. areas whose size will change at runtime
  - A **stack** that will hold **local variables** and **function arguments**, growing (down) with nested function calls

---

# Virtual Address Space Life Cycle

- As it runs the program will also need **dynamic memory**, i.e. areas whose size will change at runtime
  - A **stack** that will hold **local variables** and **function arguments**, growing (down) with nested function calls
  - A **heap** to hold memory allocated dynamically with `malloc`, growing up

---

# Virtual Address Space Life Cycle
- Other mappings may be done later at runtime
  - E.g. for loading modules, or just-in-time compiled code (e.g. Java)

---

# Virtual Address Space Life Cycle
- All these mappings have **access permissions** set up by the OS and enforced by the CPU on load and store operations
  - E.g. readable and executable permissions for **code**, readable and writable permissions for **data**

---

# Memory Map of a Process

```bash
$ cat /proc/21184/maps
564870330000-564870331000 r--p 00000000 103:04 35652325 /home/pierre/prog
564870331000-564870332000 r-xp 00001000 103:04 35652325 /home/pierre/prog
564870332000-564870333000 r--p 00002000 103:04 35652325 /home/pierre/prog
564870333000-564870334000 r--p 00002000 103:04 35652325 /home/pierre/prog
564870334000-564870335000 rw-p 00003000 103:04 35652325 /home/pierre/prog
56489d9a4000-56489d9c5000 rw-p 00000000 00:00 0 [heap]
7f6b9ec32000-7f6b9ec35000 rw-p 00000000 00:00 0
7f6b9ec35000-7f6b9ec5b000 r--p 00000000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9ec5b000-7f6b9edb0000 r-xp 00026000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9edb0000-7f6b9ee03000 r--p 0017b000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9ee03000-7f6b9ee07000 r--p 001ce000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9ee07000-7f6b9ee09000 rw-p 001d2000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9ee09000-7f6b9ee16000 rw-p 00000000 00:00 0
7f6b9ee2e000-7f6b9ee30000 rw-p 00000000 00:00 0
7f6b9ee30000-7f6b9ee31000 r--p 00000000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7f6b9ee31000-7f6b9ee56000 r-xp 00001000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7f6b9ee56000-7f6b9ee60000 r--p 00026000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7f6b9ee60000-7f6b9ee62000 r--p 00030000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7f6b9ee62000-7f6b9ee64000 rw-p 00032000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7ffd9db2f000-7ffd9db50000 rw-p 00000000 00:00 0 [stack]
7ffd9dba3000-7ffd9dba7000 r--p 00000000 00:00 0 [vvar]
7ffd9dba7000-7ffd9dba9000 r-xp 00000000 00:00 0 [vdso]
```

---

# Static Memory

```bash
$ cat /proc/21184/maps
*564870330000-564870331000 r--p 00000000 103:04 35652325 /home/pierre/prog
*564870331000-564870332000 r-xp 00001000 103:04 35652325 /home/pierre/prog
*564870332000-564870333000 r--p 00002000 103:04 35652325 /home/pierre/prog
*564870333000-564870334000 r--p 00002000 103:04 35652325 /home/pierre/prog
*564870334000-564870335000 rw-p 00003000 103:04 35652325 /home/pierre/prog
56489d9a4000-56489d9c5000 rw-p 00000000 00:00 0 [heap]
7f6b9ec32000-7f6b9ec35000 rw-p 00000000 00:00 0
*7f6b9ec35000-7f6b9ec5b000 r--p 00000000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
*7f6b9ec5b000-7f6b9edb0000 r-xp 00026000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
*7f6b9edb0000-7f6b9ee03000 r--p 0017b000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
*7f6b9ee03000-7f6b9ee07000 r--p 001ce000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
*7f6b9ee07000-7f6b9ee09000 rw-p 001d2000 103:04 27004964 /usr/lib/x86_64-linux-gnu/libc.so.6
7f6b9ee09000-7f6b9ee16000 rw-p 00000000 00:00 0
7f6b9ee2e000-7f6b9ee30000 rw-p 00000000 00:00 0
*7f6b9ee30000-7f6b9ee31000 r--p 00000000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
*7f6b9ee31000-7f6b9ee56000 r-xp 00001000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
*7f6b9ee56000-7f6b9ee60000 r--p 00026000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
*7f6b9ee60000-7f6b9ee62000 r--p 00030000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
*7f6b9ee62000-7f6b9ee64000 rw-p 00032000 103:04 27004961 /usr/.../ld-linux-x86-64.so.2
7ffd9db2f000-7ffd9db50000 rw-p 00000000 00:00 0 [stack]
7ffd9dba3000-7ffd9dba7000 r--p 00000000 00:00 0 [vvar]
7ffd9dba7000-7ffd9dba9000 r-xp 00000000 00:00 0 [vdso]
```

---

# Static Memory

At load time:
- **Private mappings:** stores by the program in writable areas won't be reflected in the binary on disk

---

# Inspecting a Binary

```bash
$ readelf -lSW my-program
There are 31 section headers, starting at offset 0x36f8:

Section Headers:
  [Nr] Name     Type      Address          Off    Size   ES Flg Lk Inf Al
  [15] .text    PROGBITS  0000000000001070 001070 00011f 00 AX  0  0   16
  [17] .rodata  PROGBITS  0000000000002000 002000 000008 00 A   0  0   4
  [25] .data    PROGBITS  0000000000004018 003018 000010 00 WA  0  0   8
  [26] .bss     NOBITS    0000000000004028 003028 000008 00 WA  0  0   1

Program Headers:
  Type  Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD  0x000000 0x0000000000000000 0x0000000000000000 0x000688 0x000688 R   0x1000
  LOAD  0x001000 0x0000000000001000 0x0000000000001000 0x000199 0x000199 R E 0x1000
  LOAD  0x002000 0x0000000000002000 0x0000000000002000 0x0000e0 0x0000e0 R   0x1000
  LOAD  0x002dd0 0x0000000000003dd0 0x0000000000003dd0 0x000258 0x000260 RW  0x1000

Section to Segment mapping:
  Segment Sections...
   03     .init .plt .plt.got .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
```

---

# The Code Segment

- Contains the machine code the CPU will execute when running the program
- To see its content, disassemble:

.leftcol[
```c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    while(1) {
        printf("%d\n", getpid());
        sleep(1);
    }

    return 0;
}
```

.codelink[
`05-program-memory/inspect-me.c`
]

See how the compiler translated sources into machine code
]

.rightcol[
```bash
$ objdump --disassemble my-program
0000000000001159 <main>:
  1159: push %rbp
  115a: mov %rsp,%rbp
  115d: sub $0x10,%rsp
  1161: mov %edi,-0x4(%rbp)
  1164: mov %rsi,-0x10(%rbp)
  1168: call 1030 <getpid@plt>
  116d: mov %eax,%esi
  116f: lea 0xe8e(%rip),%rax
  1176: mov %rax,%rdi
  1179: mov $0x0,%eax
  117e: call 1040 <printf@plt>
  1183: mov $0x1,%edi
  1188: call 1050 <sleep@plt>
  118d: jmp 1168 <main+0xf>
```
]

---

# Function Calling Convention

- The concept of a "function" does not exist at the machine code level
- **How does the compiler transform function calls and returns present in the source into machine code?**

--

- The compiler generates the machine code for function calls and returns according to an architecture-specific **calling convention**. For x86-64:
  1. Place the first six integer or pointer arguments, in order, in registers `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, and `%r9`
      - More than 6 parameters? Additional ones are pushed on the stack
  2. Issue the `call` instruction; the CPU jumps to the called function
  3. When it returns, the called function places the return value in the `%rax` register and invokes the `ret` instruction

--

This convention is **the System V x86-64 Application Binary Interface** (ABI)

---

# Function Calling Convention

.leftcol[
```c
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
    while(1) {
        printf("%d\n", getpid());
        sleep(1);
    }

    return 0;
}
```

.codelink[
`05-program-memory/inspect-me.c`
]
]

.rightcol[
```bash
$ objdump --disassemble my-program
0000000000001159 <main>:
  1159: push %rbp
  115a: mov %rsp,%rbp
  115d: sub $0x10,%rsp
  1161: mov %edi,-0x4(%rbp)
  1164: mov %rsi,-0x10(%rbp)
  1168: call 1030 <getpid@plt>
  116d: mov %eax,%esi
  116f: lea 0xe8e(%rip),%rax
  1176: mov %rax,%rdi
  1179: mov $0x0,%eax
  117e: call 1040 <printf@plt>
  1183: mov $0x1,%edi
  1188: call 1050 <sleep@plt>
  118d: jmp 1168 <main+0xf>
```
]

---

# Function Calls and the Stack

.leftcol[
```c
void f(int param) {
    int f_local = 0xcafe;
    /* ... */
    return;
}

int main() {
    int main_local = 0x42;
    int ret = f(main_local);
    /* ... */
}
```

.codelink[
`05-program-memory/stack.c`
]

- `CALL` and `RET` instructions are used to call and return from functions
]

.rightcol[
```bash
0000000000001129 <f>:
  1129: push %rbp
  112a: mov %rsp,%rbp
  112d: mov %edi,-0x14(%rbp)
  1130: movl $0xcafe,-0x4(%rbp)
  1137: nop
  1138: pop %rbp
* 1139: ret

000000000000113a <main>:
  113a: push %rbp
  113b: mov %rsp,%rbp
  113e: sub $0x10,%rsp
  1142: movl $0x42,-0x4(%rbp)
  1149: mov -0x4(%rbp),%eax
  114c: mov %eax,%edi
* 114e: call 1129 <f>
  1153: mov %eax,-0x8(%rbp)
  1156: mov $0x0,%eax
  115b: leave
  115c: ret
```
]

---

# Function Calls and the Stack

.leftcol[
```c
void f(int param) {
    int f_local = 0xcafe;
    /* ... */
    return;
}

int main() {
    int main_local = 0x42;
    int ret = f(main_local);
    /* ... */
}
```

.codelink[
`05-program-memory/stack.c`
]

- The stack is a contiguous area in the address space that holds per-function data, e.g. parameters and local variables
- Each function has a **stack frame**
]

.rightcol[
`main` runs, before calling `f`:
]

---

# Function Calls and the Stack

.leftcol[
```c
void f(int param) {
    int f_local = 0xcafe;
    /* ... */
    return;
}

int main() {
    int main_local = 0x42;
    int ret = f(main_local);
*   /* ... <- return address here */
}
```

.codelink[
`05-program-memory/stack.c`
]

- When the CPU executes the `CALL` instruction it:
  1. Pushes the return address on the stack; and
  2. Jumps to the target function `f`
]

.rightcol[
`main` calls `f`:
]

---

# Function Calls and the Stack

.leftcol[
```c
void f(int param) {
    int f_local = 0xcafe;
    /* ... */
    return;
}

int main() {
    int main_local = 0x42;
    int ret = f(main_local);
    /* ... */
}
```

.codelink[
`05-program-memory/stack.c`
]
]

.rightcol[
`f` runs:
]

---

# Function Calls and the Stack

.leftcol[
```c
void f(int param) {
    int f_local = 0xcafe;
    /* ... */
    return;
}

int main() {
    int main_local = 0x42;
    int ret = f(main_local);
    /* ... */
}
```

.codelink[
`05-program-memory/stack.c`
]

- When the CPU executes the `RET` instruction:
  - The return address is popped from the stack
  - The CPU jumps to it
]

.rightcol[
`f` returns to `main`:
]

---

# Summary

- Code and data constituting a program reside in **memory**

--

- Each program has its own virtual **address space**
  - Sparsely populated with various areas: static and dynamic (stack, heap) memory, for the program and shared libraries

--

- The CPU runs **machine code**
  - Also present in memory (code segment)
- The CPU handles function calls:
  - With a **calling convention** defining how to place arguments/return values in registers
  - With **call and return instructions** using the stack to hold local variables, parameters, and return addresses