class: center, middle

### COMP26020 Programming Languages and Paradigms -- Part 1

***

# Case Study: Operating System Kernel

---

# System Call and Context Switch

- C is the default language for writing OS kernels
- Let's study the implementation of:
    - **System call handling**
    - **Context switch** (replacement of a task running on the CPU by another one)
???
- Because of its closeness to the hardware and its performance, C is heavily used in OS kernels
- Let's have a look at how the **system call** and **context switch** operations are implemented by an OS
- We already talked about invoking system calls in the previous video on the C standard library
- Here we'll get an overview of how they are handled on the kernel side
- A context switch is the action of replacing a task running on the CPU with another one
- We'll get an overview of how the kernel manages these

---

class: center, middle, inverse

# System Calls in a Nutshell

???
- Let's see what happens when a system call is made

---

# System Call in a Nutshell
???
- Let's assume that a task is executing on the processor
- The CPU state of this task consists of the values present in the processor registers
- We have general-purpose registers used for computations, but also special-purpose ones such as the stack and instruction pointers
- The stack pointer points to the stack, a contiguous memory area used to store various things such as local variables
- The instruction pointer points to the instruction currently executed by the CPU, located inside the program code in memory

---

# System Call in a Nutshell
???
- When the syscall instruction is issued, the CPU switches to privileged mode and jumps to a predefined handler in the kernel. It also switches to the kernel stack.

---

# System Call in a Nutshell
???
- The kernel starts by saving the application state by pushing all registers on the kernel stack

---

# System Call in a Nutshell
???
- Then the kernel determines which syscall is invoked and starts processing it.

---

# System Call in a Nutshell
???
- When done the kernel restores the task state by popping all the registers from the kernel stack

---

# System Call in a Nutshell
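The steps seen so far (save the registers, find and run the syscall, restore the registers) can be sketched as a small user-space simulation in C. This is only an illustration under stated assumptions: the three-register `struct regs`, the `struct kstack` type, and the two syscalls `sys_double` and `sys_add_one` are all made up for this sketch and do not come from any real kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state: real CPUs have many more registers. */
struct regs {
    uint64_t rax;   /* syscall number on entry, return value on exit */
    uint64_t rdi;   /* first syscall argument */
    uint64_t rbx;   /* a value the task expects to keep across the syscall */
};

#define KSTACK_SLOTS 16

/* Hypothetical kernel stack holding saved register sets. */
struct kstack {
    struct regs slots[KSTACK_SLOTS];
    int top;        /* number of register sets currently saved */
};

/* Two made-up syscalls, for illustration only. */
static uint64_t sys_double(uint64_t arg)  { return 2 * arg; }
static uint64_t sys_add_one(uint64_t arg) { return arg + 1; }

/* Simulates kernel entry and exit for one syscall made with state *cpu. */
void do_syscall(struct regs *cpu, struct kstack *ks)
{
    /* 1. Save the task state on the kernel stack. */
    assert(ks->top < KSTACK_SLOTS);
    ks->slots[ks->top++] = *cpu;
    struct regs *saved = &ks->slots[ks->top - 1];

    /* The kernel is now free to clobber the live registers. */
    cpu->rbx = 0xdeadbeef;
    cpu->rdi = 0;

    /* 2. Determine which syscall was invoked and process it. The result
     *    goes into the saved rax so the task sees it after the restore. */
    switch (saved->rax) {
    case 0:  saved->rax = sys_double(saved->rdi);  break;
    case 1:  saved->rax = sys_add_one(saved->rdi); break;
    default: saved->rax = (uint64_t)-1;            break; /* unknown syscall */
    }

    /* 3. Restore the task state by popping it from the kernel stack. */
    *cpu = ks->slots[--ks->top];
}
```

Calling `do_syscall` with `rax = 0` and `rdi = 21` leaves the result in `rax`, while `rbx` and `rdi` come back unchanged even though the simulated kernel overwrote them in between.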
???
- And then user space execution resumes at the instruction following the syscall

---

class: center, middle, inverse

# Context Switches in a Nutshell

???
- And now let's see how things work for a context switch

---

# Context Switch in a Nutshell
???
- As in the previous example, let's assume a task is running on the CPU; we'll call it task A
- We also assume a task B that is currently not running
- We also have the kernel code and one kernel stack for each of tasks A and B

---

# Context Switch in a Nutshell
???
- A context switch must be performed by the kernel, so there is first a trap to the kernel, following either a system call or a hardware interrupt

---

# Context Switch in a Nutshell
???
- The context switch operation starts by saving the state of the task being scheduled out of the CPU, pushing all register values onto its stack

---

# Context Switch in a Nutshell
???
- Next, the state of the task being scheduled in is restored from its own stack, where it was previously saved

---

# Context Switch in a Nutshell
???
- Then the execution of B can resume, in kernel mode first

---

# Context Switch in a Nutshell
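The save-then-restore sequence of a context switch can be sketched in C as a user-space simulation. The two-register `struct cpu_regs`, the `struct task` type, and the `context_switch` function below are assumptions made for illustration; a real kernel does this in assembly on the actual CPU registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state. */
struct cpu_regs {
    uint64_t rax;
    uint64_t rip;   /* instruction pointer: where the task resumes */
};

#define KSTACK_SLOTS 8

/* Each task owns a kernel stack on which its state is saved. */
struct task {
    struct cpu_regs kstack[KSTACK_SLOTS];
    int top;        /* number of register sets currently saved */
};

/* Push the live CPU state on the stack of the task being scheduled out,
 * then pop the state of the task being scheduled in into the CPU. */
void context_switch(struct cpu_regs *cpu, struct task *out, struct task *in)
{
    assert(out->top < KSTACK_SLOTS);
    out->kstack[out->top++] = *cpu;     /* save the outgoing task */

    assert(in->top > 0);                /* incoming task must have saved state */
    *cpu = in->kstack[--in->top];       /* restore the incoming task */
}
```

After `context_switch(&cpu, &a, &b)`, the simulated CPU holds the values B had when it was last scheduled out, and A's values wait on A's stack until a later switch brings A back.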
???
- And finally the kernel returns to user space, and B resumes

---

class: middle, inverse, center

# System Call and Context Switch Implementations in a Real OS

???
- So we have seen in a nutshell what the syscall handling and context switch operations look like
- Now let's have a look at some code

---

# HermiTux

- Very small OS for cloud applications
- Originally based on HermitCore (https://github.com/hermitcore/libhermit)
- ~10K lines of C code and a bit of assembly, great to study and learn about OSes
- https://github.com/ssrg-vt/hermitux
???
- We'll take a look at the code of the HermiTux kernel
- It is a minimal OS for cloud applications
- It is an adaptation of an existing OS named HermitCore that you can also check out if you want
- The good thing about HermiTux is that it is very small and simple, so it's great for learning purposes

---

# Context Switch & System Call in HermiTux

- **Context switch & syscall handling can only be done by the kernel**
- The only way to enter the kernel is through an **interrupt**:
    - An interrupt traps to the kernel, which saves the state of the interrupted task so that it can be resumed properly later
- Example with the `sched_yield` syscall (syscall + context switch):
???
- Note that **context switch & syscall handling are privileged operations that can only be done by the kernel**
- An important thing to note is that the only way to enter the kernel is through an **interrupt**:
    - Either a hardware interrupt, such as a tick from the timer
    - Or a software interrupt, also called an exception, for example a division by 0, or a syscall
- When there is an interrupt, the CPU stops executing user code and traps to the kernel
- It is during kernel entry that the state of the interrupted task is saved
- In the rest of the video we'll see what happens when a task executes the `sched_yield` system call
- This syscall is made when a task wants to voluntarily relinquish the CPU
- This scenario combines a syscall and a context switch
- Let's look at what happens on the CPU
- First we have task A running
- Then it invokes the system call, there is a trap to the kernel and the CPU switches from user mode to kernel mode
- The `sched_yield` syscall is processed by the kernel and the scheduler is called to find another task to run
- Then there is a context switch: task A is removed from the CPU and the kernel schedules task B
- When returning to user space, there is a switch from kernel to user mode and task B resumes execution

---

# System Call in HermiTux

- Syscall entry point is in [`arch/x86/kernel/entry.asm`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/arch/x86/kernel/entry.asm#L464):

```asm
isyscall:
    cli                     ; disable interrupts
    push rax                ; start pushing the task state on the stack
    push rcx
    ; ... push many other registers
    mov rdi, rsp            ; prepare a data structure containing register values for use by the kernel
    sti                     ; enable interrupts
    call syscall_handler    ; jump to kernel (C) code managing the syscall
```
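Why can the stack pointer be handed to C code as a data structure? After the pushes, the saved registers sit back to back in memory, starting at the address held in the stack pointer. A C struct whose fields are declared in reverse push order describes exactly that memory. The sketch below simulates this with an array standing in for the stack. The three-field `struct saved_state` and the helpers `push`, `syscall_number` and `enter_kernel` are hypothetical and are not HermiTux's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 64

/* Simulated stack: grows towards lower addresses, like on x86-64. */
static uint64_t stack_mem[STACK_WORDS];
static uint64_t *sp = stack_mem + STACK_WORDS;  /* empty stack */

static void push(uint64_t value)
{
    assert(sp > stack_mem);
    *--sp = value;
}

/* Fields appear in reverse push order: the last register pushed sits at
 * the lowest address, which is where the stack pointer points. */
struct saved_state {
    uint64_t rdx;   /* pushed last */
    uint64_t rcx;
    uint64_t rax;   /* pushed first */
};

/* Plays the role of the C handler: it receives the stack pointer value and
 * interprets the memory found there as a struct saved_state. */
uint64_t syscall_number(const void *stack_top)
{
    struct saved_state s;
    memcpy(&s, stack_top, sizeof s);    /* view the raw stack memory as a struct */
    return s.rax;
}

/* Plays the role of the assembly entry code. */
uint64_t enter_kernel(uint64_t rax, uint64_t rcx, uint64_t rdx)
{
    push(rax);
    push(rcx);
    push(rdx);
    return syscall_number(sp);          /* like: mov rdi, rsp / call handler */
}
```

The copy with `memcpy` keeps the sketch portable; kernel code typically works on the stack memory directly through the pointer, which is what lets a handler modify the saved values in place.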
???
- Now let's have a look at the code
- When the CPU is in user mode, the box is green, and it is orange in kernel mode
- When a syscall is invoked by user code, it traps to a predefined entry point in the kernel
- This part is in assembly; in HermiTux it corresponds to the `isyscall` label
- As we can see, the interrupted task state is saved on its stack, as all the registers are pushed
- Then the kernel jumps to `syscall_handler`, which is a function implemented in C

---

# System Call in HermiTux

`syscall_handler` is in [`arch/x86/kernel/isrs.c`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/arch/x86/kernel/isrs.c#L257):

```c
void syscall_handler(struct state *s) {
    switch(s->rax) {
        // ... one "case" for each syscall number

        case 24: // sched_yield's number is 24
*           s->rax = sys_sched_yield();
            break;

        /* ... */
    }
}
```

???
- In `syscall_handler`, the kernel looks at the value of the saved rax register on the stack to determine which syscall is called
- There is a big switch
- For `sched_yield` the syscall number is 24
- And then the kernel calls the syscall implementation, `sys_sched_yield`
- Its return value overwrites the saved value of rax on the stack, which will be restored once we resume the task execution

---

# System Call in HermiTux

- `sys_sched_yield` (in [`kernel/syscalls/sched_yield.c`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/kernel/syscalls/sched_yield.c)) simply calls `check_scheduling`, in [`kernel/tasks.c`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/kernel/tasks.c):

```c
void check_scheduling(void) {
    /* ... */
    uint32_t prio = get_highest_priority();
    task_t* curr_task = per_core(current_task);

    if (prio > curr_task->prio) {
*       reschedule();
    }
    /* ... */
}
```

- Finally, `reschedule` (in [`kernel/tasks.c`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/kernel/tasks.c#L814)) calls `switch_context`

???
- `sys_sched_yield` just calls `check_scheduling`
- `check_scheduling` looks for the ready task with the highest priority and, if this priority is higher than that of the currently running task (i.e. the one that invoked the syscall), a context switch needs to be performed
- In that case `reschedule` is called, and this function calls `switch_context`

---

# Context Switch in HermiTux

- `switch_context` is in [`arch/x86/kernel/entry.asm`](https://github.com/ssrg-vt/hermitux-kernel/blob/master/arch/x86/kernel/entry.asm#L594):

```asm
switch_context:
    push rax                ; ... start pushing the state of the scheduled out task on the stack
    push rcx
    push rdx
    ; ...
    jmp common_switch
; ...
common_switch:
    ; ...
    call get_current_stack  ; get new rsp
    mov rsp, rax            ; switch stack to the new task's one
    ; ...
    pop r15                 ; start restoring the scheduled in task state (reverse order)
    pop r14
    ; ...
    add rsp, 16             ; at that point the stack pointer points to the saved instruction pointer
    iretq                   ; restore instruction pointer from the stack, i.e. jumps there
```
???
- At that point we are in the context switch logic
- It is performed in assembly
- We save the kernel state of the task that is scheduled out by pushing all its registers on the stack
- Then we switch to the stack of the scheduled in task
- We restore its kernel state, which was saved the last time this task was scheduled out, by popping all the registers; with `iretq` we jump to the saved instruction pointer, i.e. we resume the task in kernel mode

---

# Context Switch in HermiTux

- Executes the return path in the kernel until we reach the kernel/user boundary

```asm
isyscall:
    ; ...
    call syscall_handler
    ; ...
    pop r15                 ; start restoring the userland state of the task
    pop r14
    pop r13
    ; ...
    sysret                  ; return from syscall handler to userspace
```
???
- We are now executing in the context of the scheduled in task
- We take the return path in the kernel, returning from whatever syscall or interrupt triggered the scheduling out of the task in the past
- Let's assume it was a syscall
- We go back to the assembly syscall handling code
- We restore all the registers that correspond to the user state of the task
- And with `sysret` we jump back to user code
- And that's it!

---

# Summary

- To summarise, we have seen how syscall handling and context switch are implemented
- It's done with a combination of C and assembly
- Assembly is necessary for low-level operations such as saving CPU state or switching tasks
- It integrates very well with C
- In the next video, we'll talk about a different topic: memory safety

----

.center[Feedback form: https://bit.ly/2VD4kfx]