Secure Architectures and Systems - The Theory of Virtualization

class: center, middle

### Secure Computer Architecture and Systems
***
# The Theory of Virtualisation

???

- Hi everyone
- In this video we'll talk about how virtualisation works from the theoretical point of view
- Why study the theory of virtualisation?
- First, it will help us understand the core requirements of virtualisation, and the characteristics that an instruction set architecture needs to have to be virtualisable
- And second, it will allow us to understand the working principles of virtualisation on a hypothetical model of processor which is much simpler than the CPUs we use today

---

# The Popek and Goldberg Theorem

- **Seminal paper published in 1974 in Communications of the ACM**

.leftcol[
.medium[
> Popek, Gerald J., and Robert P. Goldberg. **"Formal requirements for virtualizable third generation architectures."** Communications of the ACM 17.7 (1974): 412-421.
]]

.rightcol[
  <div style="text-align:center"><a href=https://dl.acm.org/doi/pdf/10.1145/361011.361073><img src="include/popek.png" width=300 style="box-shadow: 8px 8px 5px grey; border: .1rem solid black;" /></a></div>
]

???

- This paper from 1974 by authors Popek and Goldberg is considered a seminal work in the domain of virtualisation
- If you access the slides in your browser and click on the paper you'll get access to its PDF
- It's well worth a read, in particular Sections 1 to 5
- That being said, we'll do a good summary of these sections here

---

# The Popek and Goldberg Theorem

- **Paper published in 1974 in Communications of the ACM**
- Original idea of the paper: **show that some contemporary architectures were not virtualisable**
  - DEC PDP-10 taken as a case study

???
- The original goal of the paper was to prove that, back in 1974, certain architectures like the DEC PDP-10 were not virtualisable
- Recall that by virtualisable here we mean that we want to be able to run efficiently several operating systems on the same machine

--
- To that aim:
  - The paper describes the **criteria for a proper virtual machine monitor: *safety*, *efficiency*, *equivalence***

???
- To achieve that goal, the paper starts by describing the key requirements for building a proper virtual machine monitor
- These requirements are safety, efficiency, and equivalence
- We'll expand on these later, but as you can already see security is a central aspect of virtualisation

--
  - It also defines accordingly the **requirements for an ISA to be virtualisable**
      - ISA: Instruction Set Architecture (x86-64, x86-32, aarch64, etc.)
      - Virtualisable: a VMM can be constructed on that ISA in a way that satisfies the aforementioned criteria

???

- The paper also formalises the requirements for an ISA to be virtualisable
- By ISA we mean instruction set architecture, something like x86-64, arm64, etc.
- And by virtualisable we mean: can a virtual machine monitor be designed and implemented on that ISA in such a way that it satisfies the key requirements

---
# The Popek and Goldberg Theorem

- Lack of popularity for virtualisation at the time the paper was published
  (70s)

???
- At the time in the 70s the paper did not make a lot of noise because virtualisation was not a popular topic

--
- **Later, VMs become popular (end of 90s)**
  - **x86-32 (most popular ISA at the time) not virtualisable!**
  - In this paper P&G describe an ISA's requirements to support a VMM that
    itself supports arbitrary guests, relying exclusively on direct execution
???
- Later, with computers becoming much more common and widespread, there was a strong resurgence of interest in virtualisation
- By the end of the 90s many actors were looking to run efficient virtual machines; however, the most popular ISA at the time, Intel x86-32, was not virtualisable
- Indeed it did not satisfy the requirements pointed out in the paper written by Popek and Goldberg
--
- This is now a **seminal paper on virtualisation**
  - AMD & Intel explicitly designed AMD-V and Intel VT-x for x86-64 in the 2000s to meet
  the criteria defined in the paper
???
- Following that, in the 2000s, when the 64-bit instruction sets we still use today were created, their designers took great care to follow the principles defined in the paper
- And they succeeded, as on these ISAs we can run virtual machines very efficiently
--
- We’ll also learn through this theorem the
  **fundamental principles behind hypervisor operation on virtualisable ISAs**

???

- By studying this paper we will also learn about the working principles of virtualisation, and how a virtual machine monitor works when it runs on a virtualisable ISA

---
# The Popek and Goldberg Theorem

- **We explain the theorem as follows:**
  - Explain P&G **simplified CPU model**
      - Simple hardware platform, but still representative of modern CPUs, as
      a support for the theorem
???
- We will explain the theorem as follows
- We'll first describe the simplified model of processor presented in the paper
--
  - Explain **how a regular OS would run on that simplified CPU
    model** (without virtualisation)
???
- We will then present how a regular operating system would run without virtualisation on top of that simplified processor
--
  - Give the **theorem**: what characteristics an ISA needs to exhibit in order to
    be able to run a VMM and VMs
???
- Then we will present the theorem itself
- Recall that it lists the characteristics that an ISA needs to have to be virtualisable
--
  - **Describe a VMM for that simplified CPU model**
      - Which properties it should satisfy to do its job properly
      - How it operates
???
- We will then describe how our simplified CPU model can be virtualised
- How a virtual machine monitor would work on that CPU

--
  - Give some examples of **theorem violations** (ISAs not virtualisable)

???
- And we will briefly give examples of ISAs that do not satisfy the theorem, and we'll see how they cannot be virtualisable concretely

---
class: inverse, center, middle

# Simplified CPU Model

???
- Let's first present our simplified CPU model

---
name: segmentation

# Simplified CPU Model (1)

- One processor core with **2 execution modes (privilege levels): user and supervisor**
- **Physical memory** is contiguous, starts at `0`, size: `SZ`
- **Virtual memory implemented through segmentation** (no paging):
  - Single segment: Base `B`, Limit `L`
  - Virtual range `[0, L[` mapped to physical range `[B, B + L[`

???

- This processor has 2 privilege levels, user and supervisor
- We have a certain amount `SZ` of physical memory, starting at physical address 0
- Virtual memory is implemented through segmentation (remember this paper is very old)
- At any time the virtual address space seen by what is running on the CPU ranges from virtual address 0 to virtual address L
- It is mapped to a segment of physical memory from physical address B to physical address B + L
- Here B means Base and L means limit

---
template: segmentation
<div style="text-align:center"><img src="include/segmentation1.svg" width=800 /></div>

???

- Let's illustrate that real quick

---
template: segmentation
<div style="text-align:center"><img src="include/segmentation2.svg" width=800 /></div>

???

- We have the virtual address space seen by an application running on the CPU, from virtual address 0 to virtual address L
- Mapped to a segment of physical memory, from physical address B to B + L

---
template: segmentation
<div style="text-align:center"><img src="include/segmentation3.svg" width=800 /></div>

???

- The application accesses the virtual address space with loads and stores and these operations are mapped to physical memory by the MMU
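
---
# Simplified CPU Model: Translation Sketch

The translation performed by the MMU in this model can be sketched in a few lines of Python. This is only an illustration, not something taken from the paper: the names `translate` and `SegmentationFault` are made up, and the sketch assumes that an access outside `[0, L[` is simply refused.

```python
class SegmentationFault(Exception):
    """Raised when a virtual address falls outside [0, L[ (assumed behaviour)."""


def translate(vaddr: int, base: int, limit: int) -> int:
    """Map a virtual address in [0, limit[ to a physical address in [base, base + limit[."""
    if not 0 <= vaddr < limit:
        raise SegmentationFault(f"virtual address {vaddr} is outside [0, {limit}[")
    return base + vaddr


# Example segment: B = 4096, L = 1024
print(translate(0, base=4096, limit=1024))     # 4096
print(translate(1023, base=4096, limit=1024))  # 5119
```

Every load and store goes through this check-then-add step, which is what confines the running code to its own segment.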

---
# Simplified CPU Model (2)

- CPU state: **Processor Status Word** (PSW): (`M`, `B`, `L`, `PC`)
  - Execution level `M` = {`s`, `u`} (supervisor or user)
  - Segment register (`B`, `L`) (physical address of base, and length)
  - The current program counter: `PC`, virtual address
      - Virtual address of the instruction currently executed

???
- That's it for the memory
- Regarding the CPU, its state is composed of 4 control registers that together form the Processor Status Word, PSW
- The control registers are as follows
- First the current privilege level M (for mode), it's equal to S for supervisor mode, or U for user mode
- Second the segment register, made of B and L: as mentioned previously these are the physical address of the base of the currently mapped segment, and its length
- Finally, we have the program counter that points in virtual memory to the current instruction being executed

---
name: psw
# Simplified CPU Model (3)

.leftcol[
.medium[
- Support for **saving PSW content in memory** `MEM[0]` and loading a
   new value from `MEM[1]`
  - Action of entering the OS following an interrupt (also called **trap**)
- Support for **loading PSW content from a location in memory**
  - Exiting the OS after a trap processing
- Traps: hardware interrupts, exceptions (inc. syscalls)
]]

???
- We also need support for entering the OS following an interrupt or exception
- This is also called a trap
- It works as follows

---
template: psw
.rightcol[
<div style="text-align:center"><img src="include/trap1.svg" width=450 /></div>
]

???
- Assume the CPU runs an application in user mode

---
template: psw
.rightcol[
<div style="text-align:center"><img src="include/trap2.svg" width=450 /></div>
]

???
- There is a trap, for example because the application is making a system call

---
template: psw
.rightcol[
<div style="text-align:center"><img src="include/trap3.svg" width=450 /></div>
]

???
- The PSW at that stage represents the CPU state for the application
- It needs to be saved somewhere in memory
- There is a dedicated space for that, it's the first slot in memory MEM[0]
- We also need to switch to the kernel, so we load the kernel CPU state that was previously saved in another dedicated location, MEM[1]
- This kernel state loads the kernel memory segment which simply gives the OS access to the entire physical memory
- And sets the program counter to a predefined kernel code location which is the trap entry point
- Of course the privilege level of the kernel PSW is set to supervisor mode

---
template: psw
.rightcol[
<div style="text-align:center"><img src="include/trap4.svg" width=450 /></div>
]

???
- When this is done the kernel starts to run and processes the trap

---
template: psw
.rightcol[
<div style="text-align:center"><img src="include/trap5.svg" width=450 /></div>
]

???
- When the kernel is done, to return to user space the PSW is loaded with the application CPU state from MEM[0]
- And the application resumes
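
---
# Simplified CPU Model: Trap Sketch

The trap sequence can be modelled roughly as below. This is a sketch under simplifying assumptions: `MEM[0]` and `MEM[1]` are modelled as slots that each hold a whole PSW, and the class name `CPU`, the constant `TRAP_ENTRY_POINT`, and the value chosen for `SZ` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSW:
    M: str   # execution mode: "s" (supervisor) or "u" (user)
    B: int   # segment base (physical address)
    L: int   # segment length
    PC: int  # program counter (virtual address)


SZ = 1 << 16              # size of physical memory (arbitrary value)
TRAP_ENTRY_POINT = 0x100  # hypothetical kernel entry point


class CPU:
    def __init__(self):
        # MEM[1] holds the kernel state the CPU takes upon a trap.
        self.mem = {0: None, 1: PSW("s", 0, SZ, TRAP_ENTRY_POINT)}
        self.psw = PSW("s", 0, SZ, 0)

    def trap(self):
        """On a trap: save the current PSW in MEM[0], then load MEM[1]."""
        self.mem[0] = self.psw
        self.psw = self.mem[1]

    def return_from_trap(self):
        """Exit the OS: reload the PSW saved in MEM[0]."""
        self.psw = self.mem[0]


cpu = CPU()
cpu.psw = PSW("u", B=0x4000, L=0x1000, PC=0x20)  # an application is running
cpu.trap()
print(cpu.psw.M, hex(cpu.psw.PC))  # s 0x100
cpu.return_from_trap()
print(cpu.psw.M, hex(cpu.psw.PC))  # u 0x20
```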

---
# Simplified CPU Model

- **OS operation without a hypervisor:**
  1. Kernel runs in `M` = `s`, apps in `M` = `u`
???
- To summarise how our CPU works with an OS without virtualisation
- The kernel runs in supervisor mode, applications run in user mode
--
  2. Kernel sets trap entry point during initialisation
      - `MEM[1] ← (M:s, B:0, L:SZ, PC:trap_entry_point)`
???
- When it boots the kernel loads MEM[1] with the kernel state the CPU should take upon a trap
- That is, supervisor privilege level, access to all physical memory with B equals 0 and L equals SZ, and the program counter set to the trap entry point in the kernel code

--
  3. Kernel allocates a contiguous range of physical memory (`B`, `L`) for each application
???
- For each application the kernel allocates a contiguous range of memory defined with B and L
- The kernel makes sure the segments given to applications do not overlap
--
  4. Kernel launches/resumes apps with address space `[B, B+L[`, currently executing `PC`:
      - `PSW ← (M:u, B:B, L:L, PC:PC)`
???
- The kernel launches or resumes an application, whose address space is the segment [B, B+L[ and whose next instruction is pointed to by PC, by loading the PSW as follows
- User privilege level, B, L, and PC

--
  5. At the trap entry point, kernel decodes the instruction at `MEM[0].PC`, determines the cause of the trap and takes appropriate actions
???
- When the kernel runs after a trap, it decodes the application instruction that caused the trap and takes action
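
---
# Simplified CPU Model: OS Sketch

Steps 2 to 4 can be sketched as follows. This is one possible way to do it, not the paper's: the bump allocator `SegmentAllocator`, the helper `launch_psw`, and the constants `KERNEL_END` and `TRAP_ENTRY_POINT` are all hypothetical, and PSWs are modelled as plain dictionaries.

```python
SZ = 1 << 16              # size of physical memory (arbitrary value)
TRAP_ENTRY_POINT = 0x100  # hypothetical address of the kernel trap handler
KERNEL_END = 0x1000       # hypothetical: memory below this is kept for the kernel

MEM = {}                  # only the PSW slots are modelled


def kernel_init():
    """Step 2: install the PSW that the CPU loads on every trap."""
    MEM[1] = {"M": "s", "B": 0, "L": SZ, "PC": TRAP_ENTRY_POINT}


class SegmentAllocator:
    """Step 3: hand out contiguous, non-overlapping (B, L) ranges."""

    def __init__(self, start, end):
        self.next_free = start
        self.end = end

    def allocate(self, length):
        if length <= 0 or self.next_free + length > self.end:
            raise MemoryError("not enough contiguous physical memory")
        base = self.next_free
        self.next_free += length
        return base, length


def launch_psw(base, length, pc=0):
    """Step 4: the PSW loaded to start or resume an application."""
    return {"M": "u", "B": base, "L": length, "PC": pc}


kernel_init()
allocator = SegmentAllocator(KERNEL_END, SZ)
app1 = launch_psw(*allocator.allocate(0x2000))
app2 = launch_psw(*allocator.allocate(0x1000))
print(app1)
print(app2)
```

Because each allocation starts where the previous one ended, two applications can never be given overlapping segments.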

---
## Hypervisor Objectives and Requirements

To build a hypervisor for that CPU, this is the research question asked by
  Popek & Goldberg:

???
- Now if we want to build a virtual machine monitor for that CPU, the authors of the paper ask the following research question
--

- **Given a computer defined according to the model, under which conditions can
a hypervisor be constructed so that it:**
  - can execute one or more VMs;

???
- What are the requirements so we can build a hypervisor for that CPU in such a way that the hypervisor
- Can execute one or more virtual machines

--
  - supports arbitrary, unmodified, and *potentially malicious* guest OSes designed for the same architecture; and

???
- Can support any operating system designed to run non-virtualised for that CPU
- Including operating systems that are explicitly malicious and will try to bypass the virtualisation layer

--
  - is in complete control of the hardware at all times;

???
- For that reason the hypervisor needs to be in complete control of the hardware at all times

--
  - is efficient and shows at worst a small performance decrease vs. non-virtualised execution?

???
- And the hypervisor should also be efficient
- Virtualised guests should show at worst a very small performance decrease compared to when they run natively

---
## Hypervisor Objectives and Requirements

**To reach these objectives, the hypervisor needs to comply with these requirements**:

???
- To reach these objectives, the hypervisor needs to comply with the following requirements

--
  1. **Safety**
      - VMM in complete control of the hardware at all times
      - No assumption on guests, they can be malicious
      - VMM must enforce isolation between a VM and the VMM/hardware, and between VMs themselves (no shared state)
???
- Requirement number one: safety
- The hypervisor needs to be in control of the hardware at all times, including in the presence of malicious applications and guest operating systems
- To that aim the hypervisor must enforce isolation between a VM and the hardware, between a VM and itself, and between VMs themselves

--
  2. **Equivalence**
      - VM is a duplicate of the underlying physical machine
      - Guest OS and apps behave the same natively and in a VM
          - They run *unmodified* in the VM
???
- Requirement number two: equivalence
- A VM should be a duplicate of the physical hardware, and the guest OS and applications should not have to be modified to run in a VM
- Their behaviour in a VM should be exactly the same as if they were running natively

--
  3. **Performance**
      - Minimal decrease in a virtualised program execution speed

???
- Requirement number three: performance
- The guest OS and applications should see a minimal performance slowdown compared to native execution when they run virtualised

---
class: inverse, center, middle

# The Popek and Goldberg Theorem

???
- Now let's see the theorem that defines the requirements for an ISA to be virtualisable

---
name:theorem

# The Theorem

- We want **any guest CPU instruction that may allow escaping isolation to be caught
  by the hypervisor**
???
- The intuition here is that we want to make sure that any guest instruction that may allow it to escape the virtualisation isolation traps to the hypervisor
- This way the hypervisor can emulate that instruction safely without breaking the isolation
--
- Central idea behind constructing an efficient and secure hypervisor:
  - **run hypervisor in supervisor mode** and
  - **run guest apps *and guest OS* in user mode**
???
- So the key idea behind our hypervisor design is going to be
- To run the hypervisor in supervisor mode on the CPU
- And to run the guest applications as well as the guest OS in user mode
--
- Not doable with every ISA
  - Theorem tells us how an ISA should behave for such a hypervisor to work

???
- This is not doable with every architecture, the theorem will tell us the properties an ISA should have so we can build such a virtual machine monitor on that ISA

---
# The Theorem

- We can classify the instructions of an ISA into 2 categories:
  1. **Sensitive instructions**, subdivided into:
      - **Control sensitive:** instruction that *updates the system state*
          - E.g. instructions modifying PSW in our example, `LIDT` on x86-32: load interrupt descriptor table
???
- The last thing we need before presenting the theorem is to classify instructions
- The first category is called sensitive instructions
- It is subdivided into 2 subcategories
- First, control-sensitive instructions
- These are the instructions that update the system state, for example the instructions modifying the PSW in our example, or LIDT on Intel x86-32, which loads the interrupt descriptor table and so makes it possible to install new interrupt handlers
--
      - **Behaviour sensitive:** instruction whose *semantics depends on the value of the system state*
        (e.g. the privilege level)
          - E.g. on x86-32, `POPF` loads the flags register with data from the stack: it can update every flag in
            supervisor mode, but in user mode updates to privileged flags (e.g. the interrupt flag) are silently ignored
???
- The second subcategory of sensitive instruction are behaviour sensitive ones
- These are the instructions that have a different behaviour depending on the value of system state
- For example, on x86-32, the `POPF` instruction loads the flags register with data from the stack; in supervisor mode it can update every flag, but in user mode it silently ignores updates to privileged flags instead of trapping
--
  2. **Innocuous instructions:** instructions that are not sensitive
???
- Then we have our second category, which is the rest of the instructions that are not sensitive
- They are called innocuous instructions
--

- We also define the concept of **privileged instructions**
  - Can only be executed in supervisor mode and trap when executed in user mode
      - E.g. `HLT` traps if executed in user mode on x86-32

???
- Finally we also introduce privileged instructions
- These are the instructions that can only be executed in supervisor mode, and that cause a trap when executed in user mode
- For example, on x86-32, HLT halts the CPU until the next interrupt in supervisor mode and traps in user mode
- Instructions can be privileged or not independently of their sensitive/innocuous nature

---
# The Theorem

- **Sensitive instructions**: update the system state or semantics depends on
  the value of the system state
- **Privileged instructions:** work only in supervisor mode, trap in user mode

Theorem:
> **For a given ISA, a VMM may be constructed if the set of sensitive instructions
  for that ISA is a subset of the set of privileged instructions**, i.e. if
<br>{control-sensitive} ∪ {behaviour-sensitive} ⊆ {privileged}

???

- OK now we can present the theorem
- I just put a recap on what sensitive and privileged instructions are on the top of this slide
- The theorem states that for a given ISA to be virtualisable, the set of all sensitive instructions needs to be a subset of the privileged instructions

---
# The Theorem

???
- In other words all instructions that modify the system state, or whose behaviour depends on the system state, need to trap when executed in user mode

---
# The Theorem

???

- On the left you have a virtualisable ISA according to the theorem, with the entirety of sensitive instructions being privileged
- And on the right the ISA is not virtualisable, because some sensitive instructions are not privileged
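
---
# The Theorem: Set Check Sketch

The condition of the theorem is a plain subset test, which can be sketched with Python sets. The two instruction sets below are made up for illustration (`LOAD_PSW`, `SET_SEGMENT`, `READ_PSW` and `HALT` are hypothetical instruction names, not taken from any real ISA), as are the two helper functions.

```python
def sensitive_but_unprivileged(control_sensitive, behaviour_sensitive, privileged):
    """Return the sensitive instructions that do NOT trap in user mode."""
    sensitive = set(control_sensitive) | set(behaviour_sensitive)
    return sensitive - set(privileged)


def is_virtualisable(control_sensitive, behaviour_sensitive, privileged):
    """Theorem condition: {control-sensitive} U {behaviour-sensitive} is a subset of {privileged}."""
    return not sensitive_but_unprivileged(
        control_sensitive, behaviour_sensitive, privileged
    )


# Hypothetical ISA where every sensitive instruction is privileged.
isa_ok = dict(
    control_sensitive={"LOAD_PSW", "SET_SEGMENT"},
    behaviour_sensitive={"READ_PSW"},
    privileged={"LOAD_PSW", "SET_SEGMENT", "READ_PSW", "HALT"},
)

# Same ISA, except READ_PSW executes in user mode without trapping.
isa_bad = dict(isa_ok, privileged={"LOAD_PSW", "SET_SEGMENT", "HALT"})

print(is_virtualisable(**isa_ok))             # True
print(is_virtualisable(**isa_bad))            # False
print(sensitive_but_unprivileged(**isa_bad))  # {'READ_PSW'}
```

Listing the offending instructions, rather than only returning a boolean, shows which ones a hypervisor could not intercept.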

---
# The Theorem
**{control-sensitive} ∪ {behaviour-sensitive} ⊆ {privileged}**
- **Converse holds too: if the criterion is not met, a VMM cannot be constructed
  for that architecture**

???
- So why can't an ISA be virtualised if some of its sensitive instructions do not trap when executed in user mode?
- Recall that with the hypervisor we want to build, both the guest applications and guest OS run in user mode

--
- If a **control-sensitive** instruction does not trap, any guest can modify
      the system state without supervision/check from the VMM
      - E.g. a guest OS installing an arbitrary page table: **loss of safety**
???
- If you consider a control sensitive instruction that would not trap in user mode
- Any guest could update the state of the system without supervision from the hypervisor
- Imagine a guest being able to install an arbitrary segment register or an arbitrary page table and mapping physical memory it is not supposed to access
- That would break the safety criterion the hypervisor needs to maintain

--
- If a **behaviour-sensitive** instruction does not trap, the behaviour of such
  instructions will differ from non-virtualised execution when executed by the
  guest OS
  - Remember that the guest OS runs in user mode when virtualised
      - Guest OS instruction executed with user-level semantics: **loss of equivalence**

???
- If you consider a behaviour sensitive instruction that does not trap in user mode, it means that the guest OS, when executing this instruction and expecting the supervisor mode behaviour, will actually see the user mode behaviour
- This breaks the equivalence criterion the hypervisor needs to maintain

---
# Hypervisor Operation

Under these conditions, VMM operates as follows:
- For performance reasons we want to run as much guest code as possible
  directly on the CPU, i.e. without trapping to the VMM
- **VMM runs in supervisor mode, guest OS runs in user mode**

???
- For performance reasons we want to run as much guest code as possible directly on the CPU without trapping
- So as we mentioned the hypervisor will run in supervisor mode, and the guest including its operating system will run in user mode, as illustrated on the slide

---
# Hypervisor Operation

- **VMM allocates contiguous physical memory for itself, never accessible by
  guests**
- **VMM allocates contiguous physical memory for VMs**
  - Each machine gets a range defined by `addr0` and `memsize`

???
- On our simplified CPU model, virtualisation will work as follows
- The hypervisor will reserve some contiguous memory for itself
- That memory should never be accessed by the guest for obvious security reasons
- The hypervisor will also allocate contiguous ranges of physical memory, one for each VM
- We can define each VM range with a base address, addr0, and a length, memsize

---
# Hypervisor Operation

- VMM keeps in memory the CPU state for each VM, `vPSW`
  - Consists of `(M, B, L, PC)`
      - `M`: execution mode the VM **thinks** it’s running in: `s` when guest OS runs and `u` when a guest app runs

???
- For each VM the hypervisor keeps in memory a data structure that represents a software model of what the VM thinks is the current PSW
- We call it the virtual PSW, vPSW
- It has the same registers as the real PSW
- A privilege level, which is user when the VM runs an app and supervisor when the guest OS runs
- A segment register representing the address space of what is currently running in the VM
- And a program counter
- These are all illustrated on the slide

---
# Hypervisor Operation

.small[
- **VMM starts/resumes VM execution by loading the hardware `PSW ← (M’, B’, L’, PC’)`**
  - `M’ ← u`
  - `B’ ← addr0 + vPSW.B`
  - `L’ ← vPSW.L`
  - `PC’ ← vPSW.PC`
]

???
- When starting a VM or when resuming the execution of a VM following a trap, the hypervisor loads the real PSW as follows
- As explained previously the real privilege level of a VM is always user mode
- The real segment base address is the base address of the physical memory allocated to the VM, addr0, plus the base address of the vPSW, vPSW.B
- The real segment length is the vPSW length
- The real program counter is the one indicated in the vPSW, with a similar check
- Addresses from the vPSW are checked before being loaded so that we don't load in the real PSW something that would go beyond the limits of what is allocated to the VM
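
---
# Hypervisor Operation: PSW Loading Sketch

Deriving the hardware PSW from a vPSW can be sketched as below. The function name `load_hardware_psw` is hypothetical, and so is the reaction to a failed check: the slides only say that addresses are checked, so this sketch assumes the VMM simply refuses to load the state by raising an error.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSW:
    M: str   # "s" or "u"
    B: int   # segment base
    L: int   # segment length
    PC: int  # program counter (virtual address)


def load_hardware_psw(vpsw: PSW, addr0: int, memsize: int) -> PSW:
    """Build the real PSW for a VM whose memory is [addr0, addr0 + memsize[."""
    # The guest segment must fit inside the memory allocated to the VM.
    if vpsw.B < 0 or vpsw.L < 0 or vpsw.B + vpsw.L > memsize:
        raise ValueError("guest segment exceeds the memory allocated to the VM")
    # Assumed check: the PC is a virtual address inside the guest segment.
    if not 0 <= vpsw.PC < vpsw.L:
        raise ValueError("guest PC is outside the guest segment")
    # The VM always runs in user mode, whatever vPSW.M says.
    return PSW(M="u", B=addr0 + vpsw.B, L=vpsw.L, PC=vpsw.PC)


vpsw = PSW(M="s", B=0x100, L=0x200, PC=0x10)
print(load_hardware_psw(vpsw, addr0=0x8000, memsize=0x1000))
# PSW(M='u', B=33024, L=512, PC=16)
```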

---
# Hypervisor Operation

- Note that any attempt by the guest to modify `M`, `B` or `L` will trap
    - Theorem hypothesis assumes all control-sensitive instructions are also privileged

**More generally, what happens upon a trap?**

???
- Remember that the VM runs entirely in user mode and that all sensitive instructions trap in that mode, so any attempt by the guest to modify the PSW will trap
- And these actions can be emulated by the hypervisor
- More generally, what happens when there is a trap

---
# Hypervisor Operation

What happens upon a trap to the hypervisor?

1. Hypervisor updates `vPSW.PC` ← `PSW.PC`
2. Hypervisor looks at the instruction that trapped (`PSW.PC`) and emulates its
   semantics according to the ISA
   - Actions taken here depend on whether the guest trapped from user or kernel space

???
- When there is a trap the hypervisor keeps the vPSW up to date by writing the trap's PC in it
- Based on the instruction that caused the trap, the VMM will emulate a non-virtualised machine's behaviour
- What to do depends on what the guest was running when it trapped: was it application code, or kernel code?

---
# Hypervisor Operation

- **If guest OS caused the trap** (`vPSW.M == s`): sensitive instruction
  trapped, VMM handles it

???
- If the guest kernel caused the trap, it means the guest OS probably executed a sensitive instruction
- The VMM handles that depending on the instruction in question

--
  - E.g. if guest OS tries to update the segment register, the VMM checks and updates `vPSW.B` and `vPSW.L`
      - Hardware `PSW.L` and `PSW.B` will be set accordingly when we return
        to VM execution: `PSW.B` ← `addr0 + vPSW.B` and
        `PSW.L` ← `vPSW.L`
      - The MMU is **transparently configured differently from what the guest OS requests**
???
- For example if the guest is trying to update the segment register
- The vPSW is updated with what the guest wants to write in there
- And when we return to the guest from the trap, the real PSW will be updated with the method we previously saw
- This way the MMU is configured differently from what the guest requests, but this is completely transparent to the guest
--
  - VMM ensures the VM will resume at the next instruction: `vPSW.PC++`
  - VMM loads PSW and VM resumes

???
- Before returning to the guest the VMM will set the vPSW's PC to the next instruction to denote the fact that the emulated instruction ran successfully
---
# Hypervisor Operation

- **If guest application caused the trap** (`vPSW.M == u`): app is doing a syscall/something illegal: trap should be handled by the guest OS

???
- Now if we have a trap while the guest was running application code, the application is either making a system call or causing a fault such as a division by zero
- To manage that fault the guest OS needs to run

--
  1. `MEM[addr0]` ← `vPSW`, save guest application state in the host-physical
   location of guest-physical `MEM[0]`
  2. `vPSW` ← `MEM[addr0 + 1]` load guest OS state (OS entry point) from memory,
    after a check on the validity of `B` and `L`
  3. VMM loads PSW to resume the VM (in guest OS mode based on the updated vPSW)

???
- So the hypervisor starts by saving the guest application's state which is the vPSW inside the VM's memory in the dedicated location which is the VM's equivalent of MEM[0]
- Then it loads into the vPSW the guest OS state from the guest's memory equivalent of MEM[1]
- Finally it loads the real PSW based on the method we presented previously to resume the VM and start to execute its kernel
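
---
# Hypervisor Operation: Trap Handling Sketch

The two cases can be put together in a rough sketch of the VMM trap handler. Everything named here is an assumption made for illustration: the `VM` record, the `emulate` callback standing in for instruction decoding and emulation, the `host_mem` dictionary holding whole PSWs per slot, and the choice to raise an error when a check fails.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PSW:
    M: str
    B: int
    L: int
    PC: int


@dataclass
class VM:
    addr0: int    # base of the physical memory allocated to the VM
    memsize: int  # size of that memory
    vpsw: PSW     # what the guest believes the PSW is


def segment_ok(psw, memsize):
    return psw.B >= 0 and psw.L >= 0 and psw.B + psw.L <= memsize


def to_hardware_psw(vm):
    if not segment_ok(vm.vpsw, vm.memsize):
        raise ValueError("guest segment exceeds the memory allocated to the VM")
    return PSW("u", vm.addr0 + vm.vpsw.B, vm.vpsw.L, vm.vpsw.PC)


def handle_trap(vm, hw_psw, host_mem, emulate):
    """Return the hardware PSW to load in order to resume the VM."""
    vm.vpsw = replace(vm.vpsw, PC=hw_psw.PC)           # vPSW.PC <- PSW.PC
    if vm.vpsw.M == "s":
        # Guest OS executed a sensitive instruction: emulate it on the vPSW.
        vm.vpsw = emulate(vm.vpsw)
        vm.vpsw = replace(vm.vpsw, PC=vm.vpsw.PC + 1)  # vPSW.PC++
    else:
        # Guest app trapped: emulate the hardware trap into the guest OS.
        host_mem[vm.addr0] = vm.vpsw                   # guest MEM[0]
        guest_os_psw = host_mem[vm.addr0 + 1]          # guest MEM[1]
        if not segment_ok(guest_os_psw, vm.memsize):
            raise ValueError("invalid B/L in guest MEM[1]")
        vm.vpsw = guest_os_psw
    return to_hardware_psw(vm)


vm = VM(addr0=0x8000, memsize=0x1000, vpsw=PSW("u", 0x400, 0x200, 0x10))
host_mem = {vm.addr0 + 1: PSW("s", 0, vm.memsize, 0x40)}  # guest trap entry state

# 1. The guest application traps at PC 0x12: the guest OS must run.
print(handle_trap(vm, PSW("u", 0x8400, 0x200, 0x12), host_mem, lambda v: v))
# PSW(M='u', B=32768, L=4096, PC=64)

# 2. The guest OS then tries to set the segment register to (0x400, 0x200).
set_segment = lambda v: replace(v, B=0x400, L=0x200)
print(handle_trap(vm, PSW("u", 0x8000, 0x1000, 0x40), host_mem, set_segment))
# PSW(M='u', B=33792, L=512, PC=65)
```

Note that the returned hardware PSW is always in user mode, even while the guest OS runs with `vPSW.M == "s"`.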

---
# Hypervisor Operation

- **According to the theorem hypothesis, all instructions updating the system
state (control-sensitive) are privileged, so they will trap**
  - Includes instructions updating the virtual to physical mapping
      - Each of these needs to be **checked** by the VMM to ensure **safety**
        (isolation)
      - Each of these needs to be **emulated** to give each VM the illusion of
      exclusive and full access to the hardware (**equivalence**)

???

- Going back to the theorem, because all guest control sensitive instructions that update the state of the system trap to the hypervisor
- The hypervisor can check them, for example to make sure a VM does not map memory outside of what it can access
- The hypervisor can also emulate them, to give each VM the illusion that it's in total control of the hardware like it would when running natively

---
# Hypervisor Operation

- **User/supervisor transition instructions (e.g. syscalls) need to be tracked by the VMM**
  - To keep track of `vPSW.M` to correctly emulate privileged instructions based on their source (guest application vs. guest OS)
  - Such transition instructions are sensitive, so tracking is possible
???
- The transition instructions between the guest application and kernel need to trap to the hypervisor, so that it can update the mode register of the vPSW
- This way the hypervisor knows when a trap originates from the guest kernel or application, and it can emulate it accordingly
- Transition instructions are sensitive, so they will trap
--

- **Still according to the hypothesis, behaviour-sensitive instructions will also trap**
  - E.g. reading `PSW.M` or `PSW.B`
      - Remember that the actual values are set by the VMM to something
        different from what the guest OS thinks they are
      - Need to be emulated by the VMM otherwise this would lead to programs
        behaving differently on bare-metal vs virtualised: **equivalence**
        requirement

???
- Behaviour sensitive instructions, for example reading the values of the PSW, will also trap
- Once again the real PSW is loaded with values that are different from what the guest OS thinks is in the PSW
- So if the guest OS tries to read these registers, the hypervisor will return the emulated values, so we can maintain equivalence
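
---
# Hypervisor Operation: PSW Read Sketch

Emulating a read of the PSW amounts to answering from the vPSW rather than from the hardware. A minimal sketch, with PSWs modelled as dictionaries and a hypothetical helper `emulate_read_psw`; the register values are arbitrary examples.

```python
# Values actually loaded in the CPU by the VMM while the guest OS runs.
hardware_psw = {"M": "u", "B": 0x8400, "L": 0x200, "PC": 0x41}
# Values the guest OS believes are in the PSW.
vpsw = {"M": "s", "B": 0x400, "L": 0x200, "PC": 0x41}


def emulate_read_psw(vpsw, field):
    """Return what the guest OS would have read on bare metal."""
    if field not in ("M", "B", "L", "PC"):
        raise KeyError(field)
    return vpsw[field]


print(emulate_read_psw(vpsw, "M"))       # s
print(hex(emulate_read_psw(vpsw, "B")))  # 0x400
print(hardware_psw["M"], hex(hardware_psw["B"]))  # u 0x8400
```

Without the trap, the guest OS would read `u` and `0x8400` from the hardware and behave differently than on bare metal.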

---

# Theorem Violations and Workarounds

- Many ISAs proposed between the 1970s and the 2000s violated the theorem and were not properly virtualisable
  - E.g. with x86-32, the `POPF` instruction is behaviour sensitive but does not trap (fails silently) in user mode

???
- Many ISAs violated the theorem
- In particular, x86-32, which was the most popular ISA in the 90s when the use of computers exploded, was not virtualisable
- Among other things, the POPF instruction, which was behaviour sensitive, worked well in supervisor mode but did not trap and rather failed silently in user mode
- This was a problem because the demand for virtualisation became very high at that time

--
- Techniques to virtualise ISAs that do not adhere to the P&G criteria:
  - **More emulation**: run the entire guest OS code or every page table access as in emulation
      - **Very slow, breaks performance**
  - **Modify the guest OS** to handle the ISA limitations, called paravirtualisation (Xen)
      - **Breaks equivalence**

???
- So techniques were developed to try to virtualise ISAs violating the theorem
- Each of them had to compromise on some of the key objectives of virtualisation
- For example, introducing more emulation, by running the entire guest OS or at least every access to the guest page table as emulated, made it possible to virtualise x86-32, but it was very slow, killing performance
- Another approach was paravirtualisation, where the guest OS was modified to be virtualisation-aware, which killed equivalence
- As you can see we can compromise on performance or equivalence, but of course never on safety

---
# Wrapping Up

- P&G: can build a hypervisor offering performance/safety/equivalence by running
       it in supervisor mode and the entire guest (OS and apps) in user mode
- Theorem: **all sensitive instructions need to trap in user
mode for an ISA to be virtualisable**
- Demonstrated how to operate a hypervisor under a simplified CPU model

???
- That's it
- Hopefully you got the key ideas here, as well as the little technical details
- If you struggle with the technical details, I suggest that you read the paper, it's pretty short
- Seeing things a second time after watching the video should make them clearer
- To sum up, we saw the key objectives of a hypervisor, which are performance, safety, and equivalence
- We saw the Popek and Goldberg theorem, which states that for an ISA to be virtualisable, all sensitive instructions must trap when executed in user mode
- We also saw how a hypervisor satisfying the objectives can be built on a simplified CPU model with an ISA satisfying the theorem's requirements