Introduction

The slides for this chapter are available here.

Definition

Let’s start with a quick and easy definition for the concept of virtualisation. It’s not really complete, but it is a simple starting point:

Virtualisation technologies are the set of software and hardware components that allow running multiple operating systems at the same time on the same physical machine

The type of virtualisation we’ll discuss in this unit mostly concerns running several operating systems (OSes) on the same physical machine. A fundamental challenge is that, by design, an operating system expects to be the only privileged entity with total control over the hardware of a computer. In other words, an operating system is not designed to run alongside, and share a machine with, other operating systems. In that context, how can two or more OSes cohabit on a single machine?

To address that problem we use a combination of hardware and software to create a series of virtual machines (VMs) on a given physical machine, and we run each operating system within its own VM. That way we give each OS the illusion that it is running alone, in total control of its VM:

For this approach to work, the virtualisation layer needs to achieve 3 fundamental high-level objectives, illustrated below:

  1. The speed of an OS should be the same when running in a VM as when running natively. The same goes for the user space applications running on top of that OS.
  2. The code of an existing OS that supports native execution should not have to be modified to run virtualised. The same goes for applications.
  3. The OSes running virtualised on a physical machine should not be able to interfere with each other. For example, they should not be able to access each other’s memory, and a virtualised OS should not be able to monopolise resources such as CPU, memory or I/O at the expense of the other virtualised OSes running alongside it.

Points 1 and 2 above are necessary for adoption: businesses are unlikely to adopt a virtualisation solution if the performance hit is too high, or if it requires changing OSes/applications, which is a significant engineering effort. Point 3 relates to security: virtualised OSes running on the same physical machine are often controlled by mutually distrusting parties, and the virtualisation layer must enforce isolation guarantees.

A Bit of History

In the 1960s IBM produced System/360 (S/360), a family of computers of various sizes (i.e. processing power) built on the same architecture. A client could buy a small model for testing/prototyping, and a large mainframe later. Following that model, clients often realised they wanted to take a set of software applications running on multiple small models and run them all on a single large model. This is called consolidation, and it is one of the main use cases for virtualisation.

Fourteen models were produced between 1965 and 1978. The Model 67 introduced a virtualisable architecture: a physical machine of that model could appear as multiple, less powerful versions of itself: virtual machines (VMs).

In 1974 a seminal paper on the topic of virtualisation was published: Formal Requirements for Virtualizable Third Generation Architectures.

This paper, co-authored by computer scientists Gerald J. Popek and Robert P. Goldberg, listed the requirements for an Instruction Set Architecture (ISA) to be virtualisable. It also described the properties that the system software managing virtual machines (the virtual machine monitor, or hypervisor) must have for virtualisation to be possible on that ISA. We will study that paper in detail in one of the next lectures. Indeed, the principles defined in this article are still relevant today, and they have guided the design of modern virtualisable ISAs such as Intel/AMD x86-64, ARM64, and RISC-V.

At the time the Popek and Goldberg paper was published, virtualisation was not in high demand. That changed in the 1990s and early 2000s. The growing need for virtualisation was driven by various factors: the rising need for workload consolidation, the continuous increase in computing power, and the boom of data centres during the dot-com bubble. The problem was that the most widespread ISA at the time, Intel x86-32, was not properly virtualisable according to the requirements listed in the Popek and Goldberg paper. Several software-based virtualisation solutions that virtualised x86-32 by working around the ISA’s limitations came out of academic research: Disco from Stanford, and Xen from Cambridge. These solutions later transitioned to industry: the authors of Disco founded VMware, and Xen was for a long time the main virtual machine monitor used by Amazon Web Services.

In the 2000s the demand for virtualisation exploded. The modern ISAs that we still use today were designed with virtualisation in mind, following the principles defined by Popek and Goldberg. These ISAs include hardware support for virtualisation, which is leveraged by today’s virtual machine monitors such as Linux’s KVM, VirtualBox, Microsoft’s Hyper-V, or the current versions of Xen.

Use Cases

Consolidation

Consolidation consists in taking a set of software applications, e.g. a web server, a mail server, and other software, initially running on X physical machines, and running everything on a smaller set of Y physical machines (Y < X), possibly a single one, by creating X virtual machines.

As previously mentioned, this is the historical motivation for developing virtualisation technologies.

Consolidation gives most of the benefits of multi-computer systems without the associated financial and management costs. The financial savings are clear: we need to buy fewer computers. The management savings include saving space and reducing the workload for system administrators by having fewer machines.

The benefits of multi-computer systems that can be retained on a single machine or a small set of machines include:

  • Heterogeneity of software dependencies: if we have several applications with different dependency needs in terms of operating system/library versions, it is easy to run each application within its own virtual machine, set up with the proper environment for that particular application. The environments of different virtual machines do not interfere with each other, and can evolve independently as dependencies need updating.
  • Reliability: if one application crashes the system through a bug or a resource hog, the fault is contained within its VM and will not affect the other virtual machines running on the same host.
  • Security: similarly, if one application gets hacked and the attacker manages to take over the operating system, the attack is still confined to the containing VM and the attacker won’t be able to access other VMs running on the same host.

Software Development

Virtualisation offers significant advantages for software development by enabling multiple VMs to run on a single physical host, each with its own operating system and system libraries. This flexibility allows developers to emulate diverse environments without the need for multiple physical machines. For example, a developer working on a Windows machine but developing Linux kernel features can use VMs to run different Linux distributions on the same host, ensuring compatibility and testing across versions.

Provisioning VMs is rapid and cost-efficient compared to setting up physical hardware, making it ideal for iterative development and continuous integration workflows. Furthermore, VMs are self-contained units that encapsulate the entire software stack including the operating system, libraries, and dependencies: they provide a reliable and reproducible environment for development, automated testing, and even deployment. This isolation reduces configuration conflicts and simplifies collaboration across teams.

These are the logos of a few technologies that are extensively used in software development:

  • VirtualBox and VMware Workstation can run a Linux VM on a Windows host for Linux development without the need to install Linux natively.
  • Vagrant allows automating the provisioning (installation) of one or several VMs for quick iterations of the development/testing/deployment cycle.
  • Docker creates containers, VM-like environments that can be set up automatically and almost instantaneously. We’ll talk more about containers later in this course unit.

Migration, Checkpoint/Restart

The state of a running VM is easily identifiable, hence it is relatively simple to checkpoint/restart and live-migrate that VM.

Checkpoint/restart consists in taking a snapshot of the VM’s state and storing it on disk. That snapshot can be restored later for the VM to resume in the exact same state it was in when the snapshot was taken. This is useful when executing long-running jobs (e.g. HPC applications, ML training, etc.): their progress can be saved with this technique to avoid restarting the entire job if something goes wrong during execution (e.g. a crash).
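
As a concrete illustration, here is a minimal sketch of checkpoint/restart using the libvirt C API on a KVM host. The connection URI, the guest name "myvm" and the snapshot path are placeholders, and error handling is reduced to the bare minimum.

```c
/* Minimal sketch of VM checkpoint/restart with the libvirt C API.
 * Assumptions: a QEMU/KVM host reachable at "qemu:///system" and a
 * guest named "myvm" (both names are illustrative).
 * Build with: gcc save_restore.c -lvirt */
#include <stdio.h>
#include <libvirt/libvirt.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (!conn)
        return 1;

    /* Checkpoint: suspend the guest and write its state to disk. */
    virDomainPtr dom = virDomainLookupByName(conn, "myvm");
    if (dom && virDomainSave(dom, "/var/tmp/myvm.snap") == 0)
        printf("snapshot saved\n");

    /* Restart: resume the guest exactly where it was saved. */
    if (virDomainRestore(conn, "/var/tmp/myvm.snap") == 0)
        printf("snapshot restored\n");

    if (dom)
        virDomainFree(dom);
    virConnectClose(conn);
    return 0;
}
```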

Live migration consists in moving a VM from one physical host to another transparently, i.e. without tenants using the VMs noticing it: there should be no need for reconnecting and no noticeable performance drop during the migration. This is useful in many scenarios, e.g., to free resources for maintenance, power saving, load balancing, or when a fault is expected.

Both checkpoint/restart and live migration are straightforward to realise with a VM. This contrasts with checkpointing/migrating an application or process, which is more complicated because the state of an application is made of many elements (including a lot of kernel data structures) that are hard to identify properly.

This seminal paper on VM live migration is worth a read: Clark et al., Live Migration of Virtual Machines, NSDI’05.

Hardware Emulation

Emulation allows creating virtual machines whose CPU architecture (ISA) differs from that of the host computer. This is useful for software development or to provide backward compatibility. A few examples are illustrated below. The development frameworks for smartphone applications (iOS/Android) are generally used on standard desktop/laptop machines, likely running Intel x86-64 CPUs, and allow creating, for testing purposes, virtual machines representing smartphones, which generally embed ARM64 CPUs. Modern video game consoles provide some degree of retro-compatibility with previous generations: for example the Xbox Series X, released in 2020, can run Xbox 360 games that came out starting in 2005. Emulation can also let users create, on modern hardware, virtual machines for hardware that is not widely available anymore, e.g. old arcade machines.

Cloud Computing

Virtualisation enables cloud computing, a computing paradigm in which cloud providers own a large amount of computing resources (server farms) and provide remote access to these resources to their clients, named tenants. This allows tenants to offload local computing workloads to the provider’s infrastructure. Tenants share access to the provider’s resources, and it is common for multiple clients of a cloud provider to execute their workloads on the same physical host. Because, as a general rule, tenants do not trust each other, it is very important for the resource sharing enabled by the cloud to be secure: the entire cloud business model relies on proper isolation between tenants’ workloads. Imagine if there were no such isolation between two distrusting clients whose workloads are co-located on the same host: one of these clients might be able to read and/or modify the code/data of the other’s workload, which would negate most of the benefits of cloud computing.

This strong isolation required between tenants’ workloads is achieved by placing their applications in separate virtual machines. As we will see next, security/strong isolation between VMs is one of the design principles of virtualisation.

There are several ways for clients to leverage the cloud:

  • Infrastructure as a Service (IaaS), in which clients rent VMs running on the provider’s infrastructure to run their workloads, e.g. a web server.
  • Platform as a Service (PaaS), in which tenants develop and deploy their own applications using dedicated cloud frameworks, e.g., Google App Engine.
  • Software as a Service (SaaS), where clients replace a commonly-used local service (e.g. an internal web server) with the cloud provider’s solution, e.g. using Gmail or Outlook 365 for emails.
  • Function as a Service (FaaS), a newer paradigm in which developers deploy on the provider’s infrastructure individual functions that run on demand and automatically scale without managing servers.

The goal and benefit of cloud computing is for tenants to save on management, infrastructure, development, and maintenance costs. Below are a few logos of popular services: AWS EC2 (IaaS), Google App Engine (PaaS), Gmail (SaaS) and AWS Lambda (FaaS):

Security

Because the isolation between the virtual machines running on the same host is so strong, virtualisation has many security applications beyond cloud computing.

Sandboxing confines an untrusted workload within a VM, ensuring that it cannot access the rest of the host’s resources. Beyond the obvious need for that in cloud computing, sandboxing is also useful when doing virus/malware analysis, running honeypots, and more generally running any piece of code that is not fully trusted (e.g., executables downloaded from the internet). Qubes OS, whose logo is illustrated below, is a security-focused desktop operating system that uses virtualisation to isolate each application into a separate VM to reduce the impact of security breaches.

VM introspection consists in analysing the guest’s behaviour from the host. This is quite useful in security-oriented scenarios; however it can be a difficult task because of the limited visibility, from the host, into what is going on inside a VM.

Virtualisation: In-depth Definition

Let’s now see a more in-depth definition of the concept of virtualisation. It is adapted from Hardware and Software Support for Virtualization by Tsafrir, Bugnion and Nieh:

Virtualisation is the abstraction at a widely-used interface of one or several components of a computer system, whereby the created virtual resource is identical to the virtualised component and cannot be bypassed by its clients

This applies to a virtual machine: the abstraction sits at the software (OS)/hardware interface. The virtual machine presents to the OS a set of virtual hardware identical to its physical counterpart, so existing OSes designed for physical machines can run as-is in a VM. Guest OSes cannot escape this VM abstraction: as we will discuss, the isolation between a VM and the virtualisation layer or other VMs is very strong.

That being said, this definition of virtualisation applies to more concepts than just VMs. To name a few examples:

  • With virtual memory, the memory management unit (MMU) on the CPU abstracts physical RAM using techniques such as segmentation and paging. The CPU still accesses memory with loads/stores, so the abstraction is identical. Once virtual memory is enabled, the CPU cannot bypass it, i.e., it can no longer access physical memory directly.
  • With scheduling, the OS virtualises the CPU using abstractions such as processes and threads that are transparently multiplexed on cores.
  • In the domain of storage, a Redundant Array of Independent Disks (RAID) abstracts a set of physical disks into a single logical volume with larger capacity, higher performance, and/or better reliability. Still regarding storage, the Flash Translation Layer was a hardware abstraction implemented in early flash memory devices that made them look like hard disks, so that they were compatible with traditional (hard disk-based) software storage stacks.

Multiplexing, Aggregation, Emulation

Virtualisation, in its general definition, is achieved by using/combining three main principles:

  1. Multiplexing consists in creating several virtual resources from a single physical resource. A well-known example of multiplexing is the creation of several VMs on a single physical host machine.
  2. Aggregation consists in pooling together several physical resources into a single virtual one. An example here is RAID, grouping together several storage devices into a single one with higher performance/capacity/reliability.
  3. Emulation consists in creating a virtual resource of type Y on top of a physical resource of type X. An example here is emulating a virtual machine of a different architecture than the host’s.

Course Unit Context

In this course unit we are mostly interested in virtualisation used to concurrently run multiple (potentially different) OSes on a single host, by abstracting the hardware into virtual machines. This is illustrated below. The VMs are called guests and the physical machine executing them is called the host.

Virtual Machines

There are several different types of VMs, illustrated below, and in this course unit we are only interested in a subset of them (in red on the diagram):

System-level Virtual Machines

System-level virtual machines create a model of the hardware for a (mostly) unmodified operating system to run on top of it. Each VM running on the computer has its own copy of the virtualised hardware. This is the type of VM one creates when e.g., running two different OSes (here Linux and Windows) each within its own VirtualBox VM on a single physical machine:

Machine Simulators and Emulators

Machine simulators and emulators create on a physical host machine a virtual machine of a different architecture. We already discussed emulation: it is useful for reasons of compatibility with legacy applications/hardware, software prototyping, etc. An example here would be to use Qemu in its full emulation mode.

Architecture simulators simulate computer hardware for analysis and study. This is useful for computer architecture prototyping, performance/power consumption analysis, research, etc. A popular computer architecture simulator is gem5. With emulation each guest instruction is interpreted in software, which is extremely slow: it is common to see 5x to 1000x slowdowns when running in an emulated environment compared to native execution.
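
To see where the slowdown comes from, here is a toy interpreter loop in C: every guest instruction is fetched, decoded and dispatched in software, costing many host instructions per guest instruction. The three-opcode “ISA” below is invented purely for illustration.

```c
/* Toy interpreter loop illustrating why pure emulation is slow: every
 * guest instruction goes through fetch/decode/dispatch in software.
 * The 3-instruction "ISA" here is invented purely for illustration. */
#include <stdint.h>
#include <stdio.h>

enum { OP_HALT = 0, OP_ADDI = 1, OP_JMP = 2 };

struct vcpu { uint32_t pc; int32_t reg[4]; };

static void run(struct vcpu *cpu, const uint8_t *mem)
{
    for (;;) {
        uint8_t op  = mem[cpu->pc];          /* fetch  */
        uint8_t arg = mem[cpu->pc + 1];
        switch (op) {                        /* decode + dispatch */
        case OP_ADDI: cpu->reg[0] += arg; cpu->pc += 2; break;
        case OP_JMP:  cpu->pc = arg;                    break;
        case OP_HALT: return;
        default:      fprintf(stderr, "bad opcode\n");  return;
        }
    }
}

int main(void)
{
    /* Guest program: addi r0, 5 ; addi r0, 7 ; halt */
    uint8_t prog[] = { OP_ADDI, 5, OP_ADDI, 7, OP_HALT, 0 };
    struct vcpu cpu = { 0 };
    run(&cpu, prog);
    printf("r0 = %d\n", cpu.reg[0]);         /* prints 12 */
    return 0;
}
```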

Hypervisor/VMM-based Virtual Machines

Contrary to emulation, a hypervisor-based VM has the same architecture as the host. The hypervisor is also called a Virtual Machine Monitor (VMM). This is the main type of VM we will study in this course unit.

Hypervisor-based VMs rely on direct execution for performance reasons: the speed of software running in these VMs is very close to native execution. With direct execution, the VM’s code executes directly on the physical CPU, at a lower privilege level than the hypervisor, for security reasons that we will study in depth. A hypervisor still needs to rely on emulation for a very small subset of the instructions the guest executes: the VMM emulates only sensitive instructions. These are the instructions that would allow the VM to escape the VMM’s control if executed natively (e.g., installing a new page table). Upon encountering a sensitive instruction, the VM switches (traps) to the hypervisor, which emulates it: this is the trap-and-emulate model. Once the VMM is done emulating the sensitive instruction, the execution of the VM resumes directly on the CPU.
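
As an illustration of this model, below is a minimal sketch built on the Linux KVM API (assuming an x86-64 host with /dev/kvm available): the guest code runs directly on the CPU until it performs an operation the VMM must handle, at which point KVM_RUN returns to the hypervisor, which emulates the operation and resumes the guest. Error handling is omitted for brevity.

```c
/* Minimal trap-and-emulate sketch with Linux KVM: the guest runs directly
 * on the CPU until it executes something the VMM must handle (here an
 * I/O port write and HLT), which causes an exit to the hypervisor. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Guest code (16-bit real mode): out %al,$0x42 ; hlt */
    const uint8_t guest_code[] = { 0xe6, 0x42, 0xf4 };

    int kvm = open("/dev/kvm", O_RDWR);
    int vm  = ioctl(kvm, KVM_CREATE_VM, 0);

    /* Give the guest 4 KiB of "physical" memory at guest address 0. */
    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, guest_code, sizeof(guest_code));
    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0,
        .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    size_t run_sz = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_sz, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* Start executing at guest address 0, in real mode, with AL = 7. */
    struct kvm_sregs sregs;
    ioctl(vcpu, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0; sregs.cs.selector = 0;
    ioctl(vcpu, KVM_SET_SREGS, &sregs);
    struct kvm_regs regs;
    memset(&regs, 0, sizeof(regs));
    regs.rip = 0; regs.rax = 7; regs.rflags = 0x2;
    ioctl(vcpu, KVM_SET_REGS, &regs);

    /* Trap-and-emulate loop: direct execution until the guest exits. */
    for (;;) {
        ioctl(vcpu, KVM_RUN, 0);
        switch (run->exit_reason) {
        case KVM_EXIT_IO:      /* the 'out' instruction trapped */
            printf("guest wrote %d to port 0x%x\n",
                   *((uint8_t *)run + run->io.data_offset), run->io.port);
            break;
        case KVM_EXIT_HLT:     /* the guest halted: stop the VM */
            printf("guest halted\n");
            return 0;
        default:
            fprintf(stderr, "unhandled exit %d\n", run->exit_reason);
            return 1;
        }
    }
}
```

Real VMMs built on KVM follow the same overall structure, just with many more exit reasons and full device emulation behind each of them.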

Examples of VMMs/hypervisors are Xen, Linux KVM, VMware ESXi, MS Hyper-V, Oracle VirtualBox, etc.

OS-level Lightweight “VMs”

OS-level lightweight sandboxing technologies create isolated environments that may look similar to a VM from the user’s point of view. However, there is no virtualisation of the hardware and as such there is no virtual machine: all the isolation is managed by the host OS using mechanisms that restrict the view the software running within the sandbox has of OS resources. Containers are a prime example of such lightweight OS-level virtualisation technologies. We will cover containers briefly in the last lecture of this course unit.

Hypervisors or VMMs

As we discussed, hypervisors/VMMs multiplex the physical resources of the host between VMs. They execute VMs while minimising virtualisation overheads to get as close as possible to native performance for the software running within the VMs. The hypervisor ensures isolation between VMs, as well as between the VMs and itself. The isolation concerns physical resources, of course: for example we don’t want a VM to be able to read or modify the memory allocated to other VMs. But isolation also relates to performance: we don’t want a VM to hog the CPU and steal cycles from the other VMs running on the host.

The seminal paper authored by Popek and Goldberg states that virtualisation should follow 3 principles:

  • Equivalence: VMs should be able to run the same software as physical machines.
  • Safety: the VMs must be properly isolated.
  • Performance: virtualised software should run at close to native speed.

There are two types of hypervisor: type I and type II, illustrated below:

A type-I (bare-metal) hypervisor runs directly on the host’s hardware without a host operating system, managing virtual machines at the lowest level. A type-II (hosted) hypervisor runs as an application on top of a conventional host operating system to create and manage virtual machines. Resource allocation and scheduling work differently in each case: with a type-I hypervisor these tasks are performed by the hypervisor itself, while with a type-II hypervisor the host OS is more involved.

An example of type I hypervisor is illustrated below:

In the vast majority of scenarios, the computer hardware that is virtualised includes the CPU, the memory, and the two main types of I/O: disk and network. In many settings, such as the cloud, there is no need for things like a screen, a keyboard or a mouse: all interactions with servers/VMs happen remotely from another computer. As discussed previously, the hypervisor creates virtualised versions of the host hardware. To the guest OS, this virtual hardware looks exactly like the physical hardware looks to the host OS.

Virtualising the hardware is done by multiplexing for the CPU and memory, and emulation for disk and network.

The CPU and memory are multiplexed for performance reasons. The idea with multiplexing is to share the CPU and memory between multiple VMs while letting these VMs access these components directly as much as possible. This sharing can be realised in space (e.g., giving different areas of memory to different VMs) and/or in time (scheduling one VM after the other on a single CPU core). The challenge here is how to enforce the Popek and Goldberg requirements, i.e., how to maintain efficiency with direct execution as opposed to emulation while making sure VMs cannot escape isolation (safety) and run unmodified code (equivalence).

The hypervisor virtualises the CPU by creating virtual CPUs (VCPUs) that run with reduced privileges: they cannot execute any instruction that would allow escaping the isolation the hypervisor enforces on each VM. When the VCPU issues such an instruction, there is a trap to the hypervisor so the instruction can be emulated: this is the trap-and-emulate model. Obviously, the trap and the emulation have some performance impact compared to native execution of the same code.

The memory is a bit more complicated to multiplex securely: modern processors use virtual memory, set up via the MMU, which uses page tables to map virtual memory to physical memory. Software in a VM expects virtual memory to work as it does on physical hardware: the guest OS expects to be able to set up its own page tables and map (i.e. let VM software access) arbitrary physical memory. For security reasons we can’t let a VM access any location it wants in the host’s physical memory, so any update to the page tables must trap, to be validated/emulated by the hypervisor.
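
The sketch below gives an idea of what the hypervisor might do when such a trap occurs, in the style of shadow paging: validate the guest’s mapping, then install the corresponding host-physical mapping. All the types, field names and the gpa_to_hpa() helper are invented for illustration; real implementations are considerably more involved.

```c
/* Hedged sketch of validating a trapped guest page-table update
 * (shadow-paging style).  All structures are invented for illustration. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct vm {
    uint64_t gpa_base;   /* first host-physical address given to this VM */
    uint64_t gpa_size;   /* amount of "guest physical" memory, in bytes  */
};

/* Translate a guest-physical address into a host-physical one,
 * refusing anything outside the memory allocated to the VM. */
static bool gpa_to_hpa(const struct vm *vm, uint64_t gpa, uint64_t *hpa)
{
    if (gpa >= vm->gpa_size)
        return false;             /* the guest tried to escape its sandbox */
    *hpa = vm->gpa_base + gpa;
    return true;
}

/* Called on a trap: the guest wanted to write 'pte' (mapping a guest-
 * physical frame) into one of its page tables.  The hypervisor rewrites
 * the entry with the host-physical frame in the shadow page table. */
static bool emulate_pte_write(const struct vm *vm, uint64_t pte,
                              uint64_t *shadow_entry)
{
    uint64_t gpa   = pte & ~0xfffULL;    /* frame address bits          */
    uint64_t flags = pte &  0xfffULL;    /* present/writable/user bits  */
    uint64_t hpa;

    if (!gpa_to_hpa(vm, gpa, &hpa))
        return false;                    /* e.g. inject a fault in the guest */
    *shadow_entry = hpa | flags;
    return true;
}

int main(void)
{
    struct vm vm = { .gpa_base = 0x100000000ULL, .gpa_size = 1 << 20 };
    uint64_t shadow;
    if (emulate_pte_write(&vm, 0x3000 | 0x3, &shadow))   /* present + rw */
        printf("shadow entry: 0x%llx\n", (unsigned long long)shadow);
    return 0;
}
```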

I/O devices (disk/network) are emulated for compatibility reasons. The hypervisor emulates simple virtual devices (disk/NIC) that can be accessed with commonly implemented drivers (e.g., SATA/NVMe/USB). Because I/O devices have well-defined interfaces (for example: send a set of network packets, read 128K from disk starting at sector X, etc.), it is relatively simple for the hypervisor to expose similar interfaces to VMs. A driver in the guest VM (the front end) accesses these virtual devices, and the hypervisor redirects the I/Os to the physical devices (the back end), while of course maintaining isolation rules, e.g. making sure the VM does not go beyond the disk quota it is allocated. This is illustrated below:
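
The following sketch gives a rough idea of what a disk back end could look like. The request structure and handler are invented for illustration; the point is that the back end services guest requests from a plain host file while enforcing the bounds of the VM’s virtual disk.

```c
/* Hedged sketch of a virtual disk back end: the guest's front-end driver
 * issues "read N sectors at sector S" requests; the hypervisor services
 * them from a file on the host while enforcing the VM's disk bounds. */
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define SECTOR_SIZE 512

struct disk_request {
    uint64_t sector;      /* first sector requested by the guest  */
    uint32_t count;       /* number of sectors                    */
    void    *buffer;      /* where to put the data                */
};

struct virtual_disk {
    int      image_fd;    /* host file backing this VM's disk     */
    uint64_t nr_sectors;  /* size of the virtual disk = the quota */
};

static bool handle_read(const struct virtual_disk *disk,
                        const struct disk_request *req)
{
    /* Isolation: never let the guest read past its own disk image. */
    if (req->sector + req->count > disk->nr_sectors)
        return false;

    off_t  offset = (off_t)req->sector * SECTOR_SIZE;
    size_t length = (size_t)req->count * SECTOR_SIZE;
    return pread(disk->image_fd, req->buffer, length, offset)
           == (ssize_t)length;
}

int main(void)
{
    /* Back the virtual disk with a 4-sector scratch file (illustrative). */
    int fd = open("disk.img", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, 4 * SECTOR_SIZE);
    struct virtual_disk disk = { .image_fd = fd, .nr_sectors = 4 };

    char sector[SECTOR_SIZE];
    struct disk_request ok  = { .sector = 1, .count = 1, .buffer = sector };
    struct disk_request bad = { .sector = 3, .count = 2, .buffer = sector };
    printf("in-bounds read:    %s\n", handle_read(&disk, &ok)  ? "ok" : "rejected");
    printf("out-of-bounds read: %s\n", handle_read(&disk, &bad) ? "ok" : "rejected");
    return 0;
}
```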

Hypervisors: Memory Denomination

We need to cover one last definition, related to how memory is organised in virtualised scenarios. On a non-virtualised machine, software running on the CPU executes load and store instructions to access memory. These load and store instructions target virtual addresses, and the MMU transparently maps these accesses to the corresponding physical memory based on the translation information contained in the page table currently in use. The page table is set up by the OS and walked on every memory access to find the corresponding physical address.

When running virtualised, we have something like this:

There is another level of translation added, which is taken care of by the hypervisor. It corresponds to the memory that the guest thinks is its physical memory: it is called pseudo physical memory or guest physical memory. Like virtual memory, it is just another level of indirection and does not hold any data – only physical memory does. So when software running in the VM accesses memory with load and store instructions, it targets virtual addresses – guest virtual addresses. A page table installed by the guest OS translates these accesses into pseudo physical memory accesses, and the hypervisor must somehow ensure that these pseudo physical memory accesses are translated into physical memory accesses. We will see how this is done soon.
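
To make the two levels concrete, here is a toy C illustration that composes the two translations (guest virtual → guest physical → host physical) using flat arrays instead of real multi-level page tables. Only the three address spaces and the 4 KiB page size come from the discussion above; everything else is invented for clarity.

```c
/* Toy illustration of the two translation levels described above. */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                       /* 4 KiB pages */
#define NPAGES     4

/* Guest page table: guest virtual page -> guest (pseudo) physical page. */
static const uint64_t guest_pt[NPAGES] = { 2, 0, 3, 1 };
/* Hypervisor table: guest physical page -> host physical page. */
static const uint64_t host_pt[NPAGES]  = { 7, 4, 6, 5 };

static uint64_t gva_to_hpa(uint64_t gva)
{
    uint64_t offset = gva & ((1 << PAGE_SHIFT) - 1);
    uint64_t gvpn   = gva >> PAGE_SHIFT;       /* guest virtual page no.  */
    uint64_t gppn   = guest_pt[gvpn];          /* 1st level: guest OS     */
    uint64_t hppn   = host_pt[gppn];           /* 2nd level: hypervisor   */
    return (hppn << PAGE_SHIFT) | offset;
}

int main(void)
{
    uint64_t gva = (2 << PAGE_SHIFT) | 0x123;  /* page 2, offset 0x123 */
    printf("gva 0x%llx -> hpa 0x%llx\n",
           (unsigned long long)gva, (unsigned long long)gva_to_hpa(gva));
    return 0;
}
```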