Exercise Objectives and Logistics

Overview

The objective of this exercise is to develop a simple emulated device in the Qemu virtual machine monitor, to develop the guest driver controlling that device in the Linux kernel, and to write a small guest user space application making use of the device through the driver. The emulated device is a simple random number generator (RNG).

The different software components you will have to develop are illustrated in green on the figure below:

The components to develop are:

The emulated device running on the host within Qemu.
A driver for the device running within the guest operating system, Linux.
A user space application leveraging the driver to make use of the device.

Random Number Generator Virtual Device

Our virtual RNG device offers two functionalities:

Generating random numbers: applications can query the virtual device through a driver in the guest kernel to obtain random numbers.
Seeding the RNG: applications can initialise the RNG with a particular seed.

The random number generator will be connected to the VM's virtual CPU on the PCI bus, and communication between the device and the CPU will be achieved with memory mapped I/O registers. To function the devices exposes an interface made of two registers, illustrated below:

You can find more information about the registers in the table below.

Register name	Offset from base address	Size	Mode (R/W)	Description
`RNG`	`0x0`	4 bytes	R	Reading this register returns a random number in the form of a 32 bits unsigned integer. Each new read returns a new random number.
`SEED`	`0x4`	4 bytes	W	Writing an unsigned 32 bits number to this register seeds the random number generator with that value.

Exercise Structure

This exercise will be in 2 parts: the first part represents most of this guide, which is a tutorial that will hold your hand to develop a basic version of the software presented on the figure above. Accomplishing this first part will get you a part of the marks for this exercise (12/20). The second part is open-ended: you will be asked to enhance the basic prototype developed (some suggestions will be given, but you can also decide to implement your own enhancements). That part will count for the rest of the marks (8/20). Along with your code you should submit a short, 2-page report, describing the enhancement(s) you develop, and how they can be tested.

Deadline and Submission Format

The deadline for submitting this exercise is ~~21/01/2025~~ 24/01/2025. In case of late submission, a penalty of 10% for each day late will be applied to the final mark.

To submit you should send the following things items ~~by email~~ to the instructor (pierre.olivier <at> manchester.ac.uk):

A patch, corresponding to your implementation of the virtual device, that can be applied to Qemu's vanilla source code v8.2.0.
Another patch, corresponding to your implementation of the virtual device's driver, that can be applied to Linux's vanilla source code v6.6.
The 2-page report describing your enhancements and how to test that they are functional.

Guidance on how to generate the patches will be given at the end of this tutorial.

Please upload your submission's files to an online service such as a private GitHub repository or a Google Drive folder and give the instructor access to it (GitHub ID: olivierpierre), and notify him by email. The instructor will acknowledge reception of each submission.

Marking Scheme

The exercise is marked out of a total of 20.

Part 1: guided tutorial /12

Device: the Qemu patch applies without errors/warnings to the vanilla sources of Qemu v8.2.0-rc2 /1.
Device: a small test from the Linux kernel boot process repeatedly seeding the device with the same value yields similar random number sequences /3.
Device: the implementation of the read/write mmio function follows the device's specifications (proper register addresses and sizes used) /2.
Driver: the Linux patch applies without errors/warnings to the vanilla sources of Linux v6.6.4 /1.
Driver: a small test after creation of the virtual file from a user space application using the driver to repeatedly seed the RNG with the same value yields similar random number sequences /3.
Driver: data is transferred safely between user and kernel space /2.

Part 2: going further /8

The report is clear about what enhancements were developed and how to test them /3.
The enhancements are functional/degree of ambition of the enhancements /5.

Required Setup

To complete that exercise you will need a recent Debian/Ubuntu Linux installation with root privileges. The recommended distributions are Debian 12 and Ubuntu 22.04. Certain steps of this guide are not guaranteed to work on other distributions/versions.

The machine you use also needs to have an Intel x86-64 CPU (this guide will not work with MacBook's ARM M1/M2 CPUs). If somehow you cannot get a proper setup please contact the instructor, you will be given temporary access to a remote machine satisfying these constraints.

Setting Up a Development Environment

To set up the proper environment for this exercise you can either:

Use a virtual machine. Download the following VM image and import it into VirtualBox. To log in, enter the username user and password a. That user has root access with sudo.
Use a Docker container if you know what you are doing. If it is not installed on your machine install Docker. Start a Debian 12 container from your host, for example if you are using the command line:
```
docker run -it debian:12
```
Make sure to install the few pre-requisite Debian packages that will allow us to compile Linux and Qemu. Within the container:
```
apt update
apt install -y build-essential git bc libssl-dev flex bison wget python3 python3-venv ninja-build pkg-config libglib2.0-dev libelf-dev libslirp-dev
```
Install Linux natively if you know what you are doing. If you are not familiar with that or don't already have a Debian 12/Ubuntu 22.04 distribution up and running, this is not recommended. Indeed, it can take a bit of time and learning how to install Linux bare metal is not the goal of this exercise.

Note that whatever environment you use make sure there is at least 20 GB of free disk space.

In the exercise, the VirtualBox VM/Linux container/native Linux install you will work on will be called the host. This is because in that host we will create another virtual machine that will make use of the virtual device you developed. That VM will be called the VM. It is a bit counterintuitive to name the VirtualBox VM "host" and not VM, but this is done for reasons of consistency between the different development environments you may use.

Creating a Base Directory for the Exercise

You should create a base directory for the exercise. In the rest of this guide, that base directory is assumed to be present in your home folder and named virt-101-exercise. You can create it with the following command:

mkdir ~/virt-101-exercise

Building Qemu

To develop the virtual device we will need to modify the Qemu virtual machine monitor, so a first step is to download its sources and make sure we can compile it.

Cloning Qemu Sources Repository

Qemu sources can be cloned from the project's git repository on GitLab. Place yourself in the exercise base directory, and run the following command to do so:

cd ~/virt-101-exercise
git clone --branch v8.2.0 --depth=1 https://gitlab.com/qemu-project/qemu.git qemu-8.2.0

Here to save storage space and network bandwidth we will only clone the particular version we are interested in (--branch v8.2.0) without any history (--depth=1). Qemu's sources are now in the folder qemu-8.2.0.

Compiling Qemu

Place yourself into Qemu's source folder and prepare the build by calling the configure script:

cd qemu-8.2.0
./configure --prefix=$PWD/prefix --target-list=x86_64-softmmu

Launch the build and trigger the installation once done:

make -j4 install

This can take a bit of time depending on the processing power of your host, among other factors.

Trying Out Qemu

Once the build and installation are done, you can check that all went well by launching an empty virtual machine:

./prefix/bin/qemu-system-x86_64 -nographic

The -nographic option indicates that the VM will have serial console output only (and no graphical output), which simplifies a lot this exercise. You should see something like that:

# ...
Booting from Hard Disk...
Boot failed: could not read the boot disk

Booting from Floppy...
Boot failed: could not read the boot disk

Booting from DVD/CD...
Boot failed: Could not read from CDROM (code 0003)
# ...

What you see here is Qemu's virtual bootloader attempting to boot on a few virtual devices (hard disk, CD, etc.). Because there is nothing in there it fails to do so, this is normal.

To exit Qemu, press ctrl+a followed by x. Remember this shortcut, you will need to use it extensively in the rest of the exercise.

Qemu when exiting sometimes also interferes with the console which will lead to a messed up display when you type a command longer than a console line. If this happens simply runs this command to reset the console:
reset

Building the Linux Kernel

Now we will build the operating system kernel we will use for the guest: Linux.

Cloning Linux Sources

Similarly to what we did with Qemu, we first clone a particular version of Linux's sources:

cd ~/virt-101-exercise
git clone --depth=1 --branch v6.6 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux-6.6

Compiling a Minimal Linux Kernel for Our Virtual Machine

Next we'll compile a minimal version of Linux that can run in a VM created by Qemu. To that aim we'll use the default configuration for a basic x86-64 machine. That configuration can be generated as follows:

cd linux-6.6
make x86_64_defconfig

Once done launch the compilation of the kernel with the following command:

make -j4

This will take a while, but don't worry you will have to compile the entirety of the kernel only once. The subsequent builds for that exercise (e.g. after you have implemented the driver) will be incremental, i.e. much faster (a few seconds). Once finished, the compiled kernel's binary is arch/x86/boot/bzImage.

Trying Out the Guest Kernel

We can already try to boot the kernel we just built with Qemu as follows:

cd ~/virt-101-exercise
./qemu-8.2.0/prefix/bin/qemu-system-x86_64 -m 1G -kernel linux-6.6/arch/x86_64/boot/bzImage -nographic -append "console=ttyS0"

Here we tell Qemu to create a machine with 1G of RAM and to use the kernel we compiled. The option -append indicates that the option "console=ttyS0" should be passed to the kernel, telling it to output its boot log on the VM's serial console.

You should see Linux starting to boot. The boot process should end with the following error:

Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Once again you can exit Qemu with ctrl+a then x. This is normal: the kernel is not sufficient to fully run a VM, and we are missing a root filesystem. The root filesystem consists in all the user space basic system utilities such as a shell, etc. We will install a minimal Alpine Linux filesystem in the next step.

Installing an Alpine Root Filesystem

Now we will install a minimal root filesystem in the VM, using the Alpine Linux distribution.

Creating a Virtual Disk

First create an empty virtual hard disk for the VM:

cd ~/virt-101-exercise
./qemu-8.2.0/prefix/bin/qemu-img create -f qcow2 alpine.qcow2 2G

Here we use the qemu-img tool (compiled alongside Qemu) to create in a file named alpine.qcow2 a virtual hard disk of size 2 GB. We indicate also that the format of the disk should be Qcow2.

Downloading and Extracting an Alpine Image

Next we download an installation ISO image of Alpine:

wget https://dl-cdn.alpinelinux.org/alpine/v3.19/releases/x86_64/alpine-standard-3.19.0-x86_64.iso

Installing Alpine on the Virtual Disk

Next we will launch the VM with the ISO image in the virtual CD drive, as well as the virtual hard disk plugged:

./qemu-8.2.0/prefix/bin/qemu-system-x86_64 -m 1G -nic user -boot d -cdrom alpine-standard-3.19.0-x86_64.iso -hda alpine.qcow2 -nographic

The notable parameters here are:

-nic user that creates a virtual network allowing the VM to access the internet (needed for Alpine installation).
-boot d -cdrom alpine-standard-3.19.0-x86_64.iso uses the ISO image in the virtual CD-ROM drive of the VM, and indicates to the Qemu's BIOS to boot on that CD-ROM (d).
-hda alpine.qcow2 plugs the previously created empty virtual hard disk in the VM.

The ISO can take a few seconds to boot silently. When the login prompt appears:

Welcome to Alpine Linux 3.19
Kernel 6.6.4-1-lts on an x86_64 (/dev/ttyS0)

localhost login:

setup-alpine

Next the setup process will ask a series of questions. Answers the default choice (i.e. just press enter) for all but the following:

Choose a password when asked.
To the question Allow root ssh login? answer yes.
To the question Which disk(s) would you like to use? answer sda.
To the question How would you like to use it? answer sys.
To the question Erase the above disk(s) and continue? answer y

Once the installation is over, shut down the VM with the following command:

halt

Wait a few seconds then exit Qemu with ctrl+a then x.

Alpine is now installed on the virtual hard disk alpine.qcow2.

Creating a VM Boot Script

As you can see the Qemu invocation command can be pretty long, and it is tedious to type or even copy and paste it each time we boot the VM. We can create a shell script to automate the launch of the VM. Edit the file ~/virt-101-exercise/launch-vm.sh and write inside the following:

#!/bin/bash

./qemu-8.2.0/prefix/bin/qemu-system-x86_64 \
    -m 1G \
    -nographic \
    -nic user \
    -hda alpine.qcow2 \
    -kernel linux-6.6/arch/x86_64/boot/bzImage \
    -append "console=ttyS0 root=/dev/sda3"

You will notice that this VM uses the kernel we compiled in the previous step, and boots on the virtual hard disk alpine.qcow2 (/dev/sda3 from the kernel point of view)..

Give the script executable permissions:

chmod +x ~/virt-101-exercise/launch-vm.sh

KVM Acceleration. If your have a native Linux installation on your host, or if you are running a privileged container, you can tell Qemu to activate KVM acceleration. This will transform your emulated VM into a proper direct execution-based VM which concretely will make it much faster. To check if you can enable KVM acceleration, check for the presence of the file /dev/kvm. If it is present, add this option to Qemu's command line: -enable-kvm You may need to invoke Qemu as root for the VM to start if you use that option.

Starting the VM

The VM can now be started with the script:

cd ~/virt-101-exercise
./launch-vm.sh

The kernel will boot and load the root filesystem, then Alpine with initialise. Once you reach the login prompt, login as root with the password you defined during the installation. The VM is now fully installed!

Always try to shut down the VM properly with the halt command, and wait before the kernel prints reboot: System halted before killing Qemu with ctrl+a then x. If you don't there is a non-negligible risk of shutting down the filesystem in an inconsistent state, corrupting it. If that happens you will lose the VM's filesystem content and will be forced to reinstall Alpine.

Implementing the Virtual Device in Qemu

We now have the VM set up and the sources of Qemu and Linux ready to be modified. We'll start by modifying Qemu to implement the virtual random number generator. The goal is to emulate that device, e.g. adhere to the same interface the guest OS would use to communicate with a real hardware component: the random number generator will be connected to the VM's virtual CPU on the PCI bus, and communication between the device and the CPU will be achieved with memory mapped I/O registers. The implementation of the RNG itself (e.g. how random numbers are generated) will be done completely in software, for example by using the rand() and srand() functions provided by the C standard library in Qemu on the host.

You can refresh your mind about the functionalities of the virtual device and its registers here.

At that point the base folder for the exercise should look like that:

virt-101-exercise/    # exercise base directory
|-- alpine.qcow2      # virtual hard disk with Alpine installed
|-- launch-vm.sh      # VM launch script
|-- linux-6.6/      # kernel sources
|-- qemu-8.2.0/   # Qemu sources

Adding a New Source File in Qemu

We'll start by creating a new C file in which we will implement the device:

cd ~/virt-101-exercise/qemu-8.2.0
touch hw/misc/my-rng.c

Next we need to add that file to the build system so that it gets compiled and linked against the rest of Qemu sources. Add the following at the top of the file hw/misc/Kconfig:

config MY_RNG
    bool
    default y

And add that line at the top of the file hw/misc/meson.build:

system_ss.add(when: 'CONFIG_MY_RNG', if_true: files('my-rng.c'))

A modification of the build system requires reconfiguring and recompiling all of Qemu sources. To do so simply type the following command in Qemu's sources root directory:

make -j4 install

To make sure your file is included in the build you can force its recompilation as follows:

touch hw/misc/my-rng.c
make

You should see in the output:

[3/4] Compiling C object libcommon.fa.p/hw_misc_my-rng.c.o

Implementing the Device

Now we will implement the virtual random number generator in hw/misc/my-rng.c. We'll first need to include the following headers as they define data structures and functions we need:

#include "qemu/osdep.h"
#include "hw/pci/msi.h"
#include "hw/pci/pci.h"

Next we define the device's name with a macro, and create a data structure representing the device:

#define TYPE_MY_RNG "my_rng"
#define MY_RNG(obj) OBJECT_CHECK(my_rng, (obj), TYPE_MY_RNG)

typedef struct {
    PCIDevice parent_obj;
    uint32_t seed_register;
    MemoryRegion mmio;
} my_rng;

The important bits here are the seed_register member, that we will use to hold the seed, and mmio, a data structure that will hold functions to read and write from the device's memory mapped registers.

Next we define the functions that will run when the device's memory mapped registers are read/written:

static uint64_t mmio_read(void *opaque, hwaddr addr, unsigned size) {
    /* TODO implement that function later */
    return 0x0;
}

static void mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) {
    /* TODO implement that function later */
    return;
}

static const MemoryRegionOps my_rng_ops = {
    .read = mmio_read,
    .write = mmio_write,
};

It will be your task to implement these functions later. For now, it is fine to leave them empty. Notice the my_rng_ops data structure that contain members pointing to both functions.

The rest of the source file contains a series of initialisation functions:

static void my_rng_realize(PCIDevice *pdev, Error **errp) {
    my_rng *s = MY_RNG(pdev);
    memory_region_init_io(&s->mmio, OBJECT(s), &my_rng_ops, s,
                          "my_rng", 4096);
    pci_register_bar(&s->parent_obj, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->mmio);
}

static void my_rng_class_init(ObjectClass *class, void *data) {
    DeviceClass *dc = DEVICE_CLASS(class);
    PCIDeviceClass *k = PCI_DEVICE_CLASS(class);

    k->realize = my_rng_realize;
    k->vendor_id = PCI_VENDOR_ID_QEMU;
    k->device_id = 0xcafe;
    k->revision = 0x10;
    k->class_id = PCI_CLASS_OTHERS;
    
    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
}

static void my_rng_register_types(void) {
    static InterfaceInfo interfaces[] = {
        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
        { },
    };

    static const TypeInfo my_rng_info = {
        .name = TYPE_MY_RNG,
        .parent = TYPE_PCI_DEVICE,
        .instance_size = sizeof(my_rng),
        .class_init    = my_rng_class_init,
        .interfaces = interfaces,
    };

    type_register_static(&my_rng_info);
}

type_init(my_rng_register_types)

You don't need to fully understand this code. Notable things here are:

The my_rng_realize function that initialises an instance of the virtual random number generator by:
- Creating a region of I/O memory for the memory mapped registers with memory_region_init. That region has a size of 4 KB which is much larger than what we need (we have 2 registers of 4 bytes each) but corresponds to the size of a memory page.
- Registering the device on the PCI bus with pci_register_bar.
The my_rng_class_init that will run once when Qemu starts and define a few characteristics common to all instances of our virtual device, such as an easily identifiable device ID (0xcafe). A member realize of the corresponding PCIDeviceClass data structure also points to the per-instance initialisation function my_rng_realize.

At that point you can try to recompile Qemu by typing, at the root of its source folder:

make install

You should fix any error or warning at that stage. Once everything compiles fine we can check if the device appears in the VM.

Checking the Presence of the Device in the VM

To enable the device in the VM, edit the launch script ~/virt-101-exercise/launch-vm.sh and add the following command line option to Qemu's invocation:

-device my_rng

You can check the presence of the virtual device by enumerating PCI devices in the VM. Boot the VM and install lspci using Alpine's packet manager APK:

apk add pciutils

Next, still in the VM, enumerate PCI devices:

lspci -v

You should see the following device:

00:04.0 Unclassified device [00ff]: Device 1234:cafe (rev 10)
	Subsystem: Red Hat, Inc. Device 1100
	Flags: fast devsel
	Memory at febf1000 (32-bit, non-prefetchable) [size=4K]

You can recognise the device ID 0xcafe we defined earlier. Notice also the address where the device's registers are mapped in (physical) memory. Here it is 0xfebf1000 but it may be different on your computer.

Implementing the Read/Write MMIO Functions

To finalise the implementation of our virtual random number generator, one must implement the two functions we defined earlier:

static uint64_t mmio_read(void *opaque, hwaddr addr, unsigned size) {
    /* TODO */
    return 0x0;
}

static void mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) {
    /* TODO */
    return;
}

You are responsible to implement these functions. A bit of information to help you achieve that:

mmio_read is called when the guest OS tries to read in one of the device's memory mapped registers, and mmio_write is called when a register is written. mmio_read returns the value that the guest OS will read.
The addr parameter will contain the offset from the base address in the area of memory mapped I/O at which the read/write takes place, which should allow you to identify the target register.
The size parameter denotes the size of the read/write operation.
The opaque pointer points to the device's data structure of type my_rng, so you can get a pointer to the device's data structure with a cast: my_rng *dev = (my_rng *)opaque;.
The actual RNG should be implemented in software and the easiest way to achieve that is probably to use the standard C library's functions rand() (to get a random number) and srand() (to seed the random number generator).

Testing the Virtual Device from the Guest Kernel

Before writing the actual driver it is probably a good idea to a quick test of the device from the guest kernel and check it behaves correctly. To that aim we can do a small modification of the Linux guest kernel sources, and insert some calls to the device. For the sake of simplicity we'll insert these calls at the end of the boot process, when the system is well initialised but without involving the user space.

Locating the Kernel Main Function

The kernel is a computer program like any other and as such it has an entry point. This entry point is written in assembly but after a short early initialisation, the CPU will jump to C code. More precisely, the C entry point of the kernel is the function start_kernel, which is implemented in the Linux sources in the file init/main.c.

If you check out its implementation, you'll see that start_kernel initialises many subsystems and then call arch_call_rest_init, which itself calls rest_init. rest_init spawns a kernel thread that runs the kernel_init function. The kernel_init function finalises the initialisation of the system and then starts the first user space application. This is a suitable point in the boot process to insert our test calls to the device, because the system is fully initialised, and we are also still in kernel space.

Inserting Test Calls to the Device

Our test will perform the following things:

Seed the RNG with a fixed seed e.g. 0x42
Generate 5 random numbers and print them on the kernel log

Steps 1 and 2 will be repeated twice, so we can check that the 5 random numbers generated from the same seed are the same for both iterations.

In the kernel_init function, add the following code after the call to do_sysctl_args(); (it's around line 1464):

printk("------------------------------------------------------------------\n");
printk("BEGIN MY-RNG TEST\n");
printk("------------------------------------------------------------------\n");

// Map the area of physical memory corresponding to the device's registers
// (starting 0xfebf1000, size 4KB) somewhere in virtual memory at address
// devmem. Notice that the physical memory where the device's registers are
// present may be different on your computer, use lspci -v in the VM to
// find it
void *devmem = ioremap(0xfebf1000, 4096);
unsigned int data = 0x0;
if(devmem) {
    for(int i=0; i<2; i++) {
        // seed with 0x42 by writing that value in the seed register which
        // is located at base address + 4 bytes
        iowrite32(0x42, devmem+4);

        // obtain and print 5 random numbers by reading the relevant
        // register located at base address + 0
        for(int j=0; j<5; j++) {
            data = ioread32(devmem);
            printk("Round %d number %d: %u", i, j, data);
        }
    }
} else {
    printk("ERROR: cannot map device registers\n");
}

printk("------------------------------------------------------------------\n");
printk("END MY-RNG TEST\n");
printk("------------------------------------------------------------------\n");

A few notable things in this code:

We use printk to print to the kernel log. It's very similar to the printf function you are familiar with in user space. With printk we display when the test starts and ends so that things are clearly visible in the kernel log.
The test code starts by mapping the physical memory where the device's registers are present into virtual memory (shortly after the very early boot process the CPU can only access virtual memory) at an address pointed by devmem. This is achieved with the ioremap function, that takes as parameters the physical address to map into virtual memory, as well as the size of the area to map (here one page, i.e. 4 KB, as defined when we implemented the device). Note the address in physical memory where the device's registers are mapped, here 0xfebf1000. It may be different on your computer. To find it out, you can use lspci within the VM, as previously explained.
Once the device's registers are mapped into virtual memory, we can read and write to them using ioread32 and iowrite32. It's important to use these functions rather than directly read/write to memory because these are not standard memory access operations: these functions will ensure important things like bypassing the CPU caches, disabling compiler optimisations, and will have memory barriers preventing the compiler/CPU to reorder the corresponding instructions. Through these functions we have two types of operations when talking to the device:
- Generating a random number: to achieve that we read with ioread32 the first register which is located directly at the device's base address
- Seeding the RNG: to do so we write with iowrite32 the second register which is located 32 bits (4 bytes) from the base address

Launching the Test

Once the test code is ready you can recompile the guest Linux kernel:

cd ~/virt-101-exercise/linux-6.6
make

When you launch the VM with this newly compiled kernel, you should see in the log at the end of the kernel boot process something like that:

[    3.519214] ------------------------------------------------------------------
[    3.519510] BEGIN MY-RNG TEST
[    3.519620] ------------------------------------------------------------------
[    3.520024] Round 0 number 0: 286129175
[    3.520046] Round 0 number 1: 1594929109
[    3.520199] Round 0 number 2: 971802288
[    3.520394] Round 0 number 3: 222134722
[    3.520559] Round 0 number 4: 1335014133
[    3.520754] Round 1 number 0: 286129175
[    3.520918] Round 1 number 1: 1594929109
[    3.521073] Round 1 number 2: 971802288
[    3.521227] Round 1 number 3: 222134722
[    3.521406] Round 1 number 4: 1335014133
[    3.521545] ------------------------------------------------------------------
[    3.521965] END MY-RNG TEST
[    3.522101] ------------------------------------------------------------------

As you can see for each round the series of random number generated are the same, which confirms that the RNG virtual device works well.

Implementing the Driver within the Guest Kernel

Here we will modify the guest Linux kernel again, this time we will implement a proper driver, manipulating the device and exposing its functionalities to the application from user land.

User-Kernel Space Communication with `ioctl`

The goal of an operating system (OS) is to provide user space application with safe and controlled access to the hardware. To that aim the OS implements a driver that manipulates the hardware directly, and that driver offers an interface to user space applications. It is possible to use different types of interfaces, such as implementing a new system call or using virtual files. The one that we will use for this exercise is called input/output control (ioctl).

With ioctl, the driver will create a virtual file on the VM's root filesystem representing the device, /dev/my_rng_driver. A user space application wishing to access our random number generator device will open that file and perform operations on it through a particular system call, ioctl, as illustrated below:

As the hardware device provides 2 functionalities, there will be 2 ioctl operations available: one to generate a new random number, and another to seed the RNG.

Adding a Source File to the Linux Kernel Sources

Our driver should be implemented in its own source file. So first we need to create a new C file in the kernel sources and add it to the build system so that it gets compiled with the rest of the kernel sources. To do so, let's first navigate to the kernel sources directory and create a file in the drivers/misc/ directory:

cd ~/virt-101-exercise/linux-6.6
touch drivers/misc/my-rng.c

Next let's add it to the build system. Edit drivers/misc/Makefile and add this line at the top of the file:

obj-y += my-rng.o

This indicates the kernel build process to add my-rng.c into every build of the kernel.

Implementing the Driver

In drivers/misc/my-rng.c, let's start by including the necessary headers:

#include <linux/ioctl.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/uaccess.h>
#include <linux/io.h>

These will let us define icotl operations, create the virtual file corresponding to the device, map the physical memory corresponding to the device's register into virtual memory, and access these registers.

Next we define the two ioctl operations our driver will support:

#define MY_RNG_IOCTL_RAND _IOR('q', 1, unsigned int)
#define MY_RNG_IOCTL_SEED _IOW('q', 1, unsigned int)

With _IOR we define an ioctl operation MY_RNG_IOCTL_RAND that will allow the application to read data from the device's RNG register, i.e. to get a random number. With _IOW we define an operation for the application to write data to the device's SEED register, i.e. to seed the random number generator. The parameters are not particularly important, but note that the last one specifies the size of what is read/written: an unsigned int, i.e. a 32-bit unsigned integer.

Next we define a macro with the base physical address where the device's registers are mapped into memory:

#define DEVICE_BASE_PHYS_ADDR 0xfebf1000

Please note that this value may be different on your computer. To find the proper value you can use lspci -v within the VM as previously explained.

We also need a pointer that will hold the location where the device's registers are mapped in virtual memory (recall that the CPU can only access virtual memory):

void *devmem = 0x0;

Next we implement the functions that will access the device. These will be called when the user land application invokes ioctl on the virtual file /dev/my_rng_driver.

static long my_rng_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {

    switch (cmd) {
        
        case MY_RNG_IOCTL_RAND:
            /* Application requests a new random number */
            /* TODO implement that feature */

            break;

        case MY_RNG_IOCTL_SEED:
            /* Application requests to seed the RNG */
            /* TODO implement that feature */
            break;

        default:
            return -ENOTTY; // unknown command
    }

    return 0;
}

static struct file_operations my_rng_fops = {
    .unlocked_ioctl = my_rng_ioctl,
};

Here the cmd parameter contains the exact ioctl command that was called by the application. With a switch we separate the processing according to what the application requests, either MY_RNG_IOCTL_RAND or MY_RNG_IOCTL_SEED.

It will be your responsibility to implement these commands. A few important things to note:

When either reading or writing data from/to the device through ioctl, the parameter arg will contain:
- The address in user space of the data to write to the device in case of a write operation.
- The address in user space of where to store the data to read in case of a read operation.
It is unsafe to read/write from/to user space addresses directly (the user space application could have for example passed the kernel NULL pointers). To properly access these addresses, you need to use:
- copy_to_user when copying data read from the device into user space memory; and
- copy_from_user when reading data from user space in order to write it to the device.
At that stage you can assume that memory pointed by devmem has already been properly mapped somewhere in virtual memory by the driver initialisation function (presented below) and you don't need to call ioremap in my_ioctl.
You should access the device by taking inspiration from the test code we wrote in the previous step.

The handler my_ioctl is wrapped into a file_operations data structure that we will use to indicate the operations possible on the virtual file the driver will create in /dev.

Finally, we can implement the initialisation and destruction functions for our driver:

static int __init my_rng_driver_init(void) {
    devmem = ioremap(DEVICE_BASE_PHYS_ADDR, 4096);

    if(!devmem) {
        printk(KERN_ERR "Failed to map device registers in memory");
        return -1;
    }

    if (register_chrdev(250, "my_rng_driver", &my_rng_fops) < 0) {
        printk(KERN_ERR "Failed to register my_rng_driver\n");
        return -1;
    }

    printk("my_rng_driver loaded, registered ioctls 0x%lx (get a random "
        "number) and 0x%lx (seed the generator) \n", MY_RNG_IOCTL_RAND,
        MY_RNG_IOCTL_SEED);
    return 0;
}

static void __exit my_rng_driver_exit(void) {
    unregister_chrdev(250, "my_rng_driver");

    if(devmem)
        iounmap(devmem);

    printk(KERN_INFO "my_rng_driver unloaded\n");
}

module_init(my_rng_driver_init);
module_exit(my_rng_driver_exit);

The initialisation function my_rng_driver_init is executed when the kernel boots. It starts by mapping the device's registers into virtual memory with ioremap, as we have seen in the test code we wrote previously. Next it registers a character device named my_rng_driver into the kernel, with an identification number (called major number) of 250. We'll use that number later when we create in the VM the virtual file that will play the role of interface between a user space application and the driver living in the kernel.

The driver exit function my_rng_driver_exit is executed when the kernel shuts down. It simply unregisters the character device, and unmaps the device register's from virtual memory.

The initialisation and exit functions are indicated with module_init and module_exit.

Once all the code is written, you can recompile Linux by typing at the root of its sources:

make

Checking the Presence of the Driver

Once the kernel is recompiled, reboot the VM and check in the kernel log the line written by the driver when it loads. It may be a bit hard to find because a lot of stuff is printed on that log when the kernel boots. Once you get a shell you can print and filter the kernel log with dmesg and grep:

dmesg | grep my_rng_driver
[    0.869353] my_rng_driver loaded, registered ioctls 0x80047101 (get a random number) and 0x40047101 (seed the generator)

Note that the ioctl numbers may be different on your machine.

Accessing the Device From User Space

This is the final step of the guided part of this exercise. We will now develop a simple user space application that accesses the virtual device through the driver we just implemented.

Connecting via SSH from the Host to the VM

In this part you may need to edit files within the VM, and possibly transfer files between the host and the VM. You will notice that Qemu's virtual serial output (the console you get in the terminal after starting the VM) is not very stable when you type long commands (> 1 line of terminal), and that text editors also struggle to display things correctly in the VM. To get access to a stable console, it is better to rely on an SSH connection from the host to the VM.

With the simple virtual network we are using for that exercise (the -nic user option of Qemu), the host and the guest don't see each other directly, but we can use the following trick: Qemu's networking can be used to forward the SSH port of the VM on a given port p on the host. Once this is done, by connecting via SSH from the host locally on p, we end up in the VM.

To forward the VM's SSH port (22) to a port on the host, i.e. 1022, change the -nic option of Qemu in your VM launch script of Qemu to the following:

-nic user,hostfwd=tcp::1022-:22

Launch the VM, and wait for it to boot. Then, from the host, connect via SSH to the local port 1022 in a new terminal:

ssh root@localhost -p 1022
root@localhost's password: 
Welcome to Alpine!

You are now in the VM. You can also use scp to transfer files between the host and the VM, and vice versa. These transfers need to be initiated from the host. For example to transfer a file from the host to the VM:

scp -P 1022 /path/to/local-file-on-the-host.txt root@localhost:/path/to/destination/on/the/vm

And to transfer a file from the VM to the host:

scp -P 1022 root@localhost:/path/to/source-file-on-the-vm.txt /path/to/destination/on/the/host

Note that for ssh we indicate the port with p, while for scp it is done with P, which is not particularly intuitive.

Creating the Virtual File for the Device

Before we can write the user space app that will connect to the device through the driver, we need to create the virtual file /dev/my_rng_driver mentioned in the previous step. To do so, type the following command within the VM:

mknod /dev/my_rng_driver c 250 0

The major number, here 250, must match the one you defined within the driver in the initialisation function. After invoking mknod the virtual file should be present in /dev:

ls -l /dev/my_rng_driver 
crw-r--r--    1 root     root      250,   0 Dec 20 22:32 /dev/my_rng_driver

You will need to repeat that operation each time the VM reboots. To avoid doing so, you can configure Alpine to automatically create the virtual file each time the VM boots by creating a file (in the VM) /etc/init.d/init-my-rng-virtual-file and placing the following in it:

#!/sbin/openrc-run

mknod /dev/my_rng_driver c 250 0

Then giving that file execution permissions:

chmod +x /etc/init.d/init-my-rng-virtual-file

Installing a C Toolchain and a Text Editor in the VM

We will next write the user space application. You can either write it on the host and transfer the source file to the VM, or write it directly within the VM In both cases the application's source file will need to be compiled in the VM. You can install the text editors vim and nano, as well as the build toolchain (C compiler, etc.), with the Alpine package manager. To do so, run the following command inside the VM:

apk add build-base vim nano

You can now use vim or nano to edit files, and use gcc to compile C programs in the VM.

Writing the User Space Application

The source code of the user space application follows. We start by including a few headers for printing to the standard output, accessing files, and performing ioctl commands.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

Next we have two constants that are the ioctl numbers that were allocated for the 2 functions offered by the driver. To find them look in the VM's kernel log.

#define RAND_IOCTL	0x80047101
#define SEED_IOCTL	0x40047101

Finally, we have the main function that contains our test code:

int main() {
    int fd = open("/dev/my_rng_driver", O_RDWR);
    if (fd < 0) {
        perror("Failed to open the device file");
        return -1;
    }

    unsigned int seed = 0x0;
    unsigned int random_number = 0;

    for(int i=0; i<2; i++) {

        // seed the generator
        if(ioctl(fd, SEED_IOCTL, &seed)) {
            perror("ioctl seed");
            return -1;
        }

        // get 5 random numbers
        for (int j=0; j<5; j++) {
            if(ioctl(fd, RAND_IOCTL, &random_number)) {
                perror("ioctl rand");
                return -1;
            }

            printf("Round %d number %d: %u\n", i, j, random_number);
        }
    }

    close(fd);
    return 0;
}

This code starts by opening the virtual file representing the driver, /dev/my_rng_driver. It then follows similar steps to our in-kernel test we ran earlier: we seed the RNG, and generate 5 random numbers. We do that twice in a row to confirm that with the same seed, the device will return the same sequence of random numbers. Notice how ioctl is called with as parameter:

The virtual file descriptor fd
The ioctl code we want to invoke (RAND_IOCTL or SEED_IOCTL)
The address of a variable that will be filled with the random number generated (for RAND_IOCTL), or the address of a variable holding the seed we want to use (for SEED_IOCTL).

You can compile that code within the VM, assuming you write it in a file named my-app.c as follows:

gcc my-app.c -o my-app

When launching the program, you should see a series of 2 similar random number sequences:

./my-app
Round 0 number 0: 1804289383
Round 0 number 1: 846930886
Round 0 number 2: 1681692777
Round 0 number 3: 1714636915
Round 0 number 4: 1957747793
Round 1 number 0: 1804289383
Round 1 number 1: 846930886
Round 1 number 2: 1681692777
Round 1 number 3: 1714636915
Round 1 number 4: 1957747793

That's it! We have reached the end of the guided part of this exercise. Now the next step is to enhance the device/driver. There are various ways to achieve that, and it is up to you to choose an avenue of improvement. A few suggestions are given in the next and last step of this guide.

Going Further

The last part of the exercise is to identify a limitation in the current prototypes of the emulated device and its driver, or to develop an entirely new functionality. This is really up to you, but you can find below a few suggestions:

Building the driver as a kernel module
Automatic base address discovery
Improving the performance of the device/driver
Implementing other random number generators
What else can you think of?

ChatGPT, GitHub Copilot, and other generative AI tools. Using such tools for that exercise is encouraged. In fact, they were very helpful in preparing it!

Once you are done make sure to submit your exercise.

Building the Driver as a Proper Kernel Module

A kernel module is a piece of kernel code that is not automatically loaded at boot time, but that can rather be dynamically loaded and unloaded at runtime. In addition to that added flexibility, a module can also be compiled in a separate source folder outside of the kernel's source tree, which is practical for e.g. version control: you don't need to have an entire clone of the kernel's sources and your module can live in its own git repository. Compilation is also faster, as you don't have to link the module with the rest of the kernel.

You can find a good guide on how to write a "hello world" kernel module here. See if you can take inspiration from it and translate the driver into a kernel module.

Automatic Base Address Discovery

Currently, the base address of the device is hardcoded as a constant into the driver's code:

#define DEVICE_BASE_PHYS_ADDR 0xfebf1000

This is not good because the device's base address in physical memory can change when rebooting the machine (e.g. when some hardware is added/removed). A possible solution would be to compile the driver as a loadable kernel module and pass the base address (that can be at load time discovered with lspci -v) as a module parameter.

This solution is not great still because it forces the user to call lspci and fill in the parameter each time the driver is loaded. Another solution would be to have the driver enumerate PCI devices itself, and automatically find the base address, based on the vendor and device IDs we defined when implementing the virtual device in Qemu. You can see with lspci that for our virtual random number generator the vendor ID is 0x1234 and the device ID is 0xcafe.

Measuring and Improving the performance of the Device/Driver

Should you write a user land application trying to generate through the device as many random numbers as possible, you will find that the throughput is not very high. This is due to the latency of security domain crossings: going from user to kernel space and back when the application calls the driver in the guest is costly in terms of CPU cycles, so is going from the guest to the host and back when the virtual device is accessed by the driver.

To address that issue, a first solution would be to increase the size of the random numbers produced by the device. Switching from 32 to 64 bits should be relatively straightforward, and would hopefully increase the random data generation throughput.

Another possibility would be to consider generating a much larger amount of random data for each call to the device, and implement the transfer of that data to the driver via DMA. This is no easy task, but to achieve it you could take inspiration from the Qemu educational device presents in Qemu sources in hw/misc/edu.c

Implementing Other Random Number Generators

Within the implementation of the RNG virtual device, we currently use the C standard library functions rand() and srand() to generate random numbers and seed the generator. This is perfectly fine but, for the sake of the exercise, one could switch this implementation with something else: Rather than relying on the standard C library, you could implement manually a RNG algorithms. Some are very simple, and others are more complicated, all with their pros and cons. More information here.

Generating the Patches for Submission

To generate the Qemu/Linux patches you need to submit, we will use git to compute the code modifications you brought on top of the vanilla versions (tag v8.2.0 for Qemu and v6.6 for Linux).

First, add any file modified/added to the version control system, for example for Qemu, after completing the first (guided) part of the exercise, we have the following files modified/added:

cd ~/virt-101-exercise/qemu-8.2.0/

git status
Not currently on any branch.
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   hw/misc/Kconfig
	modified:   hw/misc/meson.build
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	hw/misc/my-rng.c
	prefix/

Add the necessary files (do not add the prefix/ directory, which contain the binaries resulting from the build):

git add hw/misc/Kconfig hw/misc/meson.build hw/misc/my-rng.c
git commit -m "guided part of the exercise done"

Make sure to also add any other file (e.g. additional source files if you added some) that may be needed to build your solution when marking. Git may ask you to indicate your name/email to identify your commit.

You can then generate the patch by computing the difference with the tag we used as base:

git diff v8.2.0 > qemu.patch

Proceed similarly for Linux.

Note that git diff will produce a patch containing the difference between the commit/tag passed as parameters (here v8.2.0) and the latest commit on the current branch, i.e. if you have unstaged modifications they will not be present in the patch. So make sure to commit anything you want included in the patch.

For the instructions on how to submit the patches and report, please see the logistics part at the beginning of this guide.

Virtualization 101 Lab Exercise