class: center, middle

### Secure Computer Architecture and Systems

***

# Storage and Network:
Brief Introduction

???

- Hi everyone
- Here we talk about the last 2 big features of operating systems we cover in this unit
- Storage and networking
- As for the previous topics, we'll focus on Linux

---

# Storage and Network

- Computer: CPU + memory + I/O
- Most widespread classes of I/O: storage and network
- Storage and network are managed and accessed through stacked layers of software
- Very brief introduction here
- With, once again, a focus on Linux

???

- A computer is a CPU, some memory, as well as I/O devices
- The two I/O devices you'll find in almost every computer are a secondary storage device and a network card
- Some computers, like servers, don't really have any other I/O apart from networking and storage
- The same goes for many embedded devices: no mouse, no keyboard, no screen, BUT they still need storage for persistence and access to the network to communicate with the rest of the world
- Here I'll give a brief introduction to how storage and networking work with Linux; I can't go into too many details for time reasons, but there is much to say about both storage and network
- There are entire books focusing on each topic

---

class: inverse, middle, center

# Storage

???

- Let's start with the storage stack

---

# The Storage Stack

.leftcol[
]

???

- This is the Linux storage stack
- You have the application on top, and each layer of the operating system involved in storage management, down to the hardware
- You may have heard of the fundamental theorem of software engineering, which states that "we can solve any problem by introducing an extra level of indirection"
- The storage stack is a good illustration of this
- Let's go over these layers one by one

---

# The Storage Stack

.leftcol[
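*A minimal sketch of this interface: reading a file with raw syscalls (`/etc/hostname` is just an example path):*

```c
#include <fcntl.h>   // open
#include <unistd.h>  // read, close
#include <stdio.h>

int main(void) {
    // Ask the kernel to open the file; returns a file descriptor
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[128];
    // Each read() is a system call served by the storage stack
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n >= 0) { buf[n] = '\0'; printf("%s", buf); }

    close(fd);
    return 0;
}
```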
]

.rightcol[

Applications access files using **system calls**: `open`, `read`, `write`, `lseek`, etc.

]

???

- Like every other operating system service, applications access storage using system calls
- You know the filesystem-related ones: open, read, write, lseek, etc.

---

# Virtual File System

.leftcol[
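*A sketch: the same file API works regardless of the underlying FS; `statfs` can reveal which one backs a path (`/tmp` is just an example):*

```c
#include <sys/vfs.h>     // statfs
#include <linux/magic.h> // EXT4_SUPER_MAGIC, TMPFS_MAGIC
#include <stdio.h>

int main(void) {
    struct statfs s;
    // open/read/write would be identical whatever filesystem
    // backs this path; statfs just lets us peek behind VFS
    if (statfs("/tmp", &s) != 0) { perror("statfs"); return 1; }

    if (s.f_type == TMPFS_MAGIC)
        puts("/tmp is on tmpfs");
    else if (s.f_type == EXT4_SUPER_MAGIC)
        puts("/tmp is on ext4");
    else
        printf("/tmp is on FS with magic 0x%lx\n", (long)s.f_type);
    return 0;
}
```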
]

.rightcol[

The **Virtual File System** (VFS) abstracts all filesystems under a common interface (the aforementioned syscalls)

- Translated into concrete FS operations
- Apps don't need to know which FS the files they access sit on
- Can mount multiple FS in a single directory tree
- Helps manage cached data/metadata

]

???

- These system calls are received by the topmost layer of the storage stack in the OS
- It is called the Virtual File System, VFS
- Its goal is to abstract all filesystems supported by Linux under a common interface, which is the set of system calls we just mentioned
- These system calls are translated by VFS into concrete filesystem operations
- VFS allows mounting multiple filesystems in a single directory tree
- The benefits of having a unified interface to access files sitting on different filesystems are quite obvious: applications can be written independently of the filesystem used to store the files they access
- Finally, VFS factors out a lot of storage management code that does not need to be implemented on a per-filesystem basis
- This is particularly true for data and metadata caching code

---

# File Systems

.leftcol[
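*A sketch: the kernel lists the filesystems it supports in the `/proc/filesystems` pseudo-file:*

```c
#include <stdio.h>

int main(void) {
    // /proc/filesystems is itself served by a pseudo filesystem:
    // no disk is involved in reading it
    FILE *f = fopen("/proc/filesystems", "r");
    if (!f) { perror("fopen"); return 1; }

    char line[256];
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);  // e.g. "nodev  tmpfs", "      ext4"

    fclose(f);
    return 0;
}
```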
]

.rightcol[

Concrete **File Systems**:

- Manage how file/directory data and metadata are laid out on the storage medium
- Linux supports tens of filesystems:
  - Disk-based (HDD/SSD), e.g. ext4
  - RAM-based, e.g. ramfs/tmpfs
  - Pseudo filesystems, e.g. `/proc`
  - Network filesystems, e.g. NFS
  - Filesystems for other media: optical, flash chips, etc.

]

???

- Below VFS you have the actual filesystems
- A filesystem's implementation defines concretely how file data and metadata are stored and retrieved from the storage device
- Linux supports tens of filesystems
- Many target traditional storage devices such as hard disks and SSDs
- But you also have RAM-based filesystems
- Pseudo filesystems that do not store any data, for example /proc and /sys
- Network filesystems
- As well as filesystems for other media such as optical disks, embedded flash chips, etc.

---

# Page Cache

.leftcol[
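*A sketch of the page cache at work: `write` completes once data is in RAM; `fsync` forces it to disk (`/tmp/demo` is just an example path):*

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    int fd = open("/tmp/demo", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) { perror("open"); return 1; }

    // Completes as soon as the data sits in the page cache
    write(fd, "hello\n", 6);

    // Only now is the data guaranteed to have reached the disk
    if (fsync(fd) != 0) perror("fsync");

    close(fd);
    return 0;
}
```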
]

.rightcol[

The **page cache** buffers all file data in RAM:

- Data read is kept in RAM in case it needs to be read again
- Data written is kept in RAM for some time before being flushed to disk, in case it is overwritten
- As long as there is free RAM, Linux uses it for file data
- Try the command `free -h` and check out `buff/cache`

]

???

- Connected to the virtual and concrete filesystem layers, the page cache is the main file data cache in Linux
- File data read is cached in RAM in case it needs to be read again in the future
- File data written is cached in RAM for a bit of time to buffer short-term bursts of write requests before flushing them to disk
- Linux's policy on how much RAM to use for caching file data is simple: all the RAM that is not used by running programs and the kernel can be used to cache file data
- The goal is to maximise the usage of your RAM
- You can check how much of your RAM is used to cache file data with this command
- It is common to see gigabytes of RAM being used for that purpose

---

# The Block Layer

.leftcol[
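*A sketch: querying a block device's sector size and capacity (assumes a device at `/dev/sda` and enough privileges to open it):*

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   // BLKSSZGET, BLKGETSIZE64
#include <stdio.h>

int main(void) {
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    int sector = 0;
    unsigned long long bytes = 0;
    ioctl(fd, BLKSSZGET, &sector);    // logical block size, e.g. 512
    ioctl(fd, BLKGETSIZE64, &bytes);  // device size in bytes

    printf("sector: %d B, capacity: %llu B\n", sector, bytes);
    close(fd);
    return 0;
}
```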
]

.rightcol[

The **block layer** manages requests to disk-like devices:

- Devices like HDDs or SSDs, accessed at *block* granularity
  - Relatively large, e.g. 1 disk sector of 512 bytes
- Implements **I/O schedulers** that queue and reorder requests to maximise performance
  - E.g. to avoid seeking on an HDD

]

???

- Below the filesystem you have the block layer
- It's another indirection layer that abstracts block devices
- Block devices are storage devices accessed at relatively large granularities
- For example, most hard disks are accessed at the granularity of a sector, which is 512 bytes
- In addition to providing a common interface for all block devices, the block layer implements I/O request schedulers that queue, reorder, merge, or split requests to maximise performance, for example by avoiding moving the magnetic head of a hard disk, which is a very costly operation

---

# Device Mapper

.leftcol[
]

.rightcol[

**Device mapper**: optional layer to create virtual block devices on top of physical ones, to implement:

- Encryption
- Virtual partitions (LVM)
- Compression
- Caching
- Disk aggregation and mirroring (RAID)
- Etc.

]

???

- The block layer also allows creating virtual block devices on top of physical ones, to implement in software some features not always supported by the hardware, such as:
- Encryption, virtual partitions, compression, caching, aggregation of multiple disks for performance and fault tolerance reasons, etc.
- This is achieved through a layer called the device mapper

---

# The Storage Stack

.leftcol[
]

.rightcol[

- There are many filesystems targeting non-block devices
- They issue requests to various other subsystems, for example:
  - Network stack for NFS
  - Memory Technology Device for embedded flash chips
  - Etc.

]

???

- There are also quite a lot of filesystems that do not target block devices
- Something like NFS will fetch file data and propagate modifications through the network
- You have an entire subsystem named Memory Technology Device for the embedded flash chips that you can find on early smartphones
- And so on

---

# Low-Level Layers & Drivers

.leftcol[
]

.rightcol[

- There may be more abstraction layers/protocols at the lower levels
  - USB, SCSI, NVMe, SATA, etc.
- At the lowest level, hardware devices are accessed through their **drivers**
  - 1 driver per model of device
  - Drivers make up most of Linux's code

]

???

- Between the block layer and the driver you may have more abstraction and protocol layers, things like USB, SCSI, SATA or NVMe
- And then we have the driver, which is the lowest level of software in the operating system
- It's in charge of sending to the device the I/O requests submitted by the higher layers
- Generally you have one driver per model of device, and Linux supports a very high number of devices
- In fact, more than two thirds of the 20 million lines of code that make up the kernel correspond to device driver code

---

# VFS Data Structures

.leftcol[
]

.rightcol[

- The kernel maintains in RAM a set of **data structures to handle filesystem operations**
- Some are created when the partition is mounted, others on demand, e.g. when a file is opened by a process
- VFS asks the concrete filesystem to construct these data structures from on-disk metadata

]

???

- Let me zoom in a little bit on the VFS layer, which is the filesystem abstraction layer that handles system calls from user space directly
- It uses a series of data structures to handle filesystem operations
- Some are created when a disk partition hosting a particular filesystem is mounted
- Others are created on demand when filesystem objects are accessed, for example when a file is opened
- These data structures are used by VFS, and they are generally created by the concrete filesystem itself, because only the filesystem knows how file data and metadata are stored on the storage device; VFS is just an abstraction layer

---

# VFS Data Structures

.leftcol[
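*A heavily simplified sketch of the superblock object; see `include/linux/fs.h` for the real definition:*

```c
/* Heavily simplified sketch, not the actual kernel definition */
struct file_system_type;
struct dentry;
struct super_block;

struct super_operations {
    /* flush cached data/metadata to disk */
    int  (*sync_fs)(struct super_block *sb, int wait);
    /* called when the filesystem is unmounted */
    void (*put_super)(struct super_block *sb);
    /* ... many more ... */
};

struct super_block {
    struct file_system_type *s_type;  /* e.g. ext4 */
    unsigned long            s_flags; /* mount flags */
    struct dentry           *s_root;  /* root of this mount */
    const struct super_operations *s_op; /* partition-level methods */
};
```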
]

.rightcol[

Superblock: global **information and operations about the filesystem** (partition)

- Filesystem type, mount flags, quotas, mount point, associated devices, etc.
- Exposes methods to flush caches, unmount the filesystem, etc.

]

???

- A first interesting object is the superblock
- There is one instance of this object per mounted filesystem; here by filesystem I mean partition
- The superblock contains general information about the partition, such as the filesystem type, mount flags, etc.
- It exposes a series of methods to execute partition-level operations, such as flushing caches or unmounting the partition

---

# VFS Data Structures

.leftcol[
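*A sketch: `stat` exposes inode metadata to user space (`/etc/hostname` is just an example path):*

```c
#include <sys/stat.h>
#include <stdio.h>

int main(void) {
    struct stat st;
    if (stat("/etc/hostname", &st) != 0) { perror("stat"); return 1; }

    // These fields come from the file's inode
    printf("inode: %lu\n", (unsigned long)st.st_ino);
    printf("size:  %lld bytes\n", (long long)st.st_size);
    printf("owner: uid %u\n", (unsigned)st.st_uid);
    printf("mode:  %o\n", st.st_mode & 0777);
    return 0;
}
```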
]

.rightcol[

The **inode object** holds/manipulates metadata about a particular file or directory

- File size, owner ID, permissions, etc.
- Exposes methods to create, delete, resize, move files, etc.
- Held by the kernel in RAM in the **inode cache**

]

???

- A key data structure is the inode
- You have one inode object per file or directory on the filesystem
- It contains metadata about the file, such as its size, owner, and permissions
- Inode objects are created by the concrete filesystem and buffered in RAM in what is called the inode cache
- They expose methods to perform file-level operations: creating, deleting, resizing, moving files, etc.

---

# VFS Data Structures

.leftcol[
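*A sketch: a hard link creates a second dentry for the same inode (the `/tmp` paths are just examples):*

```c
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int main(void) {
    // Create a file, then a second name (hard link) for it:
    // two dentries, one inode
    int fd = open("/tmp/a", O_CREAT | O_WRONLY, 0644);
    close(fd);
    unlink("/tmp/b");          // make the demo re-runnable
    link("/tmp/a", "/tmp/b");

    struct stat sa, sb;
    stat("/tmp/a", &sa);
    stat("/tmp/b", &sb);
    printf("same inode? %s\n",
           sa.st_ino == sb.st_ino ? "yes" : "no");  // prints "yes"
    return 0;
}
```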
]

.rightcol[

The **dentry object** holds/manipulates naming metadata about a particular file or directory

- File name
- File location in the directory tree
- Exposes operations on this metadata, e.g. pathname lookup
- Held in RAM in the **dentry cache**
- 2 dentries can correspond to the same inode
  - With hard links, a file can have several names/locations

]

???

- Things like a file's name and location in the directory tree are not contained in the inode
- Instead, they are contained in a dedicated data structure called the dentry
- Dentries are used for operations on the directory tree, such as pathname lookup or listing the content of a directory, and they expose methods to accomplish these operations
- Dentries are buffered in RAM in what is called the dentry cache
- There is at least one dentry per file or directory in the filesystem, and there can be more than one in case you create hard links

---

# VFS Data Structures

.leftcol[
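*A sketch: two opens of one file give two file objects with independent offsets (`/etc/hostname` is just an example):*

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    // Two opens of the same file = two independent file objects
    int fd1 = open("/etc/hostname", O_RDONLY);
    int fd2 = open("/etc/hostname", O_RDONLY);
    if (fd1 < 0 || fd2 < 0) { perror("open"); return 1; }

    char a, b;
    read(fd1, &a, 1);  // advances fd1's private offset
    read(fd1, &a, 1);  // fd1 is now at offset 2...
    read(fd2, &b, 1);  // ...but fd2 still reads byte 0

    printf("fd1 offset: %ld, fd2 offset: %ld\n",
           (long)lseek(fd1, 0, SEEK_CUR),   // 2
           (long)lseek(fd2, 0, SEEK_CUR));  // 1
    close(fd1);
    close(fd2);
    return 0;
}
```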
]

.rightcol[

The **file object** represents an instance of a file opened by a process

- Each file descriptor in user space corresponds to a file object in the kernel
- Holds metadata such as the open flags or the file offset
- Exposes methods to access the file: read, write, lseek, etc.

]

???

- Finally, the file object represents an instance of a file opened by a process
- Each file descriptor used by the program corresponds to a file object in the kernel, so you can have several file objects for the same file if that file is opened multiple times by the program
- The file object holds metadata about the opened file, for example the file offset or the flags it was opened with
- The file object exposes methods to access the file: reading it, writing it, etc.

---

class: inverse, middle, center

# Networking

???

- Let's now very briefly talk about the network stack

---

name: network

# The Network Stack

.leftcol[
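*A minimal sketch of the socket interface on the client side (`127.0.0.1:7` is an arbitrary example endpoint):*

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    // TCP socket: the transport layer below will be TCP
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(7);  // example port
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    // Traverses transport (TCP), network (IP), and link layers
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        send(fd, "hi\n", 3, 0);         // akin to write() on a file
        char buf[64];
        recv(fd, buf, sizeof(buf), 0);  // akin to read()
    } else {
        perror("connect");
    }

    close(fd);
    return 0;
}
```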
]

???

- Of course, once again, this is a very brief, high-level overview
- You could probably dedicate an entire course unit to each of the layers composing modern network stacks
- But we don't have time to go into many details here

---

template: network

.rightcol[

- **Applications** access the network through the socket interface
  - System calls that allow applications to connect to each other through the network and send/receive data
- **Transport layer** breaks data into packets and reassembles them; handles reliability, ordering, flow control
  - **TCP**: reliable, ordered, and connection-oriented protocol

]

???

- Similar to all other OS services, applications access the network through system calls, such as socket, connect, listen, etc.
- These network-related system calls form the socket interface
- They let an application create a server listening on a port, and send/receive data with system calls that are akin to writing and reading files
- Below the socket interface we have the transport layer, implementing transport protocols, the main ones being TCP and UDP
- This layer splits data to send into packets, reassembles received packets, and handles things like reliability, ordering, and flow control

---

template: network

.rightcol[

- **Network layer** handles packet delivery: addressing and routing
  - **IP**: a widely used protocol
- **Link layer** handles communication between devices on the network
  - Physical (MAC) addressing
  - Driver for the network card

]

???

- Below that you have the network layer, which is in charge of figuring out where each packet should go
- So it takes care of addressing and routing
- Here, generally, the protocol used is IP
- And finally you have the link layer, which handles things like physical addressing and also contains the network card driver

---

# Networking: Going Further

- Rami Rosen, **Linux Kernel Networking: Implementation and Theory**
- Christian Benvenuti, **Understanding Linux Network Internals**

???

- As mentioned previously, this is just a very brief overview of the network stack
- If you want to dig deeper, feel free to check out these books

---

# Summary

- Storage and network are among the most complex forms of I/O managed by the kernel
- Complexity is handled by multiple levels of software abstraction layers forming **I/O stacks**
  - Storage: VFS -> FS -> Block -> Driver
  - Network: Socket -> Transport -> Network -> Link -> Driver
- These layers impact performance (e.g. latency)
- High-I/O-performance OSes let the application access the disk/network card directly and bypass the OS

???

- To sum up, here we covered an overview of the 2 main types of I/O: storage and networking
- They are among the most complex forms of I/O managed by the kernel, and because of that they are organised in stacks, made out of multiple layers of indirection, each with a particular set of responsibilities
- From top to bottom, meaning from the application down to the hardware, the storage stack is made of VFS, the concrete filesystem, the block layer, and the storage device driver
- And the network stack comprises the socket interface, the transport layer, the network layer, and the link layer that includes the NIC driver
- Obviously, multiplying the number of indirection layers like that has an impact on performance, especially on latency
- This is why high-performance I/O stacks and dedicated OSes try to reduce the number of layers and let the application access the disk and network card directly, bypassing the OS