Objectives and Logistics

The goal of this lab exercise is to put into practice the theory of software compartmentalisation covered in the unit (see Week 5 content here).

We'll first focus on a simple example: a web server containing a HeartBleed-style vulnerability. We will sandbox the part of the program containing the bug within its own compartment, and check that this compartmentalisation prevents an attacker from leaking secrets allocated outside that compartment.

Next, we will compartmentalise a real-world library, TinyExpr. TinyExpr is a mathematical expression parser and evaluator library that can be integrated into C and C++ programs. Parsers are particularly prone to vulnerabilities because they manipulate untrusted input (the content to parse) that may be malformed in many ways. We will sandbox TinyExpr within its own compartment, and study the impact of this compartmentalisation on performance.

Submission Instructions

The deliverables for this exercise are the C source code for the compartmentalised versions of the web server and TinyExpr. The submission is made through the CS Department’s Gitlab. You should have a fork of the repository named 60261-lab3-s-compartmentalisation_<your username>. The sources should be grouped in folders located at the root of the repository, one per relevant part of the exercise:

heartbleed-guided/
heartbleed-advanced/
tinyexpr-ipc/
tinyexpr-shm/

Submission details are given in the relevant parts of this exercise. To indicate that the submission is ready to be marked, create a tag named lab3-submission.

The deadline for this assignment is Friday 14/11 2pm London time.

A few important points regarding the submission:

⚠️ Make sure you push to the precise repository mentioned above and not another one (do not fork it or create a new repo), and that you tag your submission properly.

⚠️ The submission is to be made through GitLab only, there is no need to submit anything to Canvas.

⚠️ You need some basic knowledge of git and GitLab to submit this exercise. In the unlikely case you are not familiar with these tools, see some guidance here.

Failure to follow these instructions is likely to result in a mark of 0 for this exercise.

For any issues or questions, feel free to get in touch with the instructor through the discussion board on Canvas or during office hours (see the schedule on Canvas for their time and location). You can also contact your student representatives.

High-Level Marking Scheme

Part	Marks
Compartmentalising HeartBleed (Guided)	/5
Compartmentalising HeartBleed (Advanced)	/5
Compartmentalising TinyExpr (IPCs)	/5
Compartmentalising TinyExpr (Shared Memory)	/5
Total:	/20

Intended Learning Outcomes (ILOs)

By the end of this lab, students will be able to:

Design and implement compartmentalisation policies in C programs, using process-level isolation, and IPC-based cross-compartment communications
Demonstrate the security benefits of these policies
Assess and understand the performance impact of these approaches

Required Setup

This exercise requires the same setup as for lab 1.

Guided Example: Compartmentalising a HeartBleed-style Vulnerability

Understanding the Code Base

Consider this program:

// Includes omitted

void privileged_function() {
    printf("Privileged code running!\n");
    // ... do some privilege stuff here e.g. update the server's configuration
    return;
}

int main(int argc, char **argv) {
    char admin_pw[64] = "supersecret"; // admin password
    unsigned char buf[32];
    int opt = 1;

    if(argc == 2 && !strcmp(argv[1], "login")) {
        // Attempt at admin login

        // Get the password attempt
        printf("Please enter the admin password: ");
        char attempt[128];
        fgets(attempt, 128, stdin);

        // remove carriage return
        attempt[strlen(attempt)-1] = '\0';

        // check if the password is correct
        if(!strcmp(attempt, admin_pw))
            privileged_function();
        else {
            printf("Admin authentication failed!\n");
            return -1;
        }
    }

    // Setup the server to listen to port 12345
    int server = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(12345),
        .sin_addr.s_addr = INADDR_ANY
    };

    bind(server, (struct sockaddr*)&addr, sizeof(addr));
    listen(server, 1);

    // Wait for a client's request
    int client = accept(server, NULL, NULL);

    // read the request into buf
    recv(client, buf, sizeof(buf), 0);

    // Heartbleed-style vulnerability:
    // client sends: [type][len][data] -> respond with `len` bytes
    int len = buf[1];  // vulnerable: no bounds check

    // Send response
    send(client, buf + 2, len, 0);

    // Cleanup
    close(client);
    close(server);

    return 0;
}

This is a variation of the HeartBleed-style vulnerability we covered in the memory safety part of the unit. The program has an administrator mode, which a local user enables by launching the server with the login argument and entering the correct password on the standard input. Once authenticated, the administrator can run security-critical code (e.g. to update the server's configuration), represented here by privileged_function. As you can see, the password is stored in cleartext in the source code: this is obviously a terrible security practice, and we only do it here for the sake of the example.

The server listens on port 12345 for a request from a client, reads it, and sends a response to the client. The client's request has the following format:

The first byte of the request indicates the request type (here it's not relevant because it's not processed by the server).
The second byte of the request indicates the size of server response expected by the client.
After that the next bytes indicate what the server should respond: that content should have a size in bytes equal to the value present in the second byte.
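To make the layout concrete, here is a minimal sketch of how a client could serialise such a request. The helper build_request is hypothetical (it is not part of the lab sources) and only mirrors the three fields listed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the lab sources): serialises a request
 * as [type][len][data...] into `out`. Returns the number of bytes written,
 * or 0 if `out` is too small. `len` is whatever the client claims: nothing
 * in the format ties it to `data_len`. */
size_t build_request(unsigned char *out, size_t cap,
                     unsigned char type, unsigned char len,
                     const void *data, size_t data_len) {
    if (cap < 2 || data_len > cap - 2)
        return 0;
    out[0] = type;                    /* byte 1: request type (ignored by the server) */
    out[1] = len;                     /* byte 2: response size asked by the client */
    memcpy(out + 2, data, data_len);  /* bytes 3+: content to send back */
    return 2 + data_len;
}
```

Calling build_request(req, sizeof(req), 0x01, 0x02, "hi", 2) fills req with the four bytes 0x01, 0x02, 'h', 'i'. Note that the second field is an independent value chosen by the client, which is what the rest of this part relies on.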

Building and Running the Server

Download the server's source file, then in a terminal compile and run the server:

gcc server-monolithic-v1.c -o server
./server

In a separate terminal we can connect to the server as a client. Under normal operation:

printf '\x01\x02hi' | nc localhost 12345
hi

Here we send a request with request type 0x01, expected response size 0x02 (2 bytes), and expected response hi (whose size is indeed 2 bytes). The server replies hi, as expected.

Exploiting Our Server

The vulnerability in the server code is the following: when a client sends a request whose second byte holds a value larger than the size of the content in bytes 3 onwards, the call to send reads past the end of buf (a buffer over-read) and leaks whatever lies after it in memory to the client. Our threat model is thus a remote attacker (a client) leveraging that vulnerability to leak the admin password.

Restart the server. Leaking the password is quite easy: the client can indicate a very large expected response size, and a small expected response, for example:

printf '\x01\x90hi' | nc localhost 12345
hiasupersecret��\k����Q�"��:V@%

As you can see, the server has read past the end of buf, which is located on the stack. admin_pw is nearby, and gets leaked to the remote attacker.

Compartmentalisation Policy

The fix for this bug is easy: the server can cap the response size requested by the client so that it never reads past the end of buf. However, our goal here is to study compartmentalisation, so we are not going to fix the bug. We will rather leave it there and see how compartmentalisation can prevent the password from leaking. This illustrates the fact that compartmentalisation protects programs against vulnerabilities that are still unknown, or that have not even been introduced yet.
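For reference only (the lab deliberately leaves the bug in place), the cap mentioned above could look like the following sketch. The helper name capped_response_len is hypothetical, and the sketch assumes the server keeps the value returned by recv() to know how many bytes it really received.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: returns how many payload bytes may safely be sent
 * back, given the number of bytes actually received (`received`, as
 * returned by recv()) and the length the client asked for (`requested`). */
size_t capped_response_len(ssize_t received, size_t requested) {
    if (received < 2)
        return 0;                               /* no complete [type][len] header */
    size_t available = (size_t)received - 2;    /* payload bytes really present */
    return requested < available ? requested : available;
}
```

With the exploit request shown earlier (4 bytes received, 0x90 requested), this returns 2, so only hi would be sent back.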

We want to break down our server and sandbox the request processing code containing the bug as follows:

We want to create two compartments:

Compartment 1 contains the program entry point, the authentication code (including the password we don't want to leak), and the privileged code that runs when an administrator successfully logs in.
Compartment 2 contains the network code that receives, processes, and sends requests.

With the isolation enforced by compartmentalisation, our goal is that an attacker exploiting the vulnerability in compartment 2 won't be able to leak the password because it lives in the other compartment.

Our compartmentalised server will have 2 binaries, one for each compartment. Its execution flow is numbered on the picture above:

The server is launched by running compartment 1's binary. If needed, compartment 1 authenticates the administrator against the password and runs the privileged code.
Compartment 1 invokes compartment 2's binary with fork() + execve(). Compartment 2 sets up networking and starts listening on port 12345.
Compartment 2 receives a request from a client.
Compartment 2 responds to the client.

Compartmentalising Our Server

Split the monolithic server's code into two source files: comp1.c for compartment 1, and comp2.c for compartment 2:

Place the authentication and privileged code in comp1.c. After the authentication code has run, compartment 2 should be invoked: for that, use fork() and execve() as seen in the lecture. Make sure to remove any code and data (variables) that are no longer needed in compartment 1. Have compartment 1 wait for the termination of compartment 2 before exiting.
In compartment 2 create a main function and place the networking code in there. Make sure to delete any code or data that is no longer needed in compartment 2.
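As a reminder of the general pattern, the sketch below spawns an arbitrary program in a child process and waits for it. The helper name run_and_wait is hypothetical, and the sketch is generic: it is not tied to the lab's binaries, and error handling is kept minimal.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: runs the program at `path` with arguments `argv`
 * in a child process, waits for it, and returns its exit status
 * (or -1 if it could not be spawned or did not exit normally). */
int run_and_wait(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        _exit(127);                     /* only reached if execve failed */
    }

    /* Parent: block until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Keep in mind that fork() alone gives the child a copy of the parent's memory. It is execve() that replaces the child's address space with a fresh one built from the new binary.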

Validating Functionality & Protection

The compartmentalised server should behave similarly to the monolithic version under normal operation:

gcc comp1.c -o comp1
gcc comp2.c -o comp2

./comp1

In a separate terminal:

printf '\x01\x02hi' | nc localhost 12345
hi

When an attacker tries to trigger the bug, the password should not be present in the bytes that leak:

./comp1

In a separate terminal:

printf '\x01\x90hi' | nc localhost 12345
hi��C�H��L#�H���T��q��>V@`�����T����T�}��@hJ_���T�`E�H؝��>V}�

Submission Instructions

Submit the source code of each compartment as two separate files grouped in a folder in the git repository:

heartbleed-guided/comp1.c
heartbleed-guided/comp2.c

Advanced HeartBleed Compartmentalisation

Server V2

Now consider this updated version of our vulnerable monolithic server:

// includes omitted

void privileged_function(char *heartbeat) {
    printf("Privileged code running!\n");

    // Set the heartbeat word
    printf("Enter the new value for the heartbeat word to send to the clients: ");
    fgets(heartbeat, 32, stdin);

    // remove carriage return
    heartbeat[strlen(heartbeat)-1] = '\0';

    // more privilege operations...

    return;
}

int main(int argc, char **argv) {
    char admin_pw[64] = "supersecret";
    unsigned char buf[32];
    char heartbeat[32];
    int opt = 1;

    // initialise the heartbeat word to a default value
    strcpy(heartbeat, "heartbeat");

    if(argc == 2 && !strcmp(argv[1], "login")) {
        char attempt[128];

        printf("Please enter the admin password: ");
        fgets(attempt, 128, stdin);

        // remove carriage return
        attempt[strlen(attempt)-1] = '\0';

        if(!strcmp(attempt, admin_pw))
            privileged_function(heartbeat);
        else {
            printf("Admin authentication failed!\n");
            return -1;
        }

    }

    int server = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(12345),
        .sin_addr.s_addr = INADDR_ANY
    };

    bind(server, (struct sockaddr*)&addr, sizeof(addr));
    listen(server, 1);

    int client = accept(server, NULL, NULL);

    recv(client, buf, sizeof(buf), 0);

    // Heartbleed-style vulnerability:
    // client sends: [type][len][data] -> respond with `len` bytes
    int len = buf[1];  // vulnerable: no bounds check

    // Send back the heartbeat word
    send(client, heartbeat, len, 0);

    close(client);
    close(server);

    return 0;
}

You can download this source file here. In this version, the content of the heartbeat response sent back to the client can be defined by a successfully authenticated administrator: it is read from the standard input in privileged_function after authentication:

gcc server-monolithic-v2.c -o server
./server login
Please enter the admin password: supersecret
Privileged code running!
Enter the new value for the heartbeat word to send to the clients: hello

In a separate terminal:

printf '\x01\x05hi' | nc localhost 12345
hello

The server is still vulnerable to leaking the password: you can try it out yourself by sending a request with a large size as we saw previously.

Note that the way the heartbeat system works here is quite silly: the content of the response is now defined by the server, while its size is still defined by the client. This is once again done for the sake of the example, but this kind of bug can happen when many developers work on a rather complex code base.

Compartmentalising the Server V2

The goal here is to compartmentalise the server with the same policy as for the previous part of the exercise: the authentication and privileged code in compartment 1, and the networking code in compartment 2. Split the code into comp1.c and comp2.c as done previously. You will notice that more work is needed, because the two compartments now have to communicate: the heartbeat word is set in compartment 1, and its value must be transmitted to compartment 2 so that it can be sent to the client over the network.

Cross-Compartment Communication: Command Line Arguments

The easiest way to pass the heartbeat word from compartment 1 to compartment 2 is through a command line argument. Modify compartment 2 to accept that string as its first command line parameter. Compartment 1 must also be modified to invoke compartment 2 with the heartbeat word as first command line parameter, which is quite simple to achieve with execve().

Validate that your solution behaves like the monolithic version of the server V2 under normal operation. Then, try to trigger the exploit and see what the server sends back to the client: the leaked bytes should not contain the password.

Cross-Compartment Communication: Pipe

A command line argument is fine for passing a small amount of string-like data between compartments, but passing more complex data structures, or a larger amount of content, requires a proper IPC mechanism. Implement a second version of the compartmentalised server, this time using a named pipe (FIFO) to send the heartbeat word between compartment 1 and compartment 2.

Validate the behaviour of your solution under normal operation, and when an attacker triggers the overflow.
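The mechanics of a named pipe can be sketched as follows. This is a self-contained demo under simplifying assumptions, not the lab's architecture: the helper fifo_demo is hypothetical, both ends come from a single fork() without execve(), and the roles (child writes, parent reads) are arbitrary.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: sends `msg` from a child process to the parent through
 * a named pipe created at `path`, storing it in `out` (capacity `cap`).
 * Returns 0 on success, -1 on error. */
int fifo_demo(const char *path, const char *msg, char *out, size_t cap) {
    if (cap == 0)
        return -1;
    unlink(path);                           /* drop a stale FIFO, if any */
    if (mkfifo(path, 0600) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1; }

    if (pid == 0) {
        /* Writer side: open() blocks until a reader opens the FIFO. */
        int wfd = open(path, O_WRONLY);
        if (wfd < 0) _exit(1);
        size_t len = strlen(msg) + 1;       /* include the terminating NUL */
        ssize_t w = write(wfd, msg, len);
        close(wfd);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Reader side: open() blocks until the writer opens the FIFO. */
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        kill(pid, SIGKILL);                 /* do not leave the writer blocked */
        waitpid(pid, NULL, 0);
        unlink(path);
        return -1;
    }

    /* Read until the buffer is full or the writer closes its end. */
    size_t got = 0;
    ssize_t n = 0;
    while (got < cap - 1 && (n = read(fd, out + got, cap - 1 - got)) > 0)
        got += (size_t)n;
    out[got] = '\0';                        /* always NUL-terminate */
    close(fd);

    int status;
    waitpid(pid, &status, 0);
    unlink(path);                           /* remove the FIFO from the file system */
    if (n < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Because a FIFO is identified by a path on the file system, two unrelated binaries can open it as long as they agree on that path, which is what makes it usable across an execve().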

Submission Instructions

For both alternatives of the compartmentalised server (command line arguments and pipe-based communication) submit the source code of each compartment as two separate files grouped in a folder in the git repository:

heartbleed-advanced/comp1-cmdline.c
heartbleed-advanced/comp2-cmdline.c
heartbleed-advanced/comp1-pipe.c
heartbleed-advanced/comp2-pipe.c

Compartmentalising TinyExpr

Introduction

TinyExpr is a mathematical expression parsing and evaluation engine. It is available as a C library that can be integrated in C/C++ projects to provide mathematical expression evaluation features. In this part of the exercise, we will integrate a simplified version of TinyExpr in various programs and sandbox it within its own process.

Sandboxing TinyExpr makes sense from the security point of view. First, it manipulates input (mathematical expressions to evaluate) that may come from untrusted sources, such as the command line, files, the standard input, etc. Second, these expressions must be parsed by TinyExpr before being evaluated: as seen during the lecture, parsers can be quite complex and particularly prone to bugs, which, combined with the fact that they often manipulate untrusted input, is concerning.

Our Simplified Version of TinyExpr

We will be working on a simplified version of TinyExpr, which supports only the evaluation of mathematical expressions in interpreted mode, without any variables. The features we left out would make compartmentalising TinyExpr much more complex, something we want to avoid.

Download the source code of our simplified TinyExpr here, and unzip the archive somewhere on your file system. The sources are made of the following files:

tinyexpr.c: the library's implementation (you don't need to study that code in detail).
tinyexpr.h: a header file describing the interface exposed by the library.
example.c: a minimal program using the library to evaluate a particular mathematical expression.
example2.c: a small program using the library to evaluate mathematical expressions fed through the command line.
test-suite.c: a test suite checking that the library's behaviour is correct over a few mathematical expressions.
benchmark.c: a benchmark measuring the speed of the library to evaluate a particular mathematical expression.

TinyExpr: Exposed Interface

To use TinyExpr, a program's C code must include tinyexpr.h and make use of the interface exposed in that header:

#ifndef TINYEXPR_H
#define TINYEXPR_H

#ifdef __cplusplus
extern "C" {
#endif

/* Parses the input expression, evaluates it, and frees it. */
/* Returns NaN on error. */
double te_interp(const char *expression, int *error);

#ifdef __cplusplus
}
#endif

#endif /*TINYEXPR_H*/

The interface is very simple: a program must call the exposed function te_interp to evaluate a mathematical expression, which is passed as the first (string) parameter. The second parameter is a pointer to an integer that will contain something different from 0 in case of error. If all goes well, the function returns a double which contains the result of the expression's evaluation.

The interface exposed by TinyExpr gives us an idea of what data transfers will need to happen when it is compartmentalised and sandboxed within its own process:

The expression string pointed to by the first parameter of te_interp lives within the main program's process, and will need to be transferred into TinyExpr's compartment.
After the function runs, the double result, as well as the error code written through the second parameter, will need to be transferred back to the main program's compartment.

Building Programs with the Makefile

The Makefile describes build rules and dependencies for each of the executables: the two minimal examples, the test suite, and the benchmark. You can rebuild everything by typing in a terminal:

make
gcc -g -Wall -o example example.c tinyexpr.c -lm
gcc -g -Wall -o example2 example2.c tinyexpr.c -lm
gcc -g -Wall -o test-suite test-suite.c tinyexpr.c -lm
gcc -g -Wall -o benchmark benchmark.c tinyexpr.c -lm

The make system will check the modification date of each source file and rebuild only what is needed:

touch example2.c # Simulate the fact that we modified only example2.c
make
gcc -g -Wall -o example2 example2.c tinyexpr.c -lm

Example Programs

The program example.c is a good example of usage of the library:

#include "tinyexpr.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
    const char *c = "sqrt(5^2+7^2+11^2+(8-2)^2)";
    int err;

    double r = te_interp(c, &err);

    if(err) {
        printf("ERROR evaluating %s\n", c);
        return -1;
    }

    printf("The expression %s evaluates to: %f\n", c, r);
    return 0;
}

As previously mentioned, the program includes tinyexpr.h and calls te_interp to evaluate a mathematical expression c. See how the error code err is checked after the call to te_interp above. If all goes well the result is displayed by the program.

You can compile example with make or manually, then run it as follows:

gcc example.c tinyexpr.c -o example -lm
./example
The expression sqrt(5^2+7^2+11^2+(8-2)^2) evaluates to: 15.198684

Here, -lm means that the program needs to be linked against the math library, which implements the libc's mathematical functions (these are called by TinyExpr).

Study also the second example program example2.c: it takes a mathematical expression from the command line and prints what it evaluates to:

gcc -g -Wall -o example2 example2.c tinyexpr.c -lm
./example2 "3*5"
Evaluating:
        3*5
result: 15.000000

Test Suite and Benchmark

The test suite program test-suite.c lets us validate the library's functionality. It is recommended that you use it regularly when compartmentalising, to check that you are not breaking functionality:

gcc -g -Wall -o test-suite test-suite.c tinyexpr.c -lm
./test-suite
# ...
ALL TESTS PASSED (200/200)

The benchmark benchmark.c can be used to measure the performance of the library, and compare it to the performance of native C mathematical operations:

gcc -g -Wall -o benchmark benchmark.c tinyexpr.c -lm
./benchmark 
Expression: sqrt(5^2+7^2+11^2+(8-2)^2)
Evaluated result: 15.1986841536
Native result: 15.1986841536
Total time: 0.143495 seconds
Evaluations per second: 696889
Native total time: 0.000143 seconds
Native evaluations per second: 701139351

As you can see, the library is about 1000x slower than native C operations (around 700 thousand vs. 700 million evaluations per second). This is expected, as interpreting a mathematical expression takes a lot of time: it involves parsing the expression and determining which operations to run. Native execution, on the other hand, directly runs the relevant operations.

Compartmentalising TinyExpr

Compartmentalisation Policy

Our goal is to compartmentalise every program making use of TinyExpr (example, example2, test-suite, and benchmark) and place the code and data of the TinyExpr library within its own compartment. We have seen that the interface exposed by our version of TinyExpr is made of a single function, te_interp, taking 2 pointers as parameters: a string pointer to the expression to evaluate, and an int pointer to a variable that will contain 0 if the evaluation succeeds, and something else on error. Upon success the function returns a double which is the result of the evaluation.

Our policy and the interaction between the main program and the sandbox can be illustrated as follows:

Cross-Compartment Interactions

Contrary to the first part of this lab, there is now the need for bidirectional cross-compartment communications:

The expression to evaluate, pointed to by the first parameter of te_interp, is allocated outside the sandbox, and needs to be transferred inside it.
The integer pointed to by the second parameter of te_interp is also allocated outside the sandbox: the error code written to it inside the sandbox needs to be transferred back to the caller's compartment.
The return value of te_interp is allocated and initialised within the sandbox, and needs to be transferred to the caller's compartment.

Compartmentalising TinyExpr with IPC-based Communications

Design and implement a first version of the compartmentalised TinyExpr programs: example, example2, test-suite, and benchmark, that uses IPCs for cross-compartment communications.

As described previously the TinyExpr library should run within its own sandboxed process, and the rest of the program should run within another compartment. To that aim use the knowledge you gained in the first part of this lab. Still, there are a few differences here:

We don't want to spawn a new sandbox each time the main program requests the evaluation of a mathematical expression: that would be too costly from the performance point of view. After its initialisation, have the sandbox compartment wait in a loop for calls to the exposed functionality. When a call is received, process it, return the result, and then wait for the next call.
Because we need to implement bidirectional communication between the two compartments, you should use pipes or sockets to transfer the exposed function's parameters and return value between compartments.
The mathematical expressions to evaluate can have a highly variable size. With performance and memory footprint in mind, it is better to send through the IPC only the bytes needed vs. putting a static and potentially large cap on the expression size.
Think about the engineering effort it would take to transform all the programs using TinyExpr to make use of our compartmentalised version of the library. Ideally, the only things needed to have an arbitrary program use your compartmentalised version would be:
1. To replace the includes of tinyexpr.h in that program's sources with a custom header file you created;
2. To update the build rules (e.g. a Makefile) to compile an additional C file into the program. That file would contain the code spawning the TinyExpr sandbox, initialising the communication channels, and handling the calls to te_interp by communicating with the sandbox.
Still with the goal of making things as transparent as possible, try to automate things as much as possible: ideally there will be no need to launch the sandbox binary and the main program's binary separately. The main program can spawn the sandbox e.g. with fork() + execve().
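One common way to send only the bytes needed is to prefix each message with its length. The sketch below shows that framing over a file descriptor (pipe or socket). All helper names are hypothetical, it covers only the transfer of the expression string, and it assumes both ends run on the same machine (so native byte order is fine) and that the sender is trusted (a hardened receiver would also cap the announced length).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes exactly `len` bytes, retrying on short writes. 0 on success. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) { if (errno == EINTR) continue; return -1; }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Reads exactly `len` bytes. 0 on success, -1 on error or early EOF. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) { if (errno == EINTR) continue; return -1; }
        if (n == 0) return -1;              /* peer closed the channel */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends an expression as [uint32 length][bytes], without the NUL. */
int send_expr(int fd, const char *expr) {
    uint32_t len = (uint32_t)strlen(expr);
    if (write_all(fd, &len, sizeof(len)) < 0)
        return -1;
    return write_all(fd, expr, len);
}

/* Receives an expression into a freshly allocated, NUL-terminated string.
 * Returns NULL on error; the caller frees the result. */
char *recv_expr(int fd) {
    uint32_t len;
    if (read_all(fd, &len, sizeof(len)) < 0)
        return NULL;
    char *expr = malloc((size_t)len + 1);
    if (!expr)
        return NULL;
    if (read_all(fd, expr, len) < 0) { free(expr); return NULL; }
    expr[len] = '\0';
    return expr;
}
```

The read and write loops matter because read() and write() on pipes and sockets may transfer fewer bytes than requested. The fixed-size values travelling in the other direction can reuse read_all and write_all directly.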

It is suggested to start with the simple programs, example and example2, then move on to the test suite and the benchmark. Once the test suite runs, make sure to execute it regularly to check that you are not introducing regressions.

Compartmentalising TinyExpr with Shared Memory-based Communications

Consider the performance results of the benchmark program with the IPC-based compartmentalised TinyExpr, and compare these numbers to the performance of the non-compartmentalised benchmark. You should see a significant slowdown coming from the compartmentalisation, around 10x vs. non-compartmentalised expression evaluation.

There are several reasons for this slowdown, in particular the number and latency of domain crossings, combined with the cost of sending/receiving data through the socket/pipe IPCs. Switching to a communication mechanism that uses shared memory more directly should partially address this performance issue.

Create a second version of the compartmentalised TinyExpr, which this time uses an area of shared memory set up between the two compartments for communications, and integrate this version with all programs: example, example2, test-suite, and benchmark. Once again, try to be as transparent as possible in the way your compartmentalised TinyExpr can be integrated in existing programs using vanilla TinyExpr: it should only be a matter of switching a header's name in the sources and slightly updating the build process. For this version of compartmentalised TinyExpr, whose goal is to be as fast as possible, it is acceptable to put a cap on the maximum size of a mathematical expression, e.g. a few kilobytes. Indeed, such a static memory allocation approach is much better for performance than dynamic allocation, where we would have to perform costly resize operations on the shared memory area whenever the length of the mathematical expressions to evaluate varies.

Establishing an Area of Shared Memory

Because the two compartments have separate binaries, the area of shared memory you need to create cannot be directly shared through address space duplication as we saw in the lectures. To open and map a shared memory area, use shm_open from both binaries, followed by mmap: see an example here. The compartment calling shm_open first (likely the main program) should use the O_CREAT flag to create it. The first argument of shm_open should be the same in both compartments. It should start with a / and not contain any other / character.
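A minimal sketch of that sequence is given below. The helper map_shared and its create parameter are hypothetical, and the sketch assumes the creating side sizes the object with ftruncate before mapping it. On older glibc versions, shm_open requires linking with -lrt.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: maps the POSIX shared memory object `name` into the
 * caller's address space. With `create` non-zero the object is created and
 * sized; otherwise it must already exist. Returns the mapping, or NULL. */
void *map_shared(const char *name, size_t size, int create) {
    int flags = O_RDWR | (create ? O_CREAT : 0);
    int fd = shm_open(name, flags, 0600);
    if (fd < 0)
        return NULL;

    /* A newly created object has size 0: size it before mapping it. */
    if (create && ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}
```

The object outlives the processes that mapped it until shm_unlink is called on its name, so one side should remove it during cleanup.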

Cross-Compartment Synchronisation with Barriers

The compartments execute concurrently and access shared memory, so synchronisation is needed to avoid race conditions. For example, we want to prevent the TinyExpr compartment from starting to read an expression from shared memory before the main program has finished writing that expression in its entirety. This could be achieved with locks, as presented in the lecture. However, in addition to enforcing the atomicity of accesses to the area of shared memory between the compartments, we also need to enforce the following ordering for these accesses:

The main compartment writes a request of a mathematical expression evaluation in shared memory;
Then the TinyExpr compartment reads that request and performs the evaluation;
The TinyExpr compartment writes the result in shared memory;
The main compartment reads the result.

And rinse and repeat. It is possible to achieve that behaviour with locks, however there exists a mechanism that is much more suitable: barriers. Barriers let processes wait for each other at specified locations in their code:

So with our 2 compartments, we can use 2 barriers to obtain the desired behaviour:

See how to initialise and wait on a barrier here. Note that most of the guides regarding pthread_barrier_t will relate to threads, while here we want to have processes synchronise with the barriers. With processes things work very similarly, with the following differences:

The barrier objects need to be in shared memory to be accessible from both compartments.
To be visible from different processes, the barrier objects must be initialised with the PTHREAD_PROCESS_SHARED attribute.
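The two points above can be sketched as follows. To stay self-contained, this demo makes assumptions that differ from the lab: it places the barriers in an anonymous shared mapping inherited through fork() rather than in a shm_open region, and the "evaluation" simply doubles an integer. The struct layout and the name barrier_demo are hypothetical. On older glibc versions, link with -lpthread.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout of the shared area: two barriers plus the data. */
struct shared_area {
    pthread_barrier_t request_ready;   /* crossed once the request is written */
    pthread_barrier_t result_ready;    /* crossed once the result is written */
    int request;
    int result;
};

/* Demo: the parent writes a request, the child doubles it and writes the
 * result. Returns the result read by the parent, or -1 on error. */
int barrier_demo(int request) {
    struct shared_area *s = mmap(NULL, sizeof(*s), PROT_READ | PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return -1;

    /* Mark the barriers as usable across processes, with 2 participants. */
    pthread_barrierattr_t attr;
    pthread_barrierattr_init(&attr);
    pthread_barrierattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_barrier_init(&s->request_ready, &attr, 2);
    pthread_barrier_init(&s->result_ready, &attr, 2);
    pthread_barrierattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0) { munmap(s, sizeof(*s)); return -1; }

    if (pid == 0) {                                 /* "sandbox" side */
        pthread_barrier_wait(&s->request_ready);    /* wait for the request */
        s->result = s->request * 2;
        pthread_barrier_wait(&s->result_ready);     /* publish the result */
        _exit(0);
    }

    s->request = request;                           /* "main program" side */
    pthread_barrier_wait(&s->request_ready);        /* publish the request */
    pthread_barrier_wait(&s->result_ready);         /* wait for the result */
    int result = s->result;

    waitpid(pid, NULL, 0);
    pthread_barrier_destroy(&s->request_ready);
    pthread_barrier_destroy(&s->result_ready);
    munmap(s, sizeof(*s));
    return result;
}
```

Each side only touches the shared data between the barriers that give it ownership: the writer of a field always reaches the barrier after writing, and the reader only reads after crossing it.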

Once your implementation is working, compare the performance as reported by the benchmark to 1) the non-compartmentalised version of TinyExpr and 2) the IPC-based compartmentalised version of the library. Using shared memory should be much faster vs. IPCs, but still significantly slower vs. no compartmentalisation.

Submission Instructions

Create one folder at the root of the GitLab submission repository for each of the compartmentalised versions of TinyExpr:

tinyexpr-ipc/
tinyexpr-shm/

Place every source file required to compile and run each version in each folder and add them to the version control system. You should make it as easy as possible for the TAs to build and run each program: example, example2, test-suite, and benchmark. The same goes for understanding your code: make sure to use a clean code style, and comment anything that you believe requires clarification. If needed, write a small README file.

Once your work is ready to be marked, make sure to create the lab3-submission tag.


COMP60261 Lab 3: Software Compartmentalisation