Compartmentalising TinyExpr

Compartmentalisation Policy

Our goal is to compartmentalise every program making use of TinyExpr (example, example2, test-suite, and benchmark) and place the code and data of the TinyExpr library within its own compartment. We have seen that the interface exposed by our version of TinyExpr consists of a single function, te_interp, taking 2 pointers as parameters: a string pointer to the expression to evaluate, and an int pointer to a variable that will contain 0 if the evaluation succeeds, and a non-zero value on error. Upon success the function returns a double which is the result of the evaluation.

Our policy and the interaction between the main program and the sandbox can be illustrated as follows:

Cross-Compartment Interactions

Contrary to the first part of this lab, there is now the need for bidirectional cross-compartment communications:

The expression to evaluate, pointed to by the first parameter of te_interp, is allocated outside the sandbox, and needs to be transferred inside it.
Same thing for the integer pointed to by the second parameter of te_interp.
The return value of te_interp is allocated and initialised within the sandbox, and needs to be transferred to the caller's compartment.
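As an illustration, the transfers listed above could be carried over a byte-stream channel (such as a pipe or a socket) as a length-prefixed request followed by a fixed-size reply. The sketch below is only one possible layout: struct reply and the helpers write_all, read_all, send_request and recv_request are hypothetical names, not part of TinyExpr, and it assumes both compartments run on the same machine so that raw binary values can be exchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical reply layout: the error code and the result travel together. */
struct reply {
    int    error;   /* value te_interp stores through its int pointer */
    double result;  /* value te_interp returns */
};

/* Write exactly len bytes, retrying on short writes and EINTR. */
int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; fails on error or premature end of stream. */
int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1;  /* peer closed the channel */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Caller side: send the expression as a 32-bit length followed by its bytes. */
int send_request(int fd, const char *expr) {
    assert(expr != NULL);
    uint32_t len = (uint32_t)strlen(expr);
    if (write_all(fd, &len, sizeof len) < 0) return -1;
    return write_all(fd, expr, len);
}

/* Sandbox side: returns a malloc'd NUL-terminated expression, or NULL. */
char *recv_request(int fd) {
    uint32_t len;
    if (read_all(fd, &len, sizeof len) < 0) return NULL;
    char *expr = malloc((size_t)len + 1);
    if (expr == NULL) return NULL;
    if (read_all(fd, expr, len) < 0) {
        free(expr);
        return NULL;
    }
    expr[len] = '\0';
    return expr;
}
```

The reply can then be sent back with write_all(fd, &r, sizeof r) and received with read_all on the caller's side. Sending the length first means only the bytes of the expression cross the channel, whatever its size.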

Compartmentalising TinyExpr with IPC-based Communications

Design and implement a first version of the compartmentalised TinyExpr programs: example, example2, test-suite, and benchmark, that uses IPCs for cross-compartment communications.

As described previously, the TinyExpr library should run within its own sandboxed process, and the rest of the program should run within another compartment. To that end, use the knowledge you gained in the first part of this lab. Still, there are a few differences here:

We don't want to spawn a new sandbox each time the main program requests the evaluation of a mathematical expression, as that would be too costly in terms of performance. After its initialisation, have the sandbox compartment wait in a loop for calls to the exposed functionality. When a call is received, process it, return the result, and then wait for the next call.
Because we need to implement bidirectional communication between the two compartments, you should use pipes or sockets to transfer the exposed function's parameters and return value between compartments.
The mathematical expressions to evaluate can have a highly variable size. With performance and memory footprint in mind, it is better to send through the IPC only the bytes needed vs. putting a static and potentially large cap on the expression size.
Think about the engineering effort it would take to transform all the programs using TinyExpr to make use of our compartmentalised version of the library. Ideally, the only things needed to have an arbitrary program use your compartmentalised version would be:
1. To remove the includes of tinyexpr.h from that program's sources and replace them with a custom header file you created;
2. To update the build rules (e.g. a Makefile) to compile an additional C file into the program. That file would contain code spawning the TinyExpr sandbox, initialising a communication channel, and handling the calls to te_interp by communicating with the sandbox.
Still with the goal of making things as transparent as possible, try to automate things as much as possible: ideally there will be no need to launch the sandbox binary and the main program's binary separately. The main program can spawn the sandbox e.g. with fork() + execve().
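The automated spawning mentioned above could look like the following minimal sketch. It assumes a Unix socket pair as the communication channel and, as an arbitrary choice, exposes the sandbox's end of it as the sandbox's standard input and output. The helper name spawn_sandbox and its exit codes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: start the sandbox binary at `path` and return its pid,
 * or -1 on failure. On success *fd_out is the parent's end of a bidirectional
 * socket; the child sees its own end as both stdin and stdout. */
pid_t spawn_sandbox(const char *path, int *fd_out) {
    assert(path != NULL && fd_out != NULL);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: drop the parent's end, expose ours as fds 0 and 1. */
        close(sv[0]);
        if (dup2(sv[1], STDIN_FILENO) < 0 || dup2(sv[1], STDOUT_FILENO) < 0)
            _exit(126);
        if (sv[1] > STDOUT_FILENO) close(sv[1]);
        char *const argv[] = { (char *)path, NULL };
        execve(path, argv, environ);
        _exit(127);  /* only reached if execve failed */
    }

    /* Parent: drop the child's end and hand ours to the caller. */
    close(sv[1]);
    *fd_out = sv[0];
    return pid;
}
```

With this choice the sandbox must not print anything else on its standard output, since that would corrupt the channel; passing the socket on another descriptor number is an equally valid design. The main program would typically call such a helper once, e.g. on the first call to te_interp, and reuse the descriptor for every subsequent call.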

It is suggested to start with the simple programs, example and example2, then move on to the test suite and the benchmark. Once the test suite runs, make sure to execute it regularly to check that you are not introducing regressions.

Compartmentalising TinyExpr with Shared Memory-based Communications

Consider the performance results of the benchmark program with the IPC-based compartmentalised TinyExpr, and compare these numbers to the performance of the non-compartmentalised benchmark. You should see a significant slowdown coming from the compartmentalisation, around 10x vs. non-compartmentalised expression evaluation.

There are several reasons for this slowdown, in particular the number and latency of domain crossings, combined with the cost of sending/receiving data through the socket/pipe IPCs. Switching to a communication mechanism that uses shared memory more directly should partially address this performance issue.

Create a second version of the compartmentalised TinyExpr that this time uses an area of shared memory set up between the two compartments for communications, and integrate this version with all programs: example, example2, test-suite, and benchmark. Once again try to be as transparent as possible in the way your compartmentalised TinyExpr can be integrated into existing programs using vanilla TinyExpr: it should only be a matter of switching a header's name in the sources and slightly updating the build process. For this version of compartmentalised TinyExpr, whose goal is to be as fast as possible, it is acceptable to put a cap on the maximum size a mathematical expression can have, e.g. a few kilobytes. Indeed, such a static memory allocation approach is much better for performance vs. dynamic allocation, where we would have to perform costly resize operations on the shared memory area when the length of the mathematical expressions to evaluate varies.

Establishing an Area of Shared Memory

Because the two compartments have separate binaries, the area of shared memory you need to create cannot be directly shared through address space duplication as we saw in the lectures. To open and map a shared memory area, use shm_open from both binaries, followed by mmap: see an example here. The compartment calling shm_open first (likely the main program) should use the O_CREAT flag to create it. The first argument of shm_open should be the same in both compartments. It should start with a / and not contain any other / character.
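The shm_open + mmap sequence described above can be sketched as follows. The helper name map_shared_area and its create parameter are hypothetical, and the 0600 permission mode is an arbitrary choice. The same function could be compiled into both binaries, with create set to 1 in the compartment that sets the area up and 0 in the other.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `size` bytes of the shared memory object `name`.
 * Pass create=1 in the compartment that sets the area up (O_CREAT) and
 * create=0 in the one that attaches to it. Returns NULL on failure. */
void *map_shared_area(const char *name, size_t size, int create) {
    assert(name != NULL && name[0] == '/');  /* naming rule for shm_open */

    int flags = O_RDWR | (create ? O_CREAT : 0);
    int fd = shm_open(name, flags, 0600);
    if (fd < 0) return NULL;

    /* A freshly created object has size 0, so the creator must size it. */
    if (create && ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }

    void *area = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return area == MAP_FAILED ? NULL : area;
}
```

The attaching compartment must only call this after the creator has sized the object, which is naturally the case if the main program creates the area before spawning the sandbox. The object persists until shm_unlink is called on its name, so the main program should unlink it when it is done. On older C libraries, linking may require -lrt.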

Cross-Compartment Synchronisation with Barriers

The compartments execute concurrently and access shared memory, so there is the need for synchronisation to avoid race conditions. For example, we want to avoid the TinyExpr compartment starting to read an expression to evaluate in shared memory before the main program has finished writing that expression in its entirety. This could be achieved with locks, as presented in the lecture, however in addition to enforcing the atomicity of accesses to the area of shared memory between the compartments, we also need to enforce the following ordering for these accesses:

The main compartment writes a request of a mathematical expression evaluation in shared memory;
Then the TinyExpr compartment reads that request and performs the evaluation;
The TinyExpr compartment writes the result in shared memory;
The main compartment reads the result.

And rinse and repeat. It is possible to achieve that behaviour with locks, however there exists a mechanism that is much more suitable: barriers. Barriers let processes wait for each other at specified locations in their code:

So with our 2 compartments, we can use 2 barriers to obtain the desired behaviour:

See how to initialise and wait on a barrier here. Note that most of the guides regarding pthread_barrier_t will relate to threads, while here we want to have processes synchronise with the barriers. With processes things work very similarly, with the following differences:

The barrier objects need to be in shared memory to be accessible from both compartments.
To be visible from different processes, the barrier objects must be initialised with the PTHREAD_PROCESS_SHARED attribute.

Once your implementation is working, compare the performance as reported by the benchmark to 1) the non-compartmentalised version of TinyExpr and 2) the IPC-based compartmentalised version of the library. Using shared memory should be much faster vs. IPCs, but still significantly slower vs. no compartmentalisation.

Submission Instructions

Create one folder at the root of the GitLab submission repository for each of the compartmentalised versions of TinyExpr:

tinyexpr-ipc/
tinyexpr-shm/

Place every source file required to compile and run each version in the corresponding folder and add them to the version control system. You should make it as easy as possible for the TAs to build and run each program: example, example2, test-suite, and benchmark. The same goes for the readability of your code: make sure to use a clean code style, and comment anything that you believe requires clarification. If needed, write a small README file.

Once your work is ready to be marked, make sure to tag it.

Concluding Remarks

Optionally, you can take a look at the full version of TinyExpr here: it contains more functionality than the simplified version we just compartmentalised. In particular, TinyExpr lets the user declare variables and functions in the expressions to evaluate. This involves the exchange of more complex data structures, with several levels of pointers, between the main program's code and the TinyExpr library. Parts of these data structures can also be modified by the main program at any time during the program's execution, e.g. to update the value of a variable.

Imagine how much more complex the compartmentalisation of TinyExpr would become if we had to support all these features...

COMP60261 Lab 3: Software Compartmentalisation