Runtime Defences
You can access the slides 🖼️ for this lecture. All the code samples given here can be found online, alongside instructions on how to bring up the proper environment to build and execute them.
We have previously covered good coding practices to avoid introducing bugs, and analysis techniques to detect bugs in existing code bases. Here we discuss defences running at runtime in production.
Non-Executable Memory
Back in the day, a large part of the address space used to be accessible with execution rights. That included the stack, which was really awful from a security point of view: it meant that an attacker could write malicious machine code there, for example through an overflow on the stack, and then have the CPU jump to it. This is called a code injection attack.
In the early 2000s, hardware support appeared for marking parts of the address space as non-executable. It was used to set everything that should not be accessed in executable mode as non-executable: the stack, heap, static data sections, and so on:
Such non-executable memory made code injection attacks much less likely in the corresponding areas. Today, modern systems software aims to enforce the write XOR execute (W^X) principle for each memory area. That principle states that no area of memory may be both writable and executable at the same time.
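On Linux you can observe the non-executable stack directly: the GNU_STACK program header of an ELF binary records the permissions requested for the stack. A minimal sketch, assuming gcc and binutils are installed:

```shell
# Build a trivial program and inspect its GNU_STACK program header.
cat > nx_demo.c <<'EOF'
int main(void) { return 0; }
EOF
gcc nx_demo.c -o nx_demo
# Flags "RW" (and no "E") mean the stack is mapped non-executable.
readelf -lW nx_demo | grep GNU_STACK
```

Linking with -z execstack would flip the flags to RWE and re-enable an executable stack; virtually no legitimate program needs that today.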
Address Space Layout Randomisation (ASLR)
Another defence that is present in almost every system today is address space layout randomisation. With ASLR, each invocation of the program has a different layout for the address space; in other words, code and variables won't be at the same locations in memory across invocations:
The goal is to make it harder for the attacker to determine where things are in the address space. Recall that many of the attacks we have seen require knowing exactly where a buffer to overflow sits on the stack, or exactly where to jump in the code segment. This is typically achieved by observing one invocation of the program, for example with a debugger, and then launching the attack upon a second invocation. With ASLR that no longer works, because the locations of the data and code determined during the first invocation are not the same in the second.
ASLR Granularity
Please note that ASLR is coarse-grained: for performance reasons we cannot really randomise the location of each variable independently. Randomisation is therefore applied at the level of large areas of the virtual address space, the program's segments, as illustrated above. It happens at load time for the main program, and when dynamic libraries are loaded for those.
To illustrate the granularity of ASLR on Linux and understand the relevant security implications, consider the following program:
#include <stdio.h>
#include <stdlib.h>

int global1 = 42;
int global2 = 43;

int main() {
    int local1 = 24;
    int local2 = 25;
    int *heap_ptr1 = malloc(sizeof(int));
    int *heap_ptr2 = malloc(sizeof(int));
    printf("data addr 1: %p\n", (void *)&global1);
    printf("data addr 2: %p\n", (void *)&global2);
    printf("stack addr 1: %p\n", (void *)&local1);
    printf("stack addr 2: %p\n", (void *)&local2);
    printf("heap addr 1: %p\n", (void *)heap_ptr1);
    printf("heap addr 2: %p\n", (void *)heap_ptr2);
    free(heap_ptr1);
    free(heap_ptr2);
    return 0;
}
This program simply prints the addresses of two global variables, two local variables, and two heap allocations. An example execution outputs the following:
$ ./aslr # First execution
data addr 1: 0x55aa68336028
data addr 2: 0x55aa6833602c
stack addr 1: 0x7ffdf6d8bbac
stack addr 2: 0x7ffdf6d8bba8
heap addr 1: 0x55aaa113a2a0
heap addr 2: 0x55aaa113a2c0
$ ./aslr # Second execution
data addr 1: 0x562632aec028
data addr 2: 0x562632aec02c
stack addr 1: 0x7ffd2ffe4fbc
stack addr 2: 0x7ffd2ffe4fb8
heap addr 1: 0x56264178d2a0
heap addr 2: 0x56264178d2c0
As you can see, the relative distance between two variables located in different segments changes across executions of the same program. However, the relative distance between two variables belonging to the same segment stays constant across executions. That is because only the base address of each segment is randomised at load time. What that means is that if the attacker can leak the value of a single pointer, it is easy for them to compute the address of all other data or code within the containing segment. Hence, the coarse-grained nature of ASLR on modern systems makes it relatively easy to break.
Stack Canaries
The stack canary is a technique that aims to protect the return address on the stack. The key idea is to place a magic value, named the canary, right before the return address in a callee's stack frame, and to compare the canary's value to a ground truth when the callee returns:
More precisely, canaries work as follows. First, at build time the compiler generates code to push the canary on the stack upon certain function calls. Here the canary's value is 0x1234:
When the callee returns, the canary's value is checked to make sure it is still 0x1234:
The key idea is that an overflow aiming at rewriting the return address would also overwrite the canary:
The check of an overwritten canary value against the ground truth will fail, which in effect detects the attack and stops the program.
By default (with -fstack-protector), canaries are applied only to certain functions considered at risk, such as those with local character buffers.
You can use -fstack-protector-strong to apply them to a larger subset of your program's functions, and -fstack-protector-all to apply them to every function in your program.
More canaries increase the security of your program, but also increase the performance and code size overheads.
Stack canaries are not a perfect protection. With current implementations, the same canary value is used to protect all function calls. This means that if the attacker can leak the canary value, for example through an out-of-bounds read on the stack, then the protection is broken for the entire program.
Other Common Hardening Techniques
Stripping Binaries
Other common protection techniques include stripping symbols and debug information from your program:
strip <binary>
This makes reverse engineering a program available only in binary form, a crucial step in many attacks, much more difficult.
Read-Only Relocations (RELRO)
Read-only relocations (RELRO) is a protection technique against attacks that aim to overwrite relocations, the mechanism used to resolve calls to shared libraries at runtime. You can see relocations as a form of function pointers. Partial and full RELRO set part or all of the corresponding area of the address space read-only. Once again you have a trade-off to choose between security and performance overhead, as full RELRO makes load time longer.
Partial RELRO is enabled by default with modern compilers. To enable full RELRO, add these compiler flags:
-Wl,-z,relro,-z,now
_FORTIFY_SOURCE Macro
The _FORTIFY_SOURCE macro enables some buffer overflow protection checks before sensitive functions such as the string manipulation ones and memcpy. Note that it requires optimisations (-O1 or higher) to be enabled.
There are two levels: the second adds more checks but comes with a risk of breaking your program, so only use it if you can fully test that things still run correctly.
Add the following flags to the compiler to enable it:
-D_FORTIFY_SOURCE=1 (level 1)
-D_FORTIFY_SOURCE=2 (level 2)
checksec
You can analyse a binary with the checksec tool to check for the presence or absence of all the hardening techniques we covered.
Here is an example of usage:
$ gcc -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2 -Wl,-z,relro -Wl,-z,now myapp.c -o myapp
$ strip myapp
$ checksec --file myapp
RELRO           STACK CANARY      NX            PIE             Symbols     FORTIFY
Full RELRO      Canary found      NX enabled    PIE enabled     No Symbols  Yes
We have already covered all of these protections except PIE: it stands for position-independent executable, and it is a prerequisite for a binary to be compatible with ASLR.
Control Flow Integrity (CFI)
Let's talk in a bit more detail about a last, more advanced technique named control flow integrity. We have seen previously how control flow hijacking attacks force the program to take code paths that were not intended by the programmer. CFI enforces that the code paths executed by the program conform to the control flow graph originally intended by the programmer. CFI generally involves two protections: forward edge CFI, which checks that function pointers and C++ virtual table entries always have a legitimate target, and backward edge CFI, which checks that return addresses also always have a legitimate target.
Forward Edge CFI
Regarding forward edge protection, CFI enforces that when a function pointer or an entry in a C++ virtual table is called, the target is a valid function. What valid means depends on the implementation. With coarse-grained CFI, the protection just checks that the target is the beginning of a function. With fine-grained CFI, the protection makes sure that only the functions whose addresses are assigned to the function pointer or virtual table in the code can be called.
The Clang/LLVM compiler has a software implementation of CFI, to enable it use the following compiler flags:
clang -g -fsanitize=cfi -flto -fvisibility=hidden program.c -o program
This will instruct the compiler to insert the necessary instrumentation for CFI checks.
Recent Intel processors also implement CFI in hardware, as a technology called Control-flow Enforcement Technology (CET): a special instruction, endbr64, marks the valid targets for function pointer and virtual table invocations.
Backward Edge CFI
Regarding backward edge protection, which guards the return address on the stack, CFI achieves this with what is called a shadow stack. It is a separate stack that stores a copy of the return address upon each function call. When the callee returns, the return address to jump to is checked against the copy on the shadow stack: if they don't match, something fishy is going on.
Let's unroll an example to understand how the shadow stack works.
Assume we are running the code of a function f1, we have its frame on the stack, and the shadow stack which is empty for now:
When f1 calls f2, it pushes the return address on the stack normally, but also a copy of it on the shadow stack:
When f2 calls f3, the process is repeated, and same thing when f3 calls f4:
When f4 returns in f3, the return address on the stack is checked against the corresponding entry in the shadow stack.
If they match it's all good.
If there was an overflow, they would be different.
f3 then returns in f2, and we get another check:
And f2 returns in f1, another check:
Of course the shadow stack needs to be placed by the compiler at a location in memory that is very hard for an attacker to read or write.
To conclude, we have seen defences that are applied at runtime and commonly used in production to detect exploits, make them much harder to achieve, and limit the damage an attacker can do with exploits that succeed. Because these defences run in production, their performance overhead needs to be low, generally just a few percent.