class: center, middle

### COMP26020 Programming Languages and Paradigms -- Part 1

***

# Modular Compilation

???
- Hello everyone
- In this video we will talk about modular compilation
- It's the process of decomposing your program sources into several files
- And compiling a single executable from these sources

---

# Modular Compilation

- Large programs: need to organise the code in several source files
    - [Redis's `src` folder](https://github.com/redis/redis/tree/7.0.0/src)
    - [Linux](https://github.com/torvalds/linux/tree/v6.4) kernel sources (56000 C source files for v6.4!)
- **How to break up the program code into different files representing well isolated compilation units**?
    - Modular compilation, covered in this slide deck

--

- Also, can't run `gcc <56000 source files>` each time we do a small update to one or a few source files
- **How to efficiently automate the build process?**
    - Automated compilation, covered in the next slide deck

???
- For large programs it is necessary to organise the code in several source files
- See for example [Redis's `src` folder](https://github.com/redis/redis/tree/6.0.5/src)
- [Linux](https://github.com/torvalds/linux/tree/v6.4) v6.4 is composed of more than 56K source files
- So you can't run `gcc <56K source files>`: not only is it impossible to write manually, it would also take an enormous amount of time to recompile everything each time we make a small update to one or a few source files
- So in this video we will answer the first question:
    - **How to break up the program code into well isolated compilation units**?
- And in the next video we'll see:
    - **How to efficiently automate the build process?**

---
class: inverse, middle, center

# A Closer Look at the Compilation Phase

???
- Let's study the compilation phase a little bit

---
name: phase

# A Closer Look at the Compilation Phase
???
- So until now we saw that when we compile a source file into an executable
- For example with `gcc src.c -o prog`
- The compiler directly transforms the source code into the executable
- But actually things are a bit more complicated under the hood

--

.center[Actually things are still a little bit more complicated 😛]

---

# A Closer Look at the Compilation Phase
???
- The compiler produces (sometimes transparently) an intermediate format in the form of **object files**
- And then we have a second tool named the **linker**, called under the hood by the compiler, that transforms object files into executables
- We can trigger these steps manually with the following commands
- First we generate the object file with `gcc -c <source file> -o <object file>`
- Then we link it into an executable with `gcc <object file> -o <executable>`
- With only 1 file, the linker does little work, but it plays an important role with modular compilation

---

# A Closer Look at the Compilation Phase
???
- One last thing: same as the executable, the object files are also in binary format

---
class: inverse, middle, center

# Breaking Down the Program into Modules

???
- Now let's see how to break down the source code of a program into several files, also called modules

---
name: breaking

# Breaking Down the Program into Modules

- Hypothetical example of a server application implemented in 3 `.c` source files, also named **modules**:

---
template: breaking
---
template: breaking
```bash
gcc -c main.c -o main.o                 # build main.o
gcc -c parser.c -o parser.o             # build parser.o
gcc -c network.c -o network.o           # build network.o
gcc main.o network.o parser.o -o prog   # link prog
```

???
- It is good to group code into files (modules) by functionality
- Let's assume that we have a hypothetical server application built from 3 source files
    - `network.c` groups the networking code
    - `parser.c` has the code to parse the requests
    - and `main.c` contains the entry point, initialisation code, and main server loop
- We can compile each into the corresponding object file as we saw previously
- And then we link all the object files together into an executable as follows

---
name: interractions

# Breaking Down the Program into Modules

- Each module (`.c` source file) exposes an **interface** callable from the other source files
    - Should be **as small as possible** to hide internal code/data

---
template: interractions

.leftlargecol[
- Assume the following interactions:
    - Server starts, `main.c` calls `init_network()` implemented in `network.c` to initialise networking
    - `main.c` then runs the main server loop:
        - Calls `rcv_request()` implemented in `network.c` to wait for the next request
        - When received, calls `parse_req()` implemented by `parser.c` to process the request
        - Rinse and repeat
]

.rightsmallcal[
]

???
- Each source file exposes a set of functions that can be called from other files (interface)
- This interface should be as small as possible to hide internal module code and data
- For example, if a function implemented in `network.c` is only supposed to be called from within `network.c` (an internal function), there is no reason to give `main.c` the *possibility* to call it
- We use header files (`*.h`) and the `#include` preprocessor directives to realise this compartmentalization
- Let's assume that the external interface offered by the network module consists of 2 functions, `init_network` and `rcv_request`
- And the interface offered by the parser module is a single function, `parse_req`
- The main module is supposed to call all these functions

---

# Header File for Modular Compilation

- There is generally 1 header file per module, **defining its interface**.
- Can be included in several `.c` files so **must contain only declarations**
    - Function prototypes, struct/typedef/enum declarations, variable declarations
    - **No definitions** (i.e. function body/global variable initialisation)
- For example `network.h`:

.leftcol[
```c
#ifndef NETWORK_H // include guard
#define NETWORK_H

typedef struct {
    int id;
    char content[128];
} request;

void init_network();
int rcv_request(request *r);

#endif /* NETWORK_H */
```

.codelink[
`16-modular-compilation/network.h`
] ] .rightcol[
]

???
- We generally create one header file per module exporting an interface
- We have the architecture of our program on the left, we'll have 1 header for network and 1 header for parser
- In addition to the 3 source files, `network.c`, `parser.c` and `main.c`
- These header files will be included in several source files so it is important that they contain only declarations
- In other words, function prototypes, struct/typedef/enum declarations, variable declarations
- They should contain no definitions
- We see for example `network.h` on the slide
- It contains everything that the world outside the network module needs to know in order to use the network interface
- So, the prototypes of the two functions of the interface
- But also the declaration of the struct that is used as parameter in 1 of these functions
- Note the enclosing `#ifndef NETWORK_H`, it's called an include guard
- It avoids the problem of double declaration when a C file includes several files that themselves include this header

---

# Breaking Down the Program into Modules

- `network.c`:

.leftcol[
```c
/* std includes here */
*#include "network.h"

// this function and variable are internal
// so they are not declared in network.h
// the keyword static forces their use
// to be only within the network.c file
static void generate_request(request *r);
static int request_counter = 0;

void init_network() {
    /* init code here ... */
}

int rcv_request(request *r) {
    generate_request(r);
    /* ... */
}

static void generate_request(request *r) {
    /* ... */
}
```

.codelink[
`16-modular-compilation/network.c`
] ] .rightcol[
]

???
- Now if we look at the implementation of the network module, that is the content of `network.c`, it looks like this
- We need to include the corresponding header to get access to the struct definition
- We have a function and a global variable that are internal to the module and are not supposed to be used from outside; we can enforce that with the `static` keyword
- And we have the implementation of the 2 functions that are exported by the network module

---

# Breaking Down the Program into Modules

.leftcol[
- `parser.h`:

```c
#ifndef PARSER_H
#define PARSER_H

/* needed for the definition of request: */
#include "network.h"

void parse_req(request *r);

#endif
```

.codelink[
`16-modular-compilation/parser.h`
]
]

.rightcol[
- `parser.c`:

```c
#include
*#include "parser.h"

static void internal1(request *r);
static void internal2(request *r);

void parse_req(request *r) {
    internal1(r);
    internal2(r);
}

static void internal1(request *r) {
    /* ... */
}

static void internal2(request *r) {
    /* ... */
}
```

.codelink[
`16-modular-compilation/parser.c`
]
]

???
- If we look at `parser.h` and `parser.c`, they follow the same principles as the network module
- We have an external interface declared in the header
- We also need the declaration of the struct so we include `network.h` in the parser header
- And in the module implementation we include the parser header
- We have a couple of internal functions, as well as the exported function

---

# Breaking Down the Program into Modules

.leftcol[
- `main.c`:

```c
*#include "network.h"
*#include "parser.h"

int main(int argc, char **argv) {
    request req;

    /* call functions from network module */
    init_network();
    rcv_request(&req);

    /* call function from parser module */
    parse_req(&req);

    /* ... */
}
```

.codelink[
`16-modular-compilation/main.c`
] ] .rightcol[
]

???
- Finally, `main.c` looks like this
- It includes both headers to get access to the interface function prototypes
- And within the main function the functions from the interfaces can be called
- We can compile our program like this

---

# Compile & Test

```sh
# Compile the .c source files, in no particular order:
$ gcc -c main.c -o main.o
$ gcc -c parser.c -o parser.o
$ gcc -c network.c -o network.o

# Link the final executable:
$ gcc main.o parser.o network.o -o prog

# Launch it
$ ./prog
```

---

# Summary

- **Modular Compilation:** breaking down a program's sources into multiple header `.h` files and source `.c` files
- Proper source organisation is important for medium/large programs
- Export using header files only the interface supposed to be used from outside the module

----

.center[Feedback form: https://bit.ly/3lInZ8h]
???
- Let's recap
- Modular compilation means breaking down C/C++ code into several files
- It is done with a combination of header files, containing only declarations and included where needed
- and source files, containing implementation, in other words definitions
- In the next video, we will see how to automate the compilation of programs made of multiple sources
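---

# Checking That `static` Hides Internal Code

- A hedged sketch of how the `static` keyword keeps an internal function out of reach of other modules. The `static_demo` directory, the files `good_main.c` and `bad_main.c`, and the simplified parameterless signatures are assumptions made for illustration, not the code from the slides:

```sh
# Hypothetical scratch directory and file names (assumptions)
d=static_demo
mkdir -p "$d"

# A module with one internal (static) function and one exported function
cat > "$d/network.c" <<'EOF'
static int generate_request(void) { return 42; }      /* internal */
int rcv_request(void) { return generate_request(); }  /* exported */
EOF

# A client that only uses the exported interface
cat > "$d/good_main.c" <<'EOF'
int rcv_request(void);
int main(void) { return rcv_request() == 42 ? 0 : 1; }
EOF

# A client that tries to reach the internal function
cat > "$d/bad_main.c" <<'EOF'
int generate_request(void);
int main(void) { return generate_request(); }
EOF

# Each module compiles on its own: a declaration is all the compiler needs
gcc -c "$d/network.c" -o "$d/network.o"
gcc -c "$d/good_main.c" -o "$d/good_main.o"
gcc -c "$d/bad_main.c" -o "$d/bad_main.o"

# Linking against the exported function works
gcc "$d/good_main.o" "$d/network.o" -o "$d/good" && echo "good: linked"

# Linking against the static function fails: the linker cannot see it
if gcc "$d/bad_main.o" "$d/network.o" -o "$d/bad" 2>/dev/null; then
    echo "bad: linked (unexpected)"
else
    echo "bad: link error, as expected"
fi
```

- The failure only shows up at link time, because that is when references between object files get resolved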