COMP26020p1 - Modular Compilation

class: center, middle

### COMP26020 Programming Languages and Paradigms -- Part 1
***
# Modular Compilation

???

- Hello everyone
- In this video we will talk about modular compilation
- It's the process of decomposing your program sources into several files
- And compiling a single executable from these sources

---
# Modular Compilation

- Large programs: need to organise the code in several source files
  - [Redis's `src`
  folder](https://github.com/redis/redis/tree/7.0.0/src)
  - [Linux](https://github.com/torvalds/linux/tree/v6.4) kernel sources (56000 C source files for v6.4!)
- **How to break up the program code into different files representing 
  well isolated compilation units**?
  - Modular compilation, covered in this slide deck
--

- Also, can't run `gcc <56000 source files>` each time we do a small update to one or a
few source files
- **How to efficiently automate the build process?**
  - Automated compilation, covered in the next slide deck

???
- For large programs it is necessary to organise the code in several source
  files
  - See for example [Redis's `src`
  folder](https://github.com/redis/redis/tree/6.0.5/src)
  - [Linux](https://github.com/torvalds/linux/tree/v6.4) v6.4 is composed of more
  than 56K source files
- So you can't run `gcc <56K source files>`: the command would be impossible to write by hand, and it would take an enormous amount of time to recompile everything each time we make a small update to one or a few source files
- So in this video we will answer the first question:
- **How to break up the program code into well isolated compilation units**?
- And in the next video we'll see:
- **How to efficiently automate the build process?**

---
class: inverse, middle, center

# A Closer Look at the Compilation Phase

???

- Let's study the compilation phase a little bit

---
name: phase

# A Closer Look at the Compilation Phase

???

- So until now we saw that when we compile a source file into an executable
- For example with `gcc src.c -o prog`
- The compiler directly transforms the source code into the executable
- But actually things are a bit more complicated under the hood

.center[Actually things are still a little bit more complicated 😛]

---
# A Closer Look at the Compilation Phase

???

- The compiler produces (sometimes transparently) an intermediate format in the
  form of **object files**
- And then we have a second tool named the **linker**, called under the hood by the compiler, that transforms object files into executables
- We can trigger these steps manually with the following commands
- First we generate the object file with `gcc -c <source file> -o <object file>`
- Then we link it into an executable with `gcc <object file> -o <executable>`
- With only 1 file, the linker does little work, but it plays an important role with modular compilation

---
# A Closer Look at the Compilation Phase

???

- One last thing: same as the executable, the object files are also in binary format

---
class: inverse, middle, center
# Breaking Down the Program into Modules

???
- Now let's see how to break down the source code of a program into several files, also called modules

---
name: breaking
# Breaking Down the Program into Modules

- Hypothetical example of a server application implemented in 3 `.c` source files,
  also named **modules**:

---
template: breaking

---
template: breaking

```bash
gcc -c main.c -o main.o                   # build main.o
gcc -c parser.c -o parser.o               # build parser.o
gcc -c network.c -o network.o             # build network.o
gcc main.o network.o parser.o -o prog     # link prog
```

???
- It is good to group code into files (modules) by functionality
- Let's assume that we have a hypothetical server application built from 3 source files
- network.c groups the networking code
- parser.c has the code to parse the requests
- and main.c contains the entry point, initialisation code, and main server loop
- We can compile each into the corresponding object file as we saw previously
- And then we link all the object files together into an executable as follows

---
name: interractions
# Breaking Down the Program into Modules

- Each module (`.c` source file) exposes an **interface** callable
  from the other source files
  - Should be **as small as possible** to hide internal code/data

---
template: interractions

.leftlargecol[
- Assume the following interactions:
  - Server starts, `main.c` calls `init_network()` implemented in `network.c` to init. networking
  - `main.c` then runs the main server loop:
      - Calls `rcv_request()` implemented in `network.c` to wait for the next request
      - When received, calls `parse_req()` implemented by `parser.c` to process the request
      - Rinse and repeat
]

.rightsmallcal[
  <div style="text-align:center"><img src="include/interractions.svg" width=210 /></div>
]

???
- Each source file exposes a set of functions that can be called from other
  files (interface)
- This interface should be as small as possible to hide internal module code and data
- For example, if a function implemented in `network.c` is only supposed to be called from within `network.c` (an internal function), there is no reason to give `main.c` the *possibility* to call it
- We use header files (`*.h`) and the `#include` preprocessor directives to
  realise this compartmentalization
- Let's assume that the external interface offered by the network module consists of 2 functions, `init_network()` and `rcv_request()`
- And the interface offered by the parser module is a single function, `parse_req()`
- The main module is supposed to call all these functions

---
# Header File for Modular Compilation

- There is generally 1 header file per module, **defining its interface**.
- Can be included in several `.c` files so **must contain only declarations**
  - Function prototypes, struct/typedefs/enums declarations, variable declarations
  - **No definitions** (i.e. function body/global variable initialisation)
- For example `network.h`:

.leftcol[
```c
#ifndef NETWORK_H // include guard
#define NETWORK_H

typedef struct {
    int id;
    char content[128];
} request;

void init_network();
int rcv_request(request *r);

#endif /* NETWORK_H */
//
```
.codelink[<a href="src/network.h" download>`16-modular-compilation/network.h`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a>]
]

.rightcol[
<div style="text-align:center"><img src="include/modular3.svg" width=175 /></div>
]

???

- So we generally create one header file per module exporting an interface
- We have the architecture of our program on the right, we'll have 1 header for network and 1 header for parser
- In addition to the 3 source files, network.c, parser.c and main.c
- These header files will be included in several source files so it is important that they contain only declarations
- In other words, function prototypes, struct/typedefs/enums declarations, variable declarations
- They should contain no definitions
- We see for example `network.h` on the slide
- It contains everything that the world outside the module network needs to know in order to use the network interface
- So, the prototypes of the two functions of the interface
- But also the declaration of the struct that is used as parameter in 1 of these functions
- Note the enclosing `#ifndef NETWORK_H` / `#endif`, it's called an include guard
- It avoids the problem of duplicate declarations when a C file includes several headers that themselves include this header

---
# Breaking Down the Program into Modules

- `network.c`:

.leftcol[
```c
/* std includes here */
*#include "network.h"

// this function and variable are internal
// so they are not declared in network.h
// the keyword static forces their use
// to be only within the network.c file
static void generate_request(request *r);
static int request_counter = 0;

void init_network() {
    /* init code here ... */
}

int rcv_request(request *r) {
    generate_request(r);
    /* ... */
}

static void generate_request(request *r) {
    /* ... */
}
```
.codelink[<a href="src/network.c" download>`16-modular-compilation/network.c`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a>]
]

.rightcol[
<div style="text-align:center"><img src="include/modular3.svg" width=250 /></div>
]

???

- Now if we look at the implementation of the module network, that is the content of network.c, it looks like that
- We need to include the corresponding header to get access to the struct definition
- We have a function and a global variable that are internal to the module: they are not supposed to be called or accessed from outside, and we can enforce that with the static keyword
- And we have the implementation of the 2 functions that are exported by the network module

---
# Breaking Down the Program into Modules

.leftcol[
- `parser.h`:

```c
#ifndef PARSER_H
#define PARSER_H

/* needed for the definition of request: */
#include "network.h"

void parse_req(request *r);

#endif
```
.codelink[<a href="src/parser.h" download>`16-modular-compilation/parser.h`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a>]

<div style="text-align:center"><img src="include/modular3.svg" width=150 /></div>
]

.rightcol[
- `parser.c`:

```c
#include <stdio.h>

*#include "parser.h"

static void internal1(request *r);
static void internal2(request *r);

void parse_req(request *r) {
    internal1(r);
    internal2(r);
}

static void internal1(request *r) {
    /* ... */
}

static void internal2(request *r) {
    /* ... */
}
```
.codelink[<a href="src/parser.c" download>`16-modular-compilation/parser.c`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a>]
]

???

- If we look at parser.h and parser.c, they follow the same principles as for the network module
- We have an external interface declared in the header
- We also need the declaration of the struct so we include network.h in the parser header
- And in the module implementation we include the parser header
- We have a couple of internal functions, as well as the exported function

---
# Breaking Down the Program into Modules

.leftcol[
- `main.c`:

```c
*#include "network.h"
*#include "parser.h"

int main(int argc, char **argv) {
    request req;

    /* call functions from network module */
    init_network();
    rcv_request(&req);

    /* call function from parser module */
    parse_req(&req);

    /* ... */
}
```
.codelink[<a href="src/main.c" download>`16-modular-compilation/main.c`</a> <a href="https://github.com/olivierpierre/comp26020-devcontainer" target="_blank" style="text-decoration: none"><img src="include/gh-logo.svg" style="height: 1em" ></a>]
]

.rightcol[
<div style="text-align:center"><img src="include/modular3.svg" width=250 /></div>
]

???

- Finally, main.c looks like this
- It includes both headers to get access to the interface function prototypes
- And within the main function the functions from the interface can be called
- We can compile our program like this

---
# Compile & Test

```sh
# Compile the .c source files, in no particular order:
$ gcc -c main.c -o main.o
$ gcc -c parser.c -o parser.o
$ gcc -c network.c -o network.o

# Link the final executable:
$ gcc main.o parser.o network.o -o prog

# Launch it
$ ./prog
```

---
# Summary

- **Modular Compilation:** breaking down a program's sources into multiple
header `.h` files and source `.c` files
  - Proper source organisation is important for medium/large programs 
  - Use header files to export only the interface meant to be used from outside the
    module

----

.center[Feedback form: https://bit.ly/3lInZ8h]
<div style="text-align:center"><img src="include/qr-code.png" height=150 /></div>

???
- Let's recap
- Breaking down C/C++ code into several files
- It is done with a combination of header files, containing only declarations and included where needed
- and source files, containing the implementation, in other words definitions
- In the next video, we will see how to automate the compilation of programs made of multiple sources