class: center, middle

### Secure Computer Architecture and Systems

***

# Brief Introduction to C

???

- Hi everyone
- Welcome to this series of videos that will briefly introduce the C programming language
- This is a short introduction, which will hopefully be more of a refresher for most of you
- If you are not familiar with C at all, it is difficult for me to give a comprehensive overview of the language in just a few videos, so make sure to check the recommended additional readings: I'll put together a list of books and other resources for a more in-depth look at C

---

# The C Programming Language

- C is an old (designed in the 70s) but still very popular programming language
    - In the top 10 of most programming language popularity rankings
- Tons of very popular software written in C

???

- The C programming language is a very old one.
- It was designed in the 70s, but it's still widely popular.
- Actually, it's one of the top 10 programming languages in most popularity rankings.

--

.leftlargecol[
]

???

- And you have tons of popular software written in C.
- You can see a few on the slide here.
- Things like operating systems like Linux or macOS.
- Web servers like Apache and Nginx.
- Database systems like SQLite, hypervisors like Xen.
- Some language runtimes like Perl and Python.
- Other tools that you use daily like Git.
- All of this is written in C.

--

.rightsmallcol[
**Systems software**, low-level software interacting closely with the hardware: OSes, hypervisors, etc.
]

???

- Overall, what's written in C is systems software.
- Systems software is relatively low-level software that generally interacts very closely with the hardware.
- Typical examples are operating systems and hypervisors.

---

# The C Programming Language

- **Pros:**
    - Low-level: close to the hardware, lots of freedom for the programmer to manipulate the machine
    - Fast, low memory footprint
    - Portable
    - Established a popular syntax

???

- There are many reasons why C is still a popular programming language.
- First, it's low-level and lets the programmer manipulate the hardware, such as the CPU or memory, quite directly
- This is convenient when writing low-level software such as operating systems.
- Second, because of the simplicity of the language, programs written in C can be very fast and can have a very low memory footprint: this is crucial in domains such as high performance computing and embedded systems.
- C has established a popular syntax, which is reused by many languages that came afterwards, such as Java, C++, and many others.
- C is also portable: you will find compilers translating C code for each of the modern CPU architectures.

--

- **Cons:** programmer freedom comes at a cost: a large area for making mistakes
    - Lack of **memory safety**, risks of undefined behaviour at runtime

???

- All these benefits come at a cost: C leaves a large window for the programmer to make mistakes, in particular when manipulating memory.
- We say that C lacks memory safety, and the bugs that can be introduced this way can lead to serious security issues, as we will see later in the unit

--

Still **extensively used** in: systems software, high performance computing, embedded systems, etc.

???

- Despite these problems, C is still extensively used in many domains such as systems software, HPC, embedded systems, and so on.

---

# Hello World in C

```c
#include <stdio.h>
// needed to use printf which is declared in stdio.h

// main is the program entry point
int main() {
    printf("hello, world!\n"); // print "hello, world!" on the console
    return 0;                  // returning from main exits the program
}
```

.codelink[
`02-c-intro/hello.c`
]

???

- This is the traditional hello world written in C
- The program starts by including `stdio.h`, which is part of the standard library and will give us access to a function to print text to the console
- Then we have the definition of the `main` function.
- It returns an integer (`int`), and does not take any parameters
- In C, `main` is the entry point, which means it contains the code that will run when the program starts
- Within `main` we use the `printf` function to print hello world to the console
- As you can see, every statement in C ends with a semicolon.
- Then we return 0 from `main`
- 0 is an integer code that, by convention, means success in C
- Returning from the `main` function will also exit the program

--

C is a **compiled** language: the source (text) must be translated into an executable program (binary):

```bash
$ gcc hello.c -o hello
$ ./hello
hello, world!
```

???

- C is a compiled language, which means we first need to transform our source code, which is nothing more than a text file, into an executable program that can be run
- We do so on the command line with GCC, the C compiler
- It takes the source file as input, here `hello.c`
- Then we have `-o` and the name of the executable we want to create, here it's `hello`
- Once the compiler is done, the program can be executed by simply typing `./hello`

--

The compiler will perform some checks on the code and possibly emit **warnings** and **errors**

???
- Another important aspect of the compiler is that it will check for programming mistakes and will emit warnings and errors
- These will be displayed on the console
- Errors are unrecoverable and will stop the compilation process, while warnings will not
- Make sure to fix warnings and errors in the order they are emitted by the compiler
- Please also make sure that all the code you produce as part of this unit compiles without any warning or error

---

# Variables

- Variables have a **name**, a **type**, and a **value**
- They must be **declared** before being used

```c
#include <stdio.h>
int main() {
    int a;                 // declare a of type int (signed integer)
    int b; int c;          // b and c of type int
    int d = 12;            // declare d of type int, and set its value
    int x, y = 10, z = 11; // declare x, y and z, set values for y and z

    a = 12;      // set a's value to 12
    b = 20;      // set b's value to 20
    c = 10 + 10; // set c's value to 20
    a = b;       // a = 20
    d++;         // d = d + 1
    y *= 2;      // y = y * 2;
}
```

.codelink[
`02-c-intro/variables.c`
]

???

- Let's briefly talk about variables
- In C, like in many languages, variables have a name, a type, and a value
- For example, in the code on the slide the first variable declared is named `a`
- Its type is `int`, which denotes a signed integer
- And after the assignment its value is 12
- Each variable must be declared before being used
- This is done as shown on the slide, with the type followed by the name of the variable
- You can also see how to declare a variable and set its value in a single statement, which is what happens with `d`, and how to declare several variables of the same type in a single statement, which is what happens with `x`, `y` and `z`.
- Next we manipulate these variables, assigning them some values
- You can also see a bit of arithmetic: in particular, pay attention to the `d++` statement, which increments `d` by one.
- The next statement, `y *= 2`, multiplies `y` by 2.

---

# Types

- Help the compiler **check the validity of operations** on variables
- Define **how much memory is allocated** for variables by the compiler

???

- In C, types have 2 main functions
- First, they help the compiler check the validity of the operations you apply on variables
- Second, they define how much memory space should be allocated to store variables

--

C has 3 basic types: integers, floating point numbers, and characters:

```c
int my_integer = -12345;
float my_float = 42.5;
char my_char = 'a';
```
???

- C has 3 basic types: integers, floating point numbers, and characters
- You can see an example of each on the slide
- And if we look inside the program's memory at runtime, we'll see that the compiler reserved a certain number of bytes for each variable
- For example here with `my_integer`, which is an `int` (a signed integer): on an Intel x86-64 CPU an `int` is stored on 32 bits, which is 4 bytes.

---

# Qualifiers, Storage Size in Memory

- Use qualifiers `long`/`short` to request larger/smaller storage sizes
- Storage size for a given type depends on the architecture, use `sizeof` to get the exact size of a type on a given machine

```c
int so_short = sizeof(short int);
int so_int = sizeof(int);
int so_uint = sizeof(unsigned int);
int so_long = sizeof(long int);
int so_longlong = sizeof(long long int);
int so_float = sizeof(float);
int so_double = sizeof(double);

// storage sizes on x86-64 (Linux):
printf("size of short: %d bytes\n", so_short);            // 2 bytes
printf("size of int: %d bytes\n", so_int);                // 4 bytes
printf("size of unsigned int: %d bytes\n", so_uint);      // 4 bytes
printf("size of long int: %d bytes\n", so_long);          // 8 bytes
printf("size of long long int: %d bytes\n", so_longlong); // 8 bytes
printf("size of float: %d bytes\n", so_float);            // 4 bytes
printf("size of double: %d bytes\n", so_double);          // 8 bytes
```

.codelink[
`02-c-intro/sizeof.c`
]

???

- Types can be augmented with qualifiers to request more or less space
- For example the type `short int` will be stored on 2 bytes, so it can represent a smaller range of signed integers than a regular `int`
- Conversely the type `long int` will be stored on 64 bits, which is 8 bytes
- `float` and `double` are used for floating point numbers, stored respectively on 4 and 8 bytes on x86-64
- Finally, pay attention to the `unsigned` qualifier, which indicates that a variable will only store non-negative integers

---

# Printing to the Console: `printf`

- `printf` parameters:
    - **Format string** to print as first parameter
    - Optionally a list of variables whose values should be printed
- Variables' values replace markers (e.g. `%d`) in the format string

???

- As we have seen earlier, the `printf` function prints text on the console, which is also called the standard output
- It takes as its first parameter what is called a format string, which contains the text to print
- It then takes 0 or more additional parameters, which are the variables whose values should be printed
- These values will replace special markers located in the format string

--

.leftcol[
- A few markers:
    - `%d` for signed integers, `%u` for unsigned
    - `%f` for floats
    - Prefix with `l` for `long`s and `double`s
    - `%c` for characters
]

.rightcol[
```c
int i = -42;
float f = 12.34;
char c = 'a';
long unsigned int lui = 500;
double d = 42.42;

// prints "-42, 12.340000, a, 500, 42.420000":
printf("%d, %f, %c, %lu, %lf\n", i, f, c, lui, d);
```

.codelink[
`02-c-intro/printf.c`
]
]

???

- Markers depend on the type of the variable one wants to print
- For example we use `%d` for signed integers, `%f` for floats, or `%c` for characters
- You can see a few examples on the slide
- If you run this code, the program will display the value of each variable, separated by commas

---

# Arrays

```c
int array[4];             // declare an array with 4 elements of type int
array[0] = 42;            // set the elements' content
array[1] = 43;
array[2] = 44;
array[3] = 45;
printf("%d\n", array[2]); // print the 3rd element of the array

int arr2d[2][2];          // declare a 2-dimensional 2x2 array of ints
arr2d[0][0] = 12;         // set the elements' content
arr2d[0][1] = 13;
arr2d[1][0] = 14;
arr2d[1][1] = 15;

char str[3];              // in C, strings are arrays of characters...
str[0] = 'h';
str[1] = 'i';
str[2] = '\0';            // ... that end with the `\0` termination character
```

.codelink[
`02-c-intro/arrays.c`
]

???

- Like most languages, C supports arrays
- You can see here how to declare a one-dimensional integer array named `array`, and how to set each of its elements to a certain value
- Note that array indexes start at 0 in C
- With the `printf` statement you can also see how to reference a particular array slot to print its value
- C also supports arrays with multiple dimensions: the array named `arr2d` is a two-dimensional array
- See how it is declared, with the size of each dimension indicated between square brackets
- It can then be indexed with 2 sets of brackets, one for each dimension
- Although there is no string type per se in C, strings are represented as arrays of characters
- You have an example here with `str`, which contains the string `hi`
- Note that in C, to be valid, a string must end with `\0`, which is the termination character.
- So make sure, when you declare an array, to have enough space for what you want to store plus the termination character.

--
???

- An important thing about arrays in C is that they are laid out contiguously in memory
- For each of the example arrays we have seen, this is illustrated here
- For example, `array` is an array of integers, so on x86-64 each of its elements will have a size of 4 bytes
- They are laid out in memory one after the other: `array[0]` first, then `array[1]`, and so on.
- Similarly, the integers of the 2-dimensional array are laid out contiguously, row by row: `arr2d[0][0]`, `arr2d[0][1]`, `arr2d[1][0]`, then `arr2d[1][1]`
- Finally, regarding the string, on every architecture the size of a character is one byte
- So the array `str` looks as follows in memory: `h`, then `i`, then the termination character.

---

# Conditionals, Functions

```c
int num = 10;

if (num > 0) {
    printf("Number is positive\n");
} else if (num < 0) {
    printf("Number is negative\n");
} else {
    printf("Number is zero\n");
}
```

.codelink[
`02-c-intro/conditional.c`
]

???

- In C you write conditionals like this: you have the `if` keyword, followed by the condition between parentheses
- If the condition is true, which in C means it evaluates to something different from 0, the code that follows within braces will run
- If the condition is false (in other words, if it evaluates to 0), the next `else if` condition is evaluated: you can add as many `else if` statements as you need
- The code within the braces of the final `else` will run if none of the previous conditions evaluated to true

--

```c
#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

int main() {
    int result = add(2, 2);
    printf("2 + 2 = %d\n", result);
}
```

???

- Regarding functions, here you can see how we define a function named `add`
- It takes two integers as parameters, `a` and `b`, and returns an integer, which is the sum of the two parameters
- You can also see how it is called from the `main` function, which stores its return value in a variable and prints it

---

# Loops

```c
int i, choice = 1;

for (i = 0; i < 5; i++) {
    printf("For loop iteration %d\n", i);
}

i = 0;
while (i < 5) {
    printf("While loop iteration %d\n", i);
    i++;
}

switch(choice) {
    case 1:
        printf("choice is 1\n");
        break;
    case 2:
        printf("choice is 2\n");
        break;
    default:
        printf("choice is neither 1 nor 2\n");
}
```

.codelink[
`02-c-intro/loop.c`
]

???

- In terms of loops, you have a few examples here
- The `for` loop will start with `i` equal to `0`, will iterate as long as `i` is less than `5`, and will increment `i` after each iteration with `i++`
- The `while` loop does exactly the same thing, however you only specify a condition between parentheses: the iterations continue as long as it is true
- That is why `i` needs to be incremented manually within the loop's body
- Finally, the last construct is a switch-case statement
- It executes code based on the value of the expression between parentheses after `switch`, here `choice`
- If the value of `choice` is 1, the code after `case 1` will be executed
- If it's 2, the code after `case 2` will be executed
- And if it's anything else, the code after `default` will run
- Make sure to end each of the pieces of code after a `case` with a `break` to exit the body of the switch: otherwise execution falls through into the next `case`

---

# Command Line Parameters

Managed through `main`'s `argc` (number of parameters) and `argv` (parameters themselves):

```c
#include <stdio.h>

int main(int argc, char **argv) { // 'char **argv' is equivalent to 'char *argv[]'
    printf("Number of command line arguments: %d\n", argc);

    for(int i = 0; i < argc; i++) {
        printf("argument %d: %s\n", i, argv[i]); // print each argument in turn
    }

    return 0;
}
```

.codelink[
`02-c-intro/cmdline.c`
]

???

- Regarding the command line parameters you can pass to your program, in C they are managed through the arguments of the `main` function
- `argc` is an integer that indicates the number of command line parameters
- Note that the first parameter is always the name of the program being executed, so that number will be at least 1
- `argv` is an array of strings that contains the values of the command line parameters
- Here this example program iterates over all command line parameters and prints the value of each of them

---

# Custom Types

Alias types, e.g. to shorten them, with `typedef`:

```c
#include <stdio.h>

typedef long long unsigned int my_int;
// 'my_int' is now equivalent to 'long long unsigned int'

int main(int argc, char **argv) {
    my_int x = 12;
    printf("x is: %llu\n", x);
    return 0;
}
```

.codelink[
`02-c-intro/typedef.c`
]

???

- You can create custom types with the `typedef` keyword
- For example, here we alias `long long unsigned int` with `my_int`, which is much shorter to write
- Next we can use `my_int` anywhere we would have used `long long unsigned int`

---

# Custom Data Structures

.leftcol[
```c
#include <stdio.h>
#include <string.h> // needed to use strcpy

struct person {
    char name[10];
    float size_in_meters;
    int weight_in_grams;
};

void print_person(struct person p) {
    printf("%s has a size of %f meters and "
           "weighs %d grams\n", p.name,
           p.size_in_meters, p.weight_in_grams);
}

int main(int argc, char **argv) {
    struct person p1;
    p1.size_in_meters = 1.6;
    p1.weight_in_grams = 60000;
    strcpy(p1.name, "Julie");

    struct person p2 = {"George", 1.8, 70000};

    print_person(p1);
    print_person(p2);
    return 0;
}
```

.codelink[
`02-c-intro/struct.c`
]
]

???

- You can create custom data structures by aggregating primitive types
- This is done with the `struct` keyword
- Here you have an example of a custom `struct person` which has 3 fields
- A `name`, which is a string (an array of characters)
- A `size_in_meters`, which is a float
- And a `weight_in_grams`, which is an int
- In `main` we can declare a variable `p1` which is of type `struct person`
- Then we can set a value for each of the fields with the dot operator
- We also use the string copy function `strcpy` from the standard library in order to set the name to Julie, which is much more convenient than setting each character one by one
- Similarly we have another variable `p2`: this one is declared and has values set for each of the data structure's fields in a single statement.
- Next we call the `print_person` function, which is defined above, twice
- This function prints the value of each field of a `struct person`
- Look how it references each field with the dot operator

--

.rightcol[
```c
struct s_person {
    /* fields here ... */
};

typedef struct s_person person;

void print_person(person p) { /* ... */ }

int main(int argc, char **argv) {
    person p1;
    person p2 = {"George", 1.8, 70000};
    /* ... */
}
```

.codelink[
`02-c-intro/typedef-struct.c`
]
]

???

- Rather than typing `struct person` each time you want to reference your custom type, you can use a typedef
- Here, after having defined a struct named `s_person`, we can alias it with `typedef` into `person`, which is much simpler to use
- Next we just have to use the type `person` each time we want to refer to our custom data structure type

---

# Custom Data Structures

- The fields of an instance of a custom data structure are laid out in memory **in order** and **contiguously**, apart from any padding the compiler inserts to align fields:

```c
struct person {
    char name[10];
    float size_in_meters;
    int weight_in_grams;
};

/* ... */

struct person p1 = /* ... */;
```
???

- One last thing about custom data structures
- Their fields are laid out in memory in the order defined in the struct declaration
- If we look at our previous example of the `struct person`, recall that it is made of a string, a float, and an int
- If we declare a variable of type `struct person`, on x86-64 the compiler will lay that variable out as follows in memory:
    - 10 bytes for the string `name`, 1 for each character
    - Typically 2 bytes of padding, so that the next field starts at an offset that is a multiple of 4
    - 4 bytes for the float `size_in_meters`
    - 4 bytes for the int `weight_in_grams`

---

# Wrapping Up

Very brief introduction to C:

- Variables and types
- Arrays/strings, printing to the console, command line parameters
- Conditionals, functions and loops
- Custom types and data structures

???

- And that's it!
- We have had a very brief introduction to C and we saw:
    - Variables and types
    - Arrays
    - How to print to the console
    - Command line parameters
    - Conditionals, functions and loops
    - And finally custom types and data structures