class: center, middle

### COMP26020 Programming Languages and Paradigms -- Part 1

***
# Pointers Applications

???
- Hello everyone
- In the previous video we have presented pointers, and in this one we will describe in what scenarios they can be useful

---

# Allow a Function to Access the Calling Context

- Want a function to 'change its parameters'?
- Arguments are passed by copy in C!

```c
void add_one(int param) {
    param++;
}

int main(int argc, char **argv) {
    int x = 20;

    printf("before call, x is %d\n", x);    // prints 20
    add_one(x);
    printf("after call, x is %d\n", x);     // prints 20

    return 0;
}
```

.codelink[
`10-pointers-applications/params-copy.c`
]

--
---
name: params

# Allow a Function to Access the Calling Context

- Want a function to 'change its parameters'?
- Use a pointer parameter and pass the variable's address as the argument

```c
*void add_one(int *param) {
*    (*param)++;
}

int main(int argc, char **argv) {
    int x = 20;

    printf("before call, x is %d\n", x);    // prints 20
*    add_one(&x);
    printf("after call, x is %d\n", x);     // prints 21

    return 0;
}
```

.codelink[
`10-pointers-applications/params-pointer.c`
]

???
- Remember that arguments are passed by copy in C, so with arguments of regular types a function cannot change its calling context
- Pointers can be used to do so
- For example, here we have a function that takes an int pointer as parameter
- It then increases the pointed-to value by one
- We call this function by passing the address of an int variable as the argument

---
template: params
???
- Let's illustrate what happens
- Each function stores its local variables as well as its arguments somewhere in memory
- We have an area for main and an area for add_one
- In main's area we have x, let's say it's located at address 0x128
- In add_one's area, when it is called, some space is reserved for param and filled with x's address, 0x128

---
template: params
???
- Through param, add_one now has access to x's location

---
template: params
???
- and can increment it

---
name: return

# Allow a Function to Access the Calling Context

- Want a function to 'return' more than a single value?

```c
// we want this function to return 3 things: the product and quotient (division) of n1 by n2,
// as well as an error code in case division is impossible
int multiply_and_divide(int n1, int n2, int *product, int *quotient) {
    if(n2 == 0) return -1;  // Can't divide if n2 is 0

    *product = n1 * n2;
    *quotient = n1 / n2;

    return 0;
}

int main(int argc, char **argv) {
    int p, q, a = 10, b = 5;

    if(multiply_and_divide(a, b, &p, &q) == 0) {
        printf("10*5 = %d\n", p);
        printf("10/5 = %d\n", q);
    }
}
```

.codelink[
`10-pointers-applications/params-pointers.c`
]

???
- This feature is very useful when we want a function to return more than a single value
- Here is a silly example in which we have a function that returns both the product and the quotient of two numbers
- We also want it to return an error code if the divisor is 0
- So we have 3 things to return
- The error or success code is returned normally
- And the quotient and product are returned through pointers
- It works as follows
- Main reserves space for the product and quotient by creating two variables p and q
- Then it passes the addresses of p and q to multiply_and_divide, which performs the operations and writes the results into p and q through the pointers

---
template: return
???
- So it looks like this in memory
- In main's memory we have a, b, p and q
- In multiply_and_divide's memory, n1 and n2 are copies of a and b
- More importantly, product and quotient contain the addresses of p and q

---
template: return
???
- This allows multiply_and_divide to fill in the values of p and q

---
name: large1

# Efficient Function Calls with Large Data Structures

- Assume we want to write a function updating a large struct variable

```c
typedef struct {
    // lots of large (8 bytes) fields:
    double a; double b; double c; double d; double e; double f;
} large_struct;

large_struct f(large_struct s) { // very inefficient in terms of performance and memory usage!
    s.a += 42.0;
    return s;
}

int main(int argc, char **argv) {
    large_struct x = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    large_struct y = f(x);
    printf("y.a: %f\n", y.a);
}
```

.codelink[
`10-pointers-applications/large-param-copy.c`
]

???
- Let's see another use case for pointers
- Let's assume we want to write a function that updates a large struct variable
- This struct has a lot of 8-byte fields
- Without pointers we can do it like this
- The function takes the struct as parameter, updates it, and returns it
- In main we call it like this

---
template: large1
???
- Now let's see what happens in memory: at the beginning we have the large struct in main's memory
- It's large: 6 fields times 8 bytes, so 48 bytes in total

---
template: large1
???
- When f is called there is a first copy, corresponding to f's parameter

---
template: large1
???
- Next f updates the struct member

---
template: large1
???
- And finally there is a second copy when we return the struct into y
- This is very inefficient both in terms of performance and memory usage
- The two copy operations take time because the struct is large
- And in memory there are 3 instances of the struct, so it takes a lot of space

---

.leftcol[
```c
typedef struct {
    double a; double b; double c;
    double d; double e; double f;
} large_struct;

large_struct f(large_struct s) {
    s.a += 42.0;
    return s;
}

int main(int argc, char **argv) {
    large_struct x = {1, 2, 3, 4, 5, 6};
    large_struct y = f(x);
    printf("y.a: %f\n", y.a);
    return 0;
}
```

- Code disassembled with `objdump --source`
- **Huge performance loss and memory footprint increase!**
]

.rightcol[
```asm
large_struct f(large_struct s) {
    /* ... */
    return s;
    1153: 48 8b 45 f8       mov -0x8(%rbp),%rax
    1157: 48 8b 55 10       mov 0x10(%rbp),%rdx
    115b: 48 8b 4d 18       mov 0x18(%rbp),%rcx
    115f: 48 89 10          mov %rdx,(%rax)
    1162: 48 89 48 08       mov %rcx,0x8(%rax)
    1166: 48 8b 55 20       mov 0x20(%rbp),%rdx
    116a: 48 8b 4d 28       mov 0x28(%rbp),%rcx
    116e: 48 89 50 10       mov %rdx,0x10(%rax)
    1172: 48 89 48 18       mov %rcx,0x18(%rax)
    1176: 48 8b 55 30       mov 0x30(%rbp),%rdx
    117a: 48 8b 4d 38       mov 0x38(%rbp),%rcx
    117e: 48 89 50 20       mov %rdx,0x20(%rax)
    1182: 48 89 48 28       mov %rcx,0x28(%rax)
}
```

```asm
int main(int argc, char **argv) {
    /* ... */
    large_struct y = f(x);
    11e9: 48 8d 45 a0       lea -0x60(%rbp),%rax
    11ed: ff 75 f8          pushq -0x8(%rbp)
    11f0: ff 75 f0          pushq -0x10(%rbp)
    11f3: ff 75 e8          pushq -0x18(%rbp)
    11f6: ff 75 e0          pushq -0x20(%rbp)
    11f9: ff 75 d8          pushq -0x28(%rbp)
    11fc: ff 75 d0          pushq -0x30(%rbp)
    11ff: 48 89 c7          mov %rax,%rdi
    1202: e8 2e ff ff ff    callq 1135 <f>
    1207: 48 83 c4 30       add $0x30,%rsp
    /* ... */
}
```
]

???
- If we disassemble this program we can see the code generated by the compiler for the function call and return operations
- A lot of instructions are needed to perform the copies, so it's really inefficient

---
name: large2

# Efficient Function Calls with Large Data Structures

- With pointer: maintain a single copy of the variable

```c
typedef struct {
    double a; double b; double c; double d; double e; double f;
} large_struct;

*void f(large_struct *s) {   // now takes a pointer parameter
*    (*s).a += 42.0;         // dereference to access x
    return;
}

int main(int argc, char **argv) {
    large_struct x = {1, 2, 3, 4, 5, 6};
*    f(&x);                  // pass x's address
    printf("x.a: %f\n", x.a);
    return 0;
}
```

.codelink[
`10-pointers-applications/large-param-pointer.c`
]

???
- Now let's fix that with a pointer
- The goal is to maintain a single copy of the variable and update it in the function through a pointer, same as we saw in the previous examples
- We change the parameter type to a pointer and call the function with the address of x
- In the function we dereference the pointer before accessing the field

---
template: large2
???
- So at runtime we have our large variable

---
template: large2
???
- When the function is called we have the pointer, which is a small parameter of 8 bytes

---
template: large2
???
- So let's say that x is at address 0x12, that's what the pointer holds

---
template: large2
???
- And in the function we can directly update the struct through the pointer
- It's much more efficient: we don't have to hold multiple copies of the large struct and there are no expensive copy operations

---

.leftcol[
```c
typedef struct {
    double a; double b; double c;
    double d; double e; double f;
} large_struct;

void f(large_struct *s) {
    (*s).a += 42.0;
    return;
}

int main(int argc, char **argv) {
    large_struct x = {1, 2, 3, 4, 5, 6};
    f(&x);
    printf("x.a: %f\n", x.a);
    return 0;
}
```
]

.rightcol[
```asm
void f(large_struct *s) {
    /* ... */
    return;
    1159: 90                nop
}
    115a: 5d                pop %rbp
    115b: c3                retq
```

```asm
int main(int argc, char **argv) {
    /* ... */
    f(&x);
    11b9: 48 8d 45 d0       lea -0x30(%rbp),%rax
    11bd: 48 89 c7          mov %rax,%rdi
    11c0: e8 70 ff ff ff    callq 1135 <f>
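    /* ... */
```
]

- **With pointers we get code that is much faster and more memory efficient**

???
- If we look at the disassembled code for the function call and return operations, it's much shorter

---
class: middle, center, inverse

# Misc. Pointers-related Topics

???
- Let's see a few miscellaneous pointer-related things

---

# C Arrays are Pointers

- Under the hood, an array name is treated as a pointer to its first element in most expressions
- To pass an array as parameter or return an array, use a pointer of the array's type

```c
void negate_int_array(int *ptr, int size) { // function taking pointer as parameter
    for(int i=0; i<size; i++)
        ptr[i] = -ptr[i];           // square brackets also work on a pointer
}

int main(int argc, char **argv) {
    int array[] = {1, 2, 3};        // example values
    negate_int_array(array, 3);     // pass the array by name, without &
    printf("%d %d %d\n", array[0], array[1], array[2]);
    return 0;
}
```

.codelink[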
`10-pointers-applications/arrays-pointers.c`
]

???
- Here is a surprise: under the hood, arrays are handled as pointers
- Arrays can be quite large and it would be very inefficient to pass them by copy
- Here is a function that takes an int pointer as parameter
- Notice how we call it: we just put the name of an array variable
- Also notice how we can use the square brackets on a pointer variable to index it like an array
- This function negates all the elements of an int array; if we try it out we can see that it compiles and works fine

---

# C Arrays are Pointers

```c
int int_array[2] = {1, 2};
double double_array[2] = {4.2, 2.4};
char char_array[] = "ab";

printf("int_array[0] = %d\n", int_array[0]);
printf("int_array[1] = %d\n", int_array[1]);
printf("*(int_array+0) = %d\n", *(int_array+0));    // pointer arithmetic!
printf("*(int_array+1) = %d\n", *(int_array+1));    // +1 means + sizeof(element type) bytes

printf("double_array[0] = %f\n", double_array[0]);
printf("double_array[1] = %f\n", double_array[1]);
printf("*(double_array+0) = %f\n", *(double_array+0));
printf("*(double_array+1) = %f\n", *(double_array+1));

printf("char_array[0] = %c\n", char_array[0]);
printf("char_array[1] = %c\n", char_array[1]);
printf("*(char_array+0) = %c\n", *(char_array+0));
printf("*(char_array+1) = %c\n", *(char_array+1));
```

.codelink[
`10-pointers-applications/arrays-pointers-indexing.c`
]
???
- Remember that array elements are stored contiguously in memory
- We take a few example arrays
- The int array has 2 elements, each typically 4 bytes
- The double array has 2 elements, each 8 bytes
- And the char array has 3 elements, counting the termination character, each 1 byte
- We can refer to each element of each array with either the square brackets or the dereference operator
- When we use the dereference operator, things work as follows
- The name of the array corresponds to a pointer to the first element, so dereferencing it gives you the first element
- To get access to the second element we increase the pointer by 1 element, hence the plus one before dereferencing
- Operations on pointers are called pointer arithmetic, and you need to be careful with them
- When we say int_array plus one we actually say plus one element
- The number of bytes that the compiler adds to the base address of int_array depends on its type, in other words on the size of the elements contained in the array
- For example, for the int array it will be 4 bytes
- For the double array it will be 8 bytes
- And for the char array 1 byte

---

# Pointers and Structures

```c
typedef struct {
    int int_member1;
    int int_member2;
    int *ptr_member;
} my_struct;

my_struct ms = /* ... declaration and initialisation of ms omitted for space reasons */
my_struct *p = &ms;

(*p).int_member1 = 1;   // don't forget the parentheses! . takes precedence over *
p->int_member2 = 2;     // s->x is a shortcut for (*s).x
p->ptr_member = &(p->int_member2);

printf("p->int_member1 = %d\n", p->int_member1);
printf("p->int_member2 = %d\n", p->int_member2);
printf("p->ptr_member = %p\n", p->ptr_member);
printf("*(p->ptr_member) = %d\n", *(p->ptr_member));
```

.codelink[
`10-pointers-applications/pointers-structs.c`
]

--
???
- As previously seen, we often pass structs as pointers
- Furthermore, structs themselves can have pointer fields
- Let's take an example here with a struct having two integer members, int_member1 and int_member2, and an int pointer ptr_member
- We can create a pointer to a previously declared struct variable
- To access the int members we have two choices
- The classic star operator for dereferencing followed by the dot for field access
- Don't forget the parentheses as the dot takes precedence over the star
- There is also a shortcut which is the arrow: you can use it on a struct pointer to access a field, it first dereferences the pointer then accesses the field
- You can get the address of an individual struct member with the ampersand operator
- Here we set the pointer field equal to the address of one of the integer fields
- Then we print all the members' values, plus we dereference the member pointer
- In memory things look like this
- Let's assume the struct variable is at address 0x150
- The struct pointer p points there
- Remember that the members are placed in memory in order, one after the other
- So the address of int_member2 is equal to the start address plus the size of int_member1, which is 4 bytes because it is an int
- so it's 0x154
- and this is what is contained in the pointer member

---

# Pointer Chains

- A pointer is a variable and has itself an address, i.e. a location in memory
- So it can be pointed to!

```c
int value = 42;         // integer
int *ptr1 = &value;     // pointer to integer

printf("ptr1: %p, *ptr1: %d\n", ptr1, *ptr1);
//
```

.codelink[
`10-pointers-applications/pointer-chains.c`
]
???
- One last thing: we can create pointers to pointers and construct chains
- Here we have a value and we create a first pointer to it
- Then we have a second pointer pointing to the first one
- The type of this pointer to pointer is int star star, which means pointer to pointer to int
- We also create a pointer to pointer to pointer to int, int star star star
- Next we print the pointer chains
- So in memory we get something like this
- The value is pointed to by ptr1, which is pointed to by ptr2, which itself is pointed to by ptr3
- Note in the code the use of several star operators for dereferencing some pointers
- For example star star ptr2 means get access to the value that is pointed to by what is pointed to by ptr2
- In other words a first star gives us access to ptr1, and a second to value
- Pointers to pointers are useful to create dynamically allocated arrays, as we will see in the next video

---

# Pointer Chains

- A pointer is a variable and has itself an address, i.e. a location in memory
- So it can be pointed to!

```c
int value = 42;         // integer
int *ptr1 = &value;     // pointer to integer
int **ptr2 = &ptr1;     // pointer to pointer to integer

printf("ptr1: %p, *ptr1: %d\n", ptr1, *ptr1);
printf("ptr2: %p, *ptr2: %p, **ptr2: %d\n", ptr2, *ptr2, **ptr2);
//
```

.codelink[
`10-pointers-applications/pointer-chains.c`
]
---

# Pointer Chains

- A pointer is a variable and has itself an address, i.e. a location in memory
- So it can be pointed to!

```c
int value = 42;         // integer
int *ptr1 = &value;     // pointer to integer
int **ptr2 = &ptr1;     // pointer to pointer to integer
int ***ptr3 = &ptr2;    // pointer to pointer to pointer to integer

printf("ptr1: %p, *ptr1: %d\n", ptr1, *ptr1);
printf("ptr2: %p, *ptr2: %p, **ptr2: %d\n", ptr2, *ptr2, **ptr2);
printf("ptr3: %p, *ptr3: %p, **ptr3: %p, ***ptr3: %d\n", ptr3, *ptr3, **ptr3, ***ptr3);
```

.codelink[
`10-pointers-applications/pointer-chains.c`
]
---

# Summary

- Pointers use cases
  - Modify a function's calling context
  - Let a function 'return' more than a single value
  - Avoid costly data copies on function calls
- Relationship between pointers, arrays, and data structures
- Pointer chains

----

.center[Feedback form: https://bit.ly/2WZ60QG]