Type Conversion and Casting
You can access the slides 🖼️ for this lecture. All the code samples given here can be found online, alongside instructions on how to bring up the proper environment to build and execute them here. You can download the entire set of slides and lecture notes in PDF from the home page.
Here we discuss type conversion and casting in C.
Implicit Type Conversions
In many situations the compiler performs implicit conversions between types. With arithmetic operations, it applies integer promotion, converting smaller types to larger ones. Consider this example:
char char_var = 12; // 1 byte, -128 to 127
int int_var = 1000; // 4 bytes, -2*10^9 to 2*10^9
long long ll_var = 0x840A1231AC154; // 8 bytes, -9*10^18 to 9*10^18
// here, char_var is first promoted to int, then the result of the first
// addition is promoted to long long:
long long result = (char_var + int_var) + ll_var;
We declare signed integers of various sizes. We have a `char`; note that we can store small integers in chars: on x86-64 a `char` is one byte, so it can only hold 256 distinct values. We also have a regular `int`, whose size on x86-64 is 4 bytes, so it can store about 4 billion values. Finally, we have a `long long int`, whose size on x86-64 is 8 bytes, so it can store much larger numbers.
Consider the operation computing the value of `result`. When the `char` is added to the `int`, the `char` is automatically promoted to an `int`, and the expression between parentheses evaluates to an `int`. This value is then promoted to a `long long` because it is added to a `long long`, and the resulting value is a `long long`. We can store it in the `long long` variable `result` without fear of losing precision or of data truncation due to the intermediate operations and types.
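One way to check these intermediate types is to print the size of each sub-expression with `sizeof`; a small sketch, assuming the x86-64 sizes mentioned above:
printf("%zu\n", sizeof(char_var));                      // 1: a char is 1 byte
printf("%zu\n", sizeof(char_var + int_var));            // 4: char promoted to int
printf("%zu\n", sizeof((char_var + int_var) + ll_var)); // 8: promoted to long long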
These implicit conversions, applied when an operation mixes integers of different types, are called integer promotion.
Integer Promotion
The key idea behind integer promotion is that no information is lost when going from smaller to larger types. Integer types are given ranks. In decreasing order of rank we have:
- `long long int`, `unsigned long long int` (highest rank)
- `long int`, `unsigned long int`
- `int`, `unsigned int`
- `short int`, `unsigned short int`
- `signed char`, `char`, `unsigned char` (lowest rank)
The promotion rules for the two operands of an operation are as follows, applied in order (a short illustration follows the list):
- If the operands have the same type there is no need for promotion.
- If both operands are signed or both operands are unsigned, the operand of lesser rank is promoted to the type of the operand of higher rank.
- If the rank of the unsigned operand is greater than or equal to the rank of the signed operand, the signed operand is promoted to the type of the unsigned operand.
- If the signed operand type can represent all the values of the unsigned operand type, the unsigned operand gets promoted to the signed type.
- Otherwise, both operands are converted to the unsigned type corresponding to the signed operand's type.
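As an illustration of the last two rules, consider mixing a signed `long` with unsigned types. This is a minimal sketch assuming the x86-64 sizes given above, where `long` is 8 bytes and can therefore represent every `unsigned int` value:
long l = -1;
unsigned int ui = 1;
unsigned long ul = 1;
// long can represent all unsigned int values, so ui is converted to long
// and the comparison behaves as one would expect mathematically:
printf("%d\n", l < ui); // prints 1
// unsigned long has a rank >= long, so l is converted to unsigned long,
// becoming a huge positive value:
printf("%d\n", l < ul); // prints 0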
It is important to keep these rules in mind, otherwise bugs may arise when mixing signed and unsigned integers. Consider this example:
int si = -1;
unsigned int ui = 1;
printf("%d\n", si < ui); // prints 0! si converted to unsigned int
We have a comparison between a signed `int` equal to -1 and an `unsigned int` equal to 1. According to the rules, the signed `int` gets converted to an `unsigned int`. The binary representation of -1 in memory, when interpreted as an `unsigned int`, corresponds to the maximum value that can be stored in an `unsigned int`, which is about 4 billion. So the expression evaluates to false, which is 0 in C. This is very counter-intuitive.
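To see what the signed operand becomes after the conversion, we can print it explicitly as an unsigned value. A small sketch (the printed value assumes a 4-byte `int`):
int si = -1;
// the bit pattern of -1, reinterpreted as unsigned, is the largest
// value an unsigned int can hold
printf("%u\n", (unsigned int)si); // prints 4294967295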
Integer Overflows
It is important to keep in mind the storage size of all types when mixing them in operations. Here's another example of behaviour that can be surprising:
int main(int argc, char **argv) {
int i = -1000;
unsigned int ui = 4294967295;
printf("%d\n", i + ui); // prints -1001
// i: 11111111111111111111110000011000 (A)
// ui: 11111111111111111111111111111111 (B)
// i + ui: 111111111111111111111110000010111 (C)
// final: 11111111111111111111110000010111 (D)
// (A) originally 2's complement promoted to unsigned
// (B) standard unsigned representation, max number an unsigned int can store
// (C) addition result
// (D) the 33-bit sum does not fit in 32 bits: the carried-out MSB is lost
//     and %d interprets the remaining 32-bit pattern as a signed int: -1001
// Solution: perform the addition in a wider type, e.g. (long)i + ui,
//           and print the result with %ld
return 0;
}
We add a signed `int`, `i`, equal to -1000, to an `unsigned int`, `ui`, holding the maximum value that can be stored in that type. `i` is promoted to an `unsigned int`, and because signed numbers are encoded in 2's complement, its binary representation starts with a run of leading ones. `ui`, being the maximum number that can be stored in an `unsigned int`, is 32 bits full of ones. The addition overflows and the most significant bit gets truncated. What we obtain is the binary representation of -1001.
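One way to obtain the mathematically expected result is to perform the addition in a type wide enough to hold both operands without wrapping around. A minimal sketch, assuming a 64-bit `long` as on x86-64:
int i = -1000;
unsigned int ui = 4294967295;
// i is cast to long and ui is converted to long (which can represent all
// unsigned int values on x86-64), so no wrap-around occurs
printf("%ld\n", (long)i + ui); // prints 4294966295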
Integer to Floating-Point Conversion
Regarding floating-point numbers, if one operand of an operation is a `float` or a `double`, the other operand gets converted to that floating-point type. This is another implicit conversion realised by the compiler. Operators of equal precedence are evaluated from left to right, so the conversion to floating point only kicks in for the sub-expressions in which a floating-point operand actually appears. See the following examples, with explanations in the comments:
// prints 7: 25/10 truncates to 2; 2 * 15 = 30; 30/4 truncates to 7
printf("%d\n", 25 / 10 * 15 / 4);
// prints 7.5: 25/10 truncates to 2; 2*15 = 30; 30 gets converted to
// 30.0 (double) and divided by 4.0 (double), giving the result 7.5 (double)
printf("%lf\n", 25 / 10 * 15 / 4.0);
// prints 9.375: 25.0 / 10.0 (converted from 10) is 2.5, multiplied by 15.0
// (converted from 15) gives 37.5, divided by 4.0 (converted from 4) gives
// 9.375
printf("%lf\n", 25.0 / 10 * 15 / 4);
// prints garbage: passing a double where %d expects an int is undefined behaviour
printf("%d\n", 25.0 / 10 * 15 / 4);
Implicit Conversion When Passing Parameters
Conversion also happens implicitly when calling functions. Consider this example:
void f1(int i) {
printf("%d\n", i);
}
void f2(double d) {
printf("%lf\n", d);
}
void f3(unsigned int ui) {
printf("%u\n", ui);
}
int main(int argc, char **argv) {
char c = 'a';
unsigned long long ull = 0x400000000000;
f1(c); // prints 97 (ascii code for 'a')
f2(c); // prints 97.0
f3(ull); // overflows int ... prints 0 (lower 32 bits of 0x400000000000)
return 0;
}
Here we have a character variable `c` containing `'a'`, passed as a parameter to functions expecting an `int` (`f1`) and a `double` (`f2`). Characters are encoded with their ASCII codes, so when this data is converted to an `int` or a `double`, we get the ASCII code for `'a'`: 97 or 97.0. We also have an `unsigned long long` variable, `ull`, that is passed to a function expecting an `unsigned int`, but its value is too large for that type. So when it is printed within the function body we only see its lower 32 bits.
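A defensive option, sketched below, is to check that the value fits before calling the function; `UINT_MAX` comes from `<limits.h>`, and the explicit cast then loses no data:
if (ull <= UINT_MAX) {
    f3((unsigned int)ull); // the value fits: the conversion is lossless
} else {
    printf("%llu is too large for f3\n", ull);
}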
Note that this code, like all the faulty programs presented in this lecture, is perfectly legitimate from the compiler's point of view: by default it produces no warnings. Hence it is really important to be aware of the various implicit conversion rules to avoid bugs, some of which may be quite hard to investigate and fix.
Type Casting
Type casting lets the programmer force a conversion. It is achieved by writing the target type between parentheses in front of the expression one wishes to convert. For example, here we cast an `int` into a `float`:
// prints 3.75: 15 is cast to 15.0, 4 gets converted to 4.0, and 15.0/4.0 gives 3.75
printf("%lf\n", (float)15/4);
Here we cast a floating point number into an integer:
// prints 4: 2.5 converted to 2 (int), multiplied by 12 gives 24, divided by 5 truncates to 4
printf("%d\n", ((int)2.5 * 12)/5);
Another example:
// prints 4.8: 2*12 = 24, converted to 24.0, divided by 5.0 gives 4.8
printf("%lf\n", ((int)2.5 * 12)/(double)5);
Recall the example above in which an `int` equal to -1 was evaluated as not being less than an `unsigned int` equal to 1, due to integer promotion. This can be fixed with a cast:
int si = -1;
unsigned int ui = 1;
printf("%d\n", si < (int)ui); // prints 1
We force the unsigned variable to be converted to a signed one, and we now compare two signed `int` values.
Generic Pointers
In combination with the special type `void *`, casting also allows us to implement generic pointers in C. Generic pointers allow a pointer parameter or return value to point to data of different types.
Consider the following code:
typedef enum {
CHAR, INT, DOUBLE
} type_enum;
void print(void *data, type_enum t) {
switch(t) {
case CHAR:
printf("character: %c\n",
*(char *)data);
break;
case INT:
printf("integer: %d\n",
*(int *)data);
break;
case DOUBLE:
printf("double: %lf\n",
*(double *)data);
break;
default:
printf("Unknown type ...\n");
}
}
We have a function that takes a `void *` generic pointer as a parameter. According to a second parameter, an `enum`, the function prints the value of what the pointer points to. It can be a `char`, an `int`, or a `double`. When we call the function, we cast the pointers to these data types to `void *`. Of course, we need to indicate the actual type somewhere; in this case we use the `enum`. When printing the value, we use another cast to get back the proper pointer type before dereferencing.
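A possible way to call this function is sketched below; the variables and values are just examples for illustration:
int main(int argc, char **argv) {
    char c = 'x';
    int i = 42;
    double d = 3.14;

    print((void *)&c, CHAR);   // character: x
    print((void *)&i, INT);    // integer: 42
    print((void *)&d, DOUBLE); // double: 3.140000
    return 0;
}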