Understand integer conversion rules

Type conversions occur explicitly in C and C++ as the result of a cast or implicitly
as required by an operation. While conversions are generally required for
the correct execution of a program, they can also lead to lost or misinterpreted
data. This section describes how and when conversions are performed and
identifies their pitfalls.

Implicit conversions are, in part, a consequence of the C language's ability
to perform operations on mixed types. For example, most C programmers
would not think twice before adding an unsigned char to a signed char and
storing the result in a short int. They can do so because the C compiler
implicitly generates the code needed to perform the conversions.

The C99 standard rules define how C compilers handle conversions. These
rules, which are described in the following sections, include integer promotions,
integer conversion rank, and usual arithmetic conversions.

Integer Promotions

Integer types smaller than int are promoted when an operation is performed on
them. If all values of the original type can be represented as an int, the value of
the smaller type is converted to an int; otherwise, it is converted to an
unsigned int.

Integer promotions are applied as part of the usual arithmetic conversions
(discussed later in this section) to certain argument expressions, operands of
the unary +, -, and ~ operators, and operands of the shift operators. The following
code fragment illustrates the use of integer promotions:

char c1, c2;
c1 = c1 + c2;

Integer promotions require that the value of each variable (c1 and c2) be
promoted to int size. The two ints are added and the sum is truncated to fit
into the char type.
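
The promotion can be observed directly. The following is a minimal sketch, not part of the original text; the function names promoted_sum_size and add_chars are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Size of the expression c1 + c2. Both operands are promoted to int
   before the addition, so the expression has type int, not char.
   (sizeof does not evaluate its operand; it only inspects the type.) */
size_t promoted_sum_size(void) {
    char c1 = 1, c2 = 2;
    return sizeof(c1 + c2);
}

/* Spells out what "c1 = c1 + c2" does implicitly. */
char add_chars(char c1, char c2) {
    int sum = c1 + c2;   /* addition performed on two promoted ints */
    return (char)sum;    /* sum converted back to char on assignment */
}
```

On an implementation where char is narrower than int, promoted_sum_size() equals sizeof(int) rather than sizeof(char).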

Integer promotions are performed to avoid arithmetic errors resulting from
the overflow of intermediate values. On line 5 of Figure 5-7, the value of c1 is
added to the value of c2. The sum of these values is then added to the value of
c3 (addition associates left to right). The addition of c1 and c2 would
result in an overflow of the signed char type because the result of the operation
(190) exceeds the maximum value of signed char. Because of integer promotions,
however, c1, c2, and c3 are each converted to int and the overall expression
is successfully evaluated. The resulting value (70) is then truncated and stored
in cresult. Because the result is in the range of the signed char type, the truncation
does not result in lost data.
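
The arithmetic described above can be checked with a short sketch. It assumes an 8-bit signed char (SCHAR_MAX equal to 127); the function names intermediate_sum and figure_result are hypothetical.

```c
#include <limits.h>

/* Intermediate sum c1 + c2. Both operands are promoted to int, so the
   value 190 is representable even though it exceeds SCHAR_MAX. */
int intermediate_sum(signed char c1, signed char c2) {
    return c1 + c2;
}

/* The full expression from the figure, stored back into a signed char. */
signed char figure_result(void) {
    signed char c1 = 100, c2 = 90, c3 = -120;
    /* Evaluated as (c1 + c2) + c3 in int: 190 + (-120) = 70. */
    signed char cresult = c1 + c2 + c3;
    return cresult;
}
```

The intermediate value never has to fit in a signed char; only the final value, 70, does.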

Integer Conversion Rank

Every integer type has an integer conversion rank that determines how conversions
are performed. The following rules for determining integer conversion
rank are defined in C99.

No two different signed integer types have the same rank, even if they
have the same representation.
The rank of a signed integer type is greater than the rank of any signed
integer type with less precision.
The rank of long long int is greater than the rank of long int, which
is greater than the rank of int, which is greater than the rank of short
int, which is greater than the rank of signed char.
The rank of any unsigned integer type is equal to the rank of the corresponding
signed integer type, if any.
The rank of any standard integer type is greater than the rank of any
extended integer type with the same width.
The rank of char is equal to the rank of signed char and unsigned
char.
The rank of any extended signed integer type relative to another
extended signed integer type with the same precision is implementation
defined but still subject to the other rules for determining the integer
conversion rank.
For all integer types T1, T2, and T3, if T1 has greater rank than T2 and
T2 has greater rank than T3, then T1 has greater rank than T3.

signed char cresult, c1, c2, c3;
c1 = 100;
c2 = 90;
c3 = -120;
cresult = c1 + c2 + c3;

Figure 5-7. Preventing arithmetic errors with implicit conversions

The integer conversion rank is used in the usual arithmetic conversions to
determine what conversions need to take place to support an operation on
mixed integer types.

Usual Arithmetic Conversions

Many operators that accept arithmetic operands perform conversions using the
usual arithmetic conversions. After integer promotions are performed on both
operands, the following rules are applied to the promoted operands.

If both operands have the same type, no further conversion is needed.
If both operands have signed integer types or both have unsigned integer
types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
If the operand that has unsigned integer type has rank greater than or
equal to the rank of the type of the other operand, the operand with
signed integer type is converted to the type of the operand with
unsigned integer type.
If the type of the operand with signed integer type can represent all of
the values of the type of the operand with unsigned integer type, the
operand with unsigned integer type is converted to the type of the
operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.

Specific operations can add to or modify the semantics of the usual arithmetic
conversions.
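
Two of the mixed-signedness rules above can be illustrated with a comparison. This is a sketch under the assumption that long long can represent every value of unsigned int (true where unsigned int is 32 bits and long long is 64 bits); the function names are hypothetical.

```c
/* int versus unsigned int: the ranks are equal, so the signed operand
   is converted to unsigned int. The value -1 becomes UINT_MAX and the
   comparison yields 0 (false). */
int minus_one_less_than_one_unsigned(void) {
    int s = -1;
    unsigned int u = 1u;
    return s < u;
}

/* long long versus unsigned int: the signed type has greater rank and
   (by assumption) can represent all unsigned int values, so the
   unsigned operand is converted to long long. The comparison yields 1. */
int minus_one_less_than_one_widened(void) {
    long long s = -1;
    unsigned int u = 1u;
    return s < u;
}
```

Many compilers warn about the first comparison (for example, with a sign-compare diagnostic), which is a useful signal that the usual arithmetic conversions are changing a value.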

Example

For example, assume the following code is compiled and executed on IA-32:

signed char sc = SCHAR_MAX;
unsigned char uc = UCHAR_MAX;
signed long long sll = sc + uc;

Both the signed char sc and the unsigned char uc are subject to integer promotions in this example. Because all values of the original types can be represented as int, both values are automatically converted to int as part of the integer promotions. The usual arithmetic conversions would apply further conversions if the promoted types differed, but here both operands are int. The addition therefore takes place between two 32-bit int values. This operation is not influenced by the fact that the resulting value is stored in a signed long long integer. The 32-bit value resulting from the addition is simply sign-extended to 64 bits after the addition operation has concluded.

Assuming that the precision of signed char is 7 bits and the precision of unsigned char is 8 bits, this operation is perfectly safe. However, if the compiler represents the signed char and unsigned char types using 31-bit and 32-bit precision (respectively), the variable uc would need to be converted to unsigned int instead of signed int. As a result of the usual arithmetic conversions, the signed int is then converted to unsigned int and the addition takes place between two unsigned int values. Also, because uc is equal to UCHAR_MAX, which is equal to UINT_MAX in this example, the addition wraps around. The resulting value is then zero-extended to fit into the 64-bit storage allocated by sll.
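
The common case described above (7-bit and 8-bit precision, 32-bit int) can be sketched as follows. The function names add_extremes and sum_expression_size are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* The addition is performed in int; only the finished result is
   widened to signed long long. */
long long add_extremes(void) {
    signed char sc = SCHAR_MAX;      /* 127 with an 8-bit char */
    unsigned char uc = UCHAR_MAX;    /* 255 with an 8-bit char */
    signed long long sll = sc + uc;  /* int + int, then sign-extended */
    return sll;
}

/* The type of sc + uc is int, regardless of where the result is stored. */
size_t sum_expression_size(void) {
    signed char sc = SCHAR_MAX;
    unsigned char uc = UCHAR_MAX;
    return sizeof(sc + uc);
}
```

With 8-bit characters the result is 382, which fits comfortably in an int, so nothing is lost before the widening.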

Non-compliant Code Example 1

In the following non-compliant code example, cBlocks is multiplied by 16 and the result is stored in the unsigned long long int alloc. The multiplication can overflow because it is performed as a 32-bit operation (size_t is assumed to be 32 bits wide), in which case the value stored in alloc is invalid.

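#include <limits.h>  /* UINT_MAX */
#include <stdlib.h>  /* malloc, size_t, NULL */

void* AllocBlocks(size_t cBlocks) {
  if (cBlocks == 0) return NULL;
  /* cBlocks * 16 is evaluated in size_t and can wrap before it is widened */
  unsigned long long alloc = cBlocks * 16;
  return (alloc < UINT_MAX)
    ? malloc(cBlocks * 16)
    : NULL;
}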
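
On platforms where size_t is 64 bits wide the wrap is hard to reproduce with the code above, so the following sketch uses fixed-width types instead. It assumes uint32_t is the same type as unsigned int; the function names scaled_narrow and scaled_wide are hypothetical. The second function widens one operand first, purely for comparison.

```c
#include <stdint.h>

/* Product computed in 32 bits and widened afterwards: any high bits
   are already gone by the time the value reaches alloc. */
uint64_t scaled_narrow(uint32_t blocks) {
    uint64_t alloc = blocks * 16u;
    return alloc;
}

/* Operand widened before the multiplication: the product is computed
   in 64 bits and keeps its full value. */
uint64_t scaled_wide(uint32_t blocks) {
    uint64_t alloc = (uint64_t)blocks * 16u;
    return alloc;
}
```

For blocks equal to 0x10000000, the narrow version yields 0 while the wide version yields 4294967296, which shows that the type of the destination does not influence how the multiplication is carried out.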

Compliant Solution 1

On architectures where unsigned long long int is guaranteed to have twice the number of bits of size_t, upcast the variable used in the multiplication to a 64-bit value. This ensures the multiplication is performed as a 64-bit operation, so the product cannot overflow.

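#include <limits.h>  /* UINT_MAX */
#include <stdlib.h>  /* malloc, size_t, NULL */

void* AllocBlocks(size_t cBlocks) {
  if (cBlocks == 0) return NULL;
  /* the cast widens cBlocks first, so the product is computed in 64 bits */
  unsigned long long alloc =
           (unsigned long long)cBlocks*16;
  return (alloc < UINT_MAX)
    ? malloc(cBlocks * 16)
    : NULL;
}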

The assumption concerning the relationship of unsigned long long int and size_t must be documented in the header of each file that depends upon this assumption for correct execution.
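
Where that width assumption cannot be made, one alternative is to test the operand against a quotient before multiplying. The following is a sketch of that technique, not the solution given above; the names alloc_blocks_checked and BLOCK_SIZE are hypothetical, and the limit is SIZE_MAX rather than UINT_MAX, so the accepted range differs from the original function.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc, size_t, NULL */

#define BLOCK_SIZE 16u  /* bytes per block, as in the example above */

/* Rejects any block count whose byte size would not fit in size_t.
   The division cannot overflow, so the check itself is safe. */
void *alloc_blocks_checked(size_t cBlocks) {
    if (cBlocks == 0) return NULL;
    if (cBlocks > SIZE_MAX / BLOCK_SIZE) return NULL;
    return malloc(cBlocks * BLOCK_SIZE);
}
```

This version does not depend on the relative widths of size_t and unsigned long long int, at the cost of one division per call.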

Exceptions

Unsigned integers can be allowed to exhibit modulo behavior if and only if:

the variable declaration is clearly commented as supporting modulo behavior;
each operation on that integer is also clearly commented as supporting modulo behavior; and
if the integer exhibiting modulo behavior contributes to the value of an integer not marked as exhibiting modulo behavior, the resulting integer must obey this rule.
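
As an illustration of what such commenting might look like, the sketch below uses the FNV-1a hash, chosen here only as a familiar case of intended wraparound; it does not come from the original text. The "MODULO:" comment tag is a hypothetical convention.

```c
#include <stdint.h>

/* Returns the 32-bit FNV-1a hash of a NUL-terminated string. */
uint32_t fnv1a_32(const char *s) {
    /* MODULO: hash is intended to wrap modulo 2^32. */
    uint32_t hash = 2166136261u;  /* FNV offset basis */
    while (*s != '\0') {
        hash ^= (unsigned char)*s++;
        /* MODULO: the product is meant to wrap. The constant is
           unsigned long so the arithmetic stays unsigned even if
           uint32_t would otherwise promote to int; the mask keeps
           the low 32 bits. */
        hash = (uint32_t)((hash * 16777619UL) & 0xFFFFFFFFUL);
    }
    return hash;
}
```

The return value is itself a modulo quantity, so any variable it feeds into would need the same annotation under the third condition.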

Consequences

Improper range checking can lead to buffer overflows and the execution of arbitrary code by an attacker.

References

Seacord 05 Chapter 5 Integers
