The EOF macro represents a negative value that is used to indicate that the file is exhausted and no data remains when reading data from a file. EOF is an example of an in-band error indicator. In-band error indicators are problematic to work with, and the creation of new in-band-error indicators is discouraged in ERR02-C. Avoid in-band error indicators.

The character I/O functions fgetc(), getc(), and getchar() all read a character from a stream and return it as an int. If the stream is at the end of the file, the end-of-file indicator for the stream is set and the function returns EOF. If a read error occurs, the error indicator for the stream is set and the function returns EOF. If these functions succeed, they return the character read as an unsigned char converted to an int. Because EOF is negative, it should not match any unsigned character value.

However, this is true only on platforms where the int type has more precision bits than char. On a platform where int and char have the same precision, a character-reading function could return EOF because it read a valid character that had the same bit pattern as EOF. The C Standard requires only that an int type be able to represent a maximum value of +32767 and that a char type be no larger than an int. Although uncommon, this situation can result in the integer constant expression EOF being indistinguishable from a normal character; that is, (int)(unsigned char)65535 == -1. Consequently, failing to use feof() and ferror() to detect end-of-file and file errors can result in incorrectly identifying a valid character as EOF on rare implementations where sizeof(int) == sizeof(char).
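The precondition described above can be sketched as a small check. This is only an illustration: the function names eof_is_unambiguous() and require_unambiguous_eof() are hypothetical, and the test compares limits from <limits.h> directly rather than counting precision bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 when every unsigned char value fits in a
   nonnegative int, so a negative EOF cannot collide with a character value
   returned by fgetc(), getc(), or getchar(). Returns 0 otherwise. */
int eof_is_unambiguous(void) {
  return UCHAR_MAX <= INT_MAX && EOF < 0;
}

/* Hypothetical helper: aborts (when NDEBUG is not defined) on platforms
   where comparing against EOF alone is not reliable. */
void require_unambiguous_eof(void) {
  assert(eof_is_unambiguous());
}
```

On a platform where this check fails, comparing a return value with EOF is not enough, and feof() and ferror() must be consulted.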

This problem can also occur when reading wide characters. The fgetwc(), getwc(), and getwchar() functions all return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most platforms, the wchar_t datatype has the same precision as wint_t, so these functions can return a value equal to WEOF for a wide character that was actually read from the stream.

Note that in the UTF-16 character set, 0xFFFF is guaranteed not to be a character, which leaves room for WEOF to be chosen as the value −1. In 16-bit EUC (Extended UNIX Code), the high byte can never be 0xFF, so a conflict cannot occur at all. Similarly, all UTF-32 characters are positive when viewed as a signed 32-bit integer. Consequently, it would require a custom character set designed without consideration of the C programming language for this problem to occur with wide characters or with ordinary characters that are as wide as int.

See STR00-C. Represent characters using an appropriate type for more information on the proper use of character types.

C provides the feof() and ferror() functions to detect end-of-file and file errors. These functions are not subject to the problems associated with character and integer sizes and are preferred over comparing with EOF or WEOF [Kettlewell 2002].
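As a minimal sketch of this approach, the following helper reads one character and classifies the outcome by consulting the stream indicators instead of comparing against EOF. The enum read_result and the function read_char() are hypothetical names introduced for illustration, and the sketch assumes that the stream's error indicator is clear when the function is called.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for a single character read. */
enum read_result { READ_OK, READ_EOF, READ_ERROR };

/* Reads one character from fp into *out and reports how the read ended.
   Assumes the error indicator of fp is clear on entry; otherwise a
   successful read would still be reported as READ_ERROR. */
enum read_result read_char(FILE *fp, int *out) {
  assert(fp != NULL && out != NULL);
  *out = getc(fp);
  if (ferror(fp)) {
    return READ_ERROR;   /* a read error occurred */
  }
  if (feof(fp)) {
    return READ_EOF;     /* no data remains */
  }
  return READ_OK;        /* *out holds a valid character */
}
```

A caller can loop while read_char() returns READ_OK and then distinguish end-of-file from a read error by inspecting the final result.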

Noncompliant Code Example

This noncompliant code example tests to see if the character c is not EOF as a loop-termination condition:

#include <stdio.h>
 
void func(void) {
  int c;
 
  do {
    c = getchar();
  } while (c != EOF);
}

Although EOF is guaranteed to be negative and distinct from the value of any unsigned character, it is not guaranteed to be different from any such value when converted to an int. Consequently, when int is the same size as char, this loop may terminate early.

Compliant Solution (Portable)

This compliant solution uses feof() to test for end-of-file and ferror() to test for errors:

#include <stdio.h>
 
void func(void) {
  int c;
 
  do {
    c = getchar();
  } while (!feof(stdin) && !ferror(stdin));
}

Noncompliant Code Example (Nonportable)

This noncompliant code example uses an assertion to ensure that the code is executed only on architectures where int is larger than char and EOF is guaranteed to not be a valid character value. See INT35-C. Use correct integer precisions for the definition of the PRECISION() macro.

However, this code example is noncompliant because the variable c is declared as a char rather than an int:

#include <assert.h>
#include <limits.h>
#include <stdio.h>
 
void func(void) {
  char c;
  assert(PRECISION(UCHAR_MAX) < PRECISION(INT_MAX));

  do {
    c = getchar();
  } while (c != EOF);
}

Assuming that a char is a signed 8-bit value and an int is a 32-bit value, if getchar() returns the character encoded as 0xFF (decimal 255), it will be interpreted as EOF because this value is sign-extended to 0xFFFFFFFF (the value of EOF) to perform the comparison. (See INT31-C. Ensure that integer conversions do not result in lost or misinterpreted data.)
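The sign extension described above can be observed in isolation. The sketch below is illustrative only: narrowed_equals_eof() is a hypothetical name, the code uses signed char explicitly so that the behavior does not depend on whether plain char is signed, and it assumes an 8-bit char. The conversion of values above 127 to signed char is implementation-defined, although common two's complement implementations produce the wrapped value.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns nonzero if storing the character value ch
   (0..255) in a signed char makes it compare equal to EOF. */
int narrowed_equals_eof(int ch) {
  assert(ch >= 0 && ch <= 255);
  /* Implementation-defined result for ch > 127; typically ch - 256. */
  signed char narrowed = (signed char)ch;
  /* narrowed is promoted to int (with sign extension) for the comparison. */
  return narrowed == EOF;
}
```

On implementations where EOF is -1, narrowed_equals_eof(0xFF) is expected to report a match, whereas an ordinary character such as 'A' does not match.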

Compliant Solution (Nonportable)

This compliant solution declares c to be an int. Consequently, the loop terminates only when getchar() returns EOF, that is, when the file is exhausted or a read error occurs.

#include <assert.h>
#include <limits.h>
#include <stdio.h>
 
void func(void) {
  int c;
  assert(PRECISION(UCHAR_MAX) < PRECISION(INT_MAX));

  do {
    c = getchar();
  } while (c != EOF);
}

Noncompliant Code Example (Wide Characters)

In this noncompliant example, the result of the call to the C standard library function getwc() is stored into a variable of type wchar_t, and is subsequently compared with WEOF:

#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
 
void g(void) {
  enum { BUFFER_SIZE = 32 };
  wchar_t buf[BUFFER_SIZE];
  wchar_t wc;
  size_t i = 0;
  
  while ((wc = getwc(stdin)) != L'\n' && wc != WEOF) {
    if (i < (BUFFER_SIZE - 1)) {
      buf[i++] = wc;
    }
  }
  
  buf[i] = L'\0';
}

This code suffers from two problems. First, the value returned by getwc() is immediately converted to wchar_t before being compared with WEOF. Second, there is no check to see if wint_t has more precision bits than wchar_t. Both of these problems make it possible for an attacker to terminate the loop prematurely by supplying the wide-character value matching WEOF in the file.

Compliant Solution (Portable)

This compliant solution declares wc to be a wint_t, the type of integer returned by getwc(). Furthermore, it does not rely on WEOF to determine end-of-file.

#include <stddef.h>
#include <stdio.h>
#include <wchar.h>
 
void g(void) {
  enum { BUFFER_SIZE = 32 };
  wchar_t buf[BUFFER_SIZE];
  wint_t wc;
  size_t i = 0;
  
  while ((wc = getwc(stdin)) != L'\n' &&
         !feof(stdin) && !ferror(stdin)) {
    if (i < BUFFER_SIZE - 1) {
      buf[i++] = wc;
    }
  }
  
  buf[i] = L'\0';
}
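The same loop can be generalized to an arbitrary stream and buffer. The following is a sketch under stated assumptions rather than part of the rule: read_wide_line() is a hypothetical function, and it assumes that fp is a stream suitable for wide-character input and that n is at least 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads one line from fp, stores at most n - 1 wide
   characters in buf, null-terminates buf, and returns the number stored.
   Characters that do not fit are consumed and discarded. */
size_t read_wide_line(FILE *fp, wchar_t *buf, size_t n) {
  assert(fp != NULL && buf != NULL && n > 0);
  wint_t wc;
  size_t i = 0;

  /* End of input is detected with feof() and ferror(), not with WEOF. */
  while ((wc = getwc(fp)) != L'\n' && !feof(fp) && !ferror(fp)) {
    if (i < n - 1) {
      buf[i++] = (wchar_t)wc;
    }
  }

  buf[i] = L'\0';
  return i;
}
```

After the call, feof(fp) and ferror(fp) tell the caller whether the line ended at a newline, at end-of-file, or because of a read error.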

Exceptions

FIO34-EX1: A number of C functions do not return characters but can return EOF as a status code. These functions include fclose(), fflush(), fputs(), fscanf(), puts(), scanf(), sscanf(), vfscanf(), and vscanf(). It is valid to test these return values with EOF.
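As an illustration of this exception, the sketch below compares the status codes of fputs() and fflush() with EOF. Neither function returns a character, so the comparison is unambiguous. The function write_line() and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes msg followed by a newline to fp and flushes
   the stream. Returns 0 on success and -1 if any call reports EOF. */
int write_line(FILE *fp, const char *msg) {
  assert(fp != NULL && msg != NULL);
  if (fputs(msg, fp) == EOF) {
    return -1;
  }
  if (fputs("\n", fp) == EOF) {
    return -1;
  }
  if (fflush(fp) == EOF) {
    return -1;
  }
  return 0;
}
```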

Risk Assessment

Historically, using a char type to capture the return value of character I/O functions has resulted in significant vulnerabilities, including command injection attacks. (See the CA-1996-22 advisory.) As a result, the severity of this error is high.

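| Rule    | Severity | Likelihood | Remediation Cost | Priority | Level |
|---------|----------|------------|------------------|----------|-------|
| FIO34-C | High     | Probable   | Medium           | P12      | L1    |
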

Automated Detection

| Tool         | Version | Checker   | Description |
|--------------|---------|-----------|-------------|
| Compass/ROSE |         |           |             |
| Coverity     | 6.5     | CHAR_IO   | Identifies defects when the return value of fgetc(), getc(), or getchar() is incorrectly assigned to a char instead of an int. Coverity Prevent cannot discover all violations of this rule, so further verification is necessary. |
| ECLAIR       | 1.2     | CC2.FIO34 | Partially implemented |
| Fortify SCA  | 5.0     |           | Can detect violations of this rule with CERT C Rule Pack |
| Splint       | 3.1.1   |           |             |

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

Bibliography

[Kettlewell 2002] Section 1.2, "<stdio.h> and Character Types"
[NIST 2006] SAMATE Reference Dataset Test Case ID 000-000-088
[Summit 2005] Question 12.2

 


  
