Strings are a fundamental concept in software engineering, but they are not a built-in type in C. The C programming language supports three kinds of strings: single-byte character strings, multibyte character strings, and wide character strings. Single-byte and multibyte character strings are both described as null-terminated byte strings (NTBS): a contiguous sequence of characters terminated by and including the first null character. Wide character strings are built from wide characters rather than bytes.
A pointer to a single-byte or multibyte character string points to its initial character. The length of the string is the number of bytes preceding the null character, and the value of the string is the sequence of the values of the contained characters, in order.
A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character. A pointer to a wide string points to its initial (lowest addressed) wide character. The length of a wide string is the number of wide characters preceding the null wide character, and the value of a wide string is the sequence of code values of the contained wide characters, in order.
Null-terminated byte strings are implemented as arrays of characters and are susceptible to the same problems as arrays. As a result, rules and recommendations for arrays should also be applied to null-terminated byte strings.
The C standard uses the general philosophy outlined below for choosing character types, though it is not explicitly stated in one place.
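- signed char and unsigned char: suitable for small integer values.
- char: the type of each element of a string literal. Used for character data (where signedness has little meaning) as opposed to integer data.
- int: used for data that may be either EOF (a negative value) or character data interpreted as unsigned char and then converted to int. It is therefore returned by fgetc(), getc(), getchar(), and ungetc(). It is also accepted by the character handling functions from <ctype.h>, because they might be passed the result of fgetc() et al. In addition, int is the type of a character constant, whose value is that of a plain char converted to int.
- unsigned char: used internally by the string comparison functions, even though these functions operate on character data, so the result of a string comparison does not depend on whether plain char is signed. Also used where the object being manipulated might be of any type and all of its bits must be accessed, as with fwrite().

Note that the two different ways a character is used as an int (as an unsigned char + EOF, or as a plain char converted to int) can lead to confusion. For example, isspace('\200') results in undefined behavior when char is signed, because the argument is then a negative value that is neither EOF nor representable as an unsigned char.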
Understanding how to represent strings can eliminate many common programming errors that lead to software vulnerabilities.
| Recommendation | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| STR00-A | medium | probable | low | P12 | L1 |
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
[ISO/IEC TR 24731-1-2007]
[ISO/IEC 9899-1999] Section 7.1.1, "Definitions of terms," and Section 7.21, "String handling <string.h>"
[Seacord 05a] Chapter 2, "Strings"
[Seacord 05b]