
Strings are a fundamental concept in software engineering, but they are not a built-in type in C. A null-terminated byte string (NTBS) consists of a contiguous sequence of characters terminated by and including the first null character. The C programming language supports the following types of null-terminated strings: single-byte character strings, multibyte character strings, and wide character strings. Single-byte and multibyte character strings are both described as null-terminated byte strings.

A pointer to a single-byte or multibyte character string points to its initial character. The length of the string is the number of bytes preceding the null character, and the value of the string is the sequence of the values of the contained characters, in order.
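As a minimal sketch of this definition, the hypothetical function ntbs_length() below counts the bytes that precede the first null character, which is what the standard strlen() function computes. For a multibyte string, the result is a count of bytes, not of characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the NTBS starting at s,
   that is, the number of bytes preceding the first null character.
   It computes the same value as strlen(s). */
size_t ntbs_length(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {  /* stop at, but do not count, the first null */
        ++n;
    }
    return n;
}
```

For example, the array char s[] = "hello" occupies sizeof s == 6 bytes but has length 5, and the literal "ab\0cd" has length 2 because only the bytes before the first null character are counted.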

A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character. A pointer to a wide string points to its initial (lowest addressed) wide character. The length of a wide string is the number of wide characters preceding the null wide character, and the value of a wide string is the sequence of code values of the contained wide characters, in order.
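Because the length of a wide string is counted in wide characters, not bytes, the storage needed for a copy must be scaled by sizeof(wchar_t), which is the concern of STR33-C. A short sketch, using the standard wcslen() function and a hypothetical helper named wide_storage_bytes():

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bytes needed to store a copy of the
   wide string ws, including its terminating null wide character.
   wcslen() counts wide characters, so the count is scaled by
   sizeof(wchar_t). */
size_t wide_storage_bytes(const wchar_t *ws) {
    return (wcslen(ws) + 1) * sizeof(wchar_t);
}
```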

Null-terminated byte strings are implemented as arrays of characters and are susceptible to the same problems as arrays. As a result, rules and recommendations for arrays should also be applied to null-terminated byte strings.
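For example, copying a string into a fixed-size array must account for the terminating null character, just as any array access must stay within the bounds of the array. The following sketch uses a hypothetical helper, copy_string(), that refuses to copy rather than overflow the destination:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the NTBS src into the array dst, which holds
   dst_size bytes. Returns 0 on success, or -1 (leaving dst untouched) if
   src and its null terminator do not fit. */
int copy_string(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (len >= dst_size) {      /* len + 1 bytes are needed, including the null */
        return -1;
    }
    memcpy(dst, src, len + 1);  /* also copies the terminating null */
    return 0;
}
```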

The C standard uses the general philosophy outlined below for choosing character types, though it is not explicitly stated in one place.

signed char and unsigned char

  • Suitable for small integer values

"plain" char

  • The type of each element of a string literal.
  • Used for character data (where signedness has little meaning) as opposed to integer data.

int

  • Used for data that could be either EOF (a negative value) or character data interpreted as unsigned char and then converted to int. Therefore, int is the return type of fgetc(), getc(), getchar(), and ungetc(). It is also the argument type accepted by the character handling functions from <ctype.h>, because they might be passed the result of fgetc() et al.
  • The type of a character constant. Its value is that of a plain char converted to int.

unsigned char

  • Used internally by the string comparison functions, even though these operate on character data. Therefore, the result of a string comparison does not depend on whether plain char is signed.
  • Used for situations where the object being manipulated might be of any type, and it is necessary to access all bits of that object, as with fwrite().
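The standard specifies that the sign of a nonzero result from strcmp() is determined by the difference between the first pair of differing characters, both interpreted as unsigned char. A hypothetical strcmp-like function written the same way might look like this sketch:

```c
/* Hypothetical strcmp-like function. Characters are compared as
   unsigned char, so the sign of the result is the same whether plain
   char is signed or unsigned. Returns a negative value, zero, or a
   positive value, like strcmp(). */
int compare_strings(const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    while (*p1 != '\0' && *p1 == *p2) {
        ++p1;
        ++p2;
    }
    /* (a > b) - (a < b) yields -1, 0, or 1 without risk of overflow. */
    return (*p1 > *p2) - (*p1 < *p2);
}
```

Comparing plain char values on an implementation where char is signed would order "\x80" before "\x7f"; this version orders it after "\x7f" on every implementation.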

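Note that the two different ways a character is used as an int (as an unsigned char value or EOF, or as a plain char converted to int) can lead to confusion. For example, isspace('\200') results in undefined behavior when char is signed: the character constant then has a negative value (typically -128), which is neither EOF nor representable as an unsigned char, so it falls outside the domain of the <ctype.h> functions.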
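A common remedy is to convert a plain char to unsigned char before passing it to a <ctype.h> function, while an int obtained from fgetc() can be passed directly because it already holds either EOF or an unsigned char value. A sketch with two hypothetical helpers, count_spaces() and count_stream_spaces():

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count white-space characters in an NTBS. Each
   plain char is converted to unsigned char first, so a negative value
   such as '\200' (where char is signed) never reaches isspace(). */
size_t count_spaces(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (isspace((unsigned char)*s)) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: count white-space characters read from a stream.
   fgetc() returns an int holding either EOF or an unsigned char value,
   which is already in the domain of isspace(), so no cast is needed. */
size_t count_stream_spaces(FILE *fp) {
    size_t count = 0;
    int c;  /* int, not char, so that EOF remains distinguishable */
    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            ++count;
        }
    }
    return count;
}
```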

Recommendations

STR00-A. Use TR 24731 for remediation of existing string manipulation code

STR01-A. Use managed strings for development of new string manipulation code

STR02-A. Sanitize data passed to complex subsystems

STR03-A. Do not inadvertently truncate a null-terminated byte string

STR04-A. Use plain char for character data

STR05-A. Prefer making string literals const-qualified

STR06-A. Do not assume that strtok() leaves the parse string unchanged
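To illustrate STR06-A: strtok() overwrites each delimiter it finds with a null character, so the string it parses is modified. One approach, sketched here with a hypothetical count_tokens() helper that splits on spaces, is to tokenize a copy and leave the caller's string intact:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the space-separated tokens in input without
   modifying it. strtok() writes null characters into the string it
   parses, so the tokens are taken from a heap-allocated copy instead.
   Returns (size_t)-1 if the copy cannot be allocated. */
size_t count_tokens(const char *input) {
    size_t len = strlen(input);
    char *copy = malloc(len + 1);  /* + 1 for the null terminator */
    if (copy == NULL) {
        return (size_t)-1;
    }
    memcpy(copy, input, len + 1);

    size_t count = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        ++count;
    }
    free(copy);
    return count;
}
```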

Rules

STR30-C. Do not attempt to modify string literals

STR31-C. Guarantee that storage for strings has sufficient space for character data and the null terminator

STR32-C. Null-terminate byte strings as required

STR33-C. Size wide character strings correctly

STR34-C. Cast characters to unsigned types before converting to larger integer sizes

STR35-C. Do not copy data from an unbounded source to a fixed-length array
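As a brief sketch of STR30-C and STR05-A, a string literal is accessed through a const-qualified pointer, and a modifiable copy is made when the characters must change. The names greeting and make_greeting() are hypothetical:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Pointer to a string literal, const-qualified (STR05-A) so that an
   assignment such as greeting[0] = 'H' is a compile-time error rather
   than undefined behavior at run time (STR30-C). */
static const char *const greeting = "hello";

/* Hypothetical helper: write a copy of greeting with its first letter
   capitalized into buf. Returns 0 on success, or -1 if buf cannot hold
   the characters plus the null terminator (STR31-C). */
int make_greeting(char *buf, size_t size) {
    size_t len = strlen(greeting);
    if (len >= size) {
        return -1;
    }
    memcpy(buf, greeting, len + 1);  /* buf is a modifiable array */
    if (len > 0) {
        /* Convert to unsigned char before calling toupper(). */
        buf[0] = (char)toupper((unsigned char)buf[0]);
    }
    return 0;
}
```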

Risk Assessment Summary

Recommendation   Severity   Likelihood   Remediation Cost   Priority   Level
STR00-A          high       probable     medium             P12        L1
STR01-A          high       probable     high               P6         L2
STR02-A          medium     likely       medium             P12        L1
STR03-A          low        unlikely     medium             P2         L3
STR05-A          low        unlikely     high               P1         L3
STR06-A          medium     probable     low                P12        L1
STR07-A          low        unlikely     medium             P2         L3

Rule             Severity   Likelihood   Remediation Cost   Priority   Level
STR30-C          low        likely       low                P9         L2
STR31-C          high       likely       medium             P18        L1
STR32-C          high       probable     medium             P12        L1
STR33-C          high       likely       medium             P18        L1
STR34-C          medium     probable     medium             P8         L2
STR35-C          high       likely       medium             P18        L1

Related Rules and Recommendations

References

[[ISO/IEC 9899-1999]] Section 7.1.1, "Definitions of terms," and Section 7.21, "String handling <string.h>"
[[Seacord 05]] Chapter 2, "Strings"

