The C99 \[[ISO/IEC 9899:1999|AA. C References#ISO/IEC 9899-1999]\] function {{strtok()}} is a string tokenization function that takes two arguments: an initial string to be parsed and a const-qualified string of delimiter characters. It returns a pointer to the first character of a token, or a null pointer if there is no token.

The first time strtok() is called, it skips any leading delimiter characters, scans the string up to the first delimiter character that follows the token, replaces that character in place with a null byte ('\0'), and returns the address of the first character in the token. Subsequent calls to strtok(), made with a null pointer as the first argument, resume parsing immediately after the most recently placed null character.
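The in-place modification can be observed with a small helper. The following is a minimal sketch, not part of the standard library: the function name count_tokens_in_place is an assumption made for illustration, and the buffer passed to it must be a modifiable array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (name chosen for illustration only): tokenizes buf
 * in place with strtok() and returns the number of tokens found.
 * buf must be a modifiable, null-terminated array.
 */
size_t count_tokens_in_place(char *buf, const char *delims) {
  size_t count = 0;
  char *token;

  assert(buf != NULL && delims != NULL);

  /* First call: pass the buffer; strtok() remembers its position. */
  token = strtok(buf, delims);
  while (token != NULL) {
    ++count;
    /* Later calls: pass a null pointer to continue in the same buffer. */
    token = strtok(NULL, delims);
  }
  return count;
}
```

Given char buf[] = "a:b:c", the call count_tokens_in_place(buf, ":") returns 3, and afterward buf holds the bytes 'a', '\0', 'b', '\0', 'c', '\0', so strlen(buf) is 1 rather than 5.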

Because strtok() modifies the initial string to be parsed, the string is subsequently unsafe and cannot be used in its original form. If you need to preserve the original string, copy it into a buffer and pass the address of the buffer to strtok() instead of the original string.

Noncompliant Code Example

In this example, the strtok() function is used to parse the value of the PATH environment variable into colon-delimited tokens; each token is output on a new line. Assume that PATH is "/usr/bin:/bin:/usr/sbin:/sbin".

char *token;
char *path = getenv("PATH");

token = strtok(path, ":");
puts(token);

while (token = strtok(0, ":")) {
  puts(token);
}

printf("PATH: %s\n", path);
/* PATH is now just "/usr/bin" */

After the loop ends, the string that path points to has been modified as follows: "/usr/bin\0/bin\0/usr/sbin\0/sbin\0". This is a problem for two reasons: the local path variable now reads as just /usr/bin, and the environment variable PATH has been unintentionally changed, which can have unintended consequences (see ENV30-C. Do not modify the string returned by getenv()).

Compliant Solution

In this compliant solution, the string being tokenized is copied into a temporary buffer that is not referenced after the calls to strtok():

char *token;
const char *path = getenv("PATH");
/* PATH is something like "/usr/bin:/bin:/usr/sbin:/sbin" */
if (path == NULL) {
  /* handle error: PATH is not set */
}

char *copy = (char *)malloc(strlen(path) + 1);
if (copy == NULL) {
  /* handle error */
}
strcpy(copy, path);

/* Tokenize the copy; strtok() returns NULL when no tokens remain */
token = strtok(copy, ":");
while (token != NULL) {
  puts(token);
  token = strtok(NULL, ":");
}

free(copy);
copy = NULL;

printf("PATH: %s\n", path);
/* PATH is still "/usr/bin:/bin:/usr/sbin:/sbin" */

Another possibility is to provide your own implementation of strtok() that does not modify the initial arguments.
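One way such a replacement might look is sketched below. This is an illustrative assumption, not a standard interface: the name next_token, its cursor parameter, and the choice to report a length instead of writing a null byte are all hypothetical. It relies only on the standard strspn() and strcspn() functions and never writes to the string being parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (name and interface are assumptions, not a standard
 * function): finds the next token in a read-only string.
 *
 * *cursor is the position to resume scanning from and is advanced past
 * the token found. On success, returns a pointer to the token's first
 * character and stores its length in *len; the token is NOT
 * null-terminated. Returns NULL when no tokens remain.
 */
const char *next_token(const char **cursor, const char *delims, size_t *len) {
  const char *start;

  assert(cursor != NULL && *cursor != NULL && delims != NULL && len != NULL);

  /* Skip leading delimiters, as strtok() does. */
  start = *cursor + strspn(*cursor, delims);
  if (*start == '\0') {
    *cursor = start;
    return NULL;
  }

  /* The token runs up to the next delimiter or the end of the string. */
  *len = strcspn(start, delims);
  *cursor = start + *len;
  return start;
}
```

A caller would initialize a const char * cursor to the start of the string, call next_token(&cursor, ":", &len) until it returns a null pointer, and print each token with printf("%.*s\n", (int)len, tok). Because the tokens are not null-terminated, they must always be handled together with their length.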

Risk Assessment

To quote the Linux Programmer's Manual (man) page on {{strtok(3)}} \[[Linux 08|AA. C References#Linux 08]\]:
<blockquote><p>Never use this function. This function modifies its first argument. The identity of the delimiting character is lost. This function cannot be used on constant strings.</p></blockquote>

The improper use of {{strtok()}} is likely to result in truncated data, producing unexpected results later in program execution.

|| Recommendation || Severity || Likelihood || Remediation Cost || Priority || Level ||
| STR06-C | medium | likely | medium | P12 | L1 |

Automated Detection

Fortify SCA Version 5.0 can detect violations of this recommendation.

Compass/ROSE can detect violations of this recommendation.

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this recommendation on the CERT website.

Other Languages

This rule appears in the C++ Secure Coding Standard as STR06-CPP. Do not assume that strtok() leaves the parse string unchanged.

References

\[[ISO/IEC 9899:1999|AA. C References#ISO/IEC 9899-1999]\] Section 7.21.5.8, "The {{strtok}} function"
\[[Linux 08|AA. C References#Linux 08]\] [strtok(3)|http://www.kernel.org/doc/man-pages/online/pages/man3/strtok.3.html]
\[[MITRE 07|AA. C References#MITRE 07]\] [CWE ID 464|http://cwe.mitre.org/data/definitions/464.html], "Addition of Data Structure Sentinel"

