
The C Standard supports universal character names that may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set. The universal character name \Unnnnnnnn designates the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose four-digit short identifier is nnnn (and whose eight-digit short identifier is 0000nnnn).
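You can check that the two forms are equivalent by using string literals. The following is a minimal sketch, not part of the rule itself. The helper names ucn_forms_match and demo_ucn_forms are hypothetical. The exact bytes stored depend on the implementation's execution character set (UTF-8 is a common default), but both spellings must produce the same bytes because they designate the same character.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* U+0401 (CYRILLIC CAPITAL LETTER IO) written with both UCN forms.
   \u0401 is shorthand for \U00000401, so both literals designate the
   same character and therefore encode to the same bytes. */
static const char short_form[] = "\u0401";     /* four-digit form */
static const char long_form[]  = "\U00000401"; /* eight-digit form */

/* Hypothetical helper: true if both literals hold identical bytes. */
bool ucn_forms_match(void) {
  return sizeof short_form == sizeof long_form &&
         memcmp(short_form, long_form, sizeof short_form) == 0;
}

/* Usage example: the assertion holds on a conforming implementation. */
void demo_ucn_forms(void) {
  assert(ucn_forms_match());
}
```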
The C Standard, subclause 5.1.1.2, paragraph 4 [ISO/IEC 9899:2024], says:
If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.5.3), the behavior is undefined.
See also undefined behavior 3.
In general, avoid universal character names in identifiers unless absolutely necessary. The basic character set should suffice for almost every identifier.
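If a macro genuinely needs to build identifiers, one approach is to paste only fragments from the basic character set. This way, ## never has to assemble a universal character name. This is an illustrative sketch. The macro COUNT_NAME and the functions total_packets and demo_count_names are hypothetical names chosen for the example.

```c
#include <assert.h>

/* Hypothetical macro: builds names such as count_rx from fragments in
   the basic character set only, so no UCN can arise from ##. */
#define COUNT_NAME(suffix) count_##suffix

int total_packets(void) {
  int COUNT_NAME(rx) = 3;      /* declares count_rx */
  int COUNT_NAME(tx) = 1;      /* declares count_tx */
  return count_rx + count_tx;  /* pasted names are ordinary identifiers */
}

/* Usage example: 3 + 1 == 4. */
void demo_count_names(void) {
  assert(total_packets() == 4);
}
```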
Noncompliant Code Example
This code example is noncompliant because it produces a universal character name by token concatenation:
```c
#define assign(uc1, uc2, val) uc1##uc2 = val;

void func(void) {
  int \u0401;
  /* ... */
  assign(\u04, 01, 4); /* Noncompliant: ## is used to form the UCN \u0401 */
  /* ... */
}
```
Implementation Details
This code compiles and runs with Microsoft Visual Studio 2013, assigning 4 to the variable as expected.
GCC 4.8.1 on Linux refuses to compile this code. It emits a diagnostic reading "stray '\' in program," which refers to the universal character fragment in the invocation of the assign macro.
Compliant Solution
This compliant solution uses a universal character name but does not create it by using token concatenation:
```c
#define assign(ucn, val) ucn = val;

void func(void) {
  int \u0401;
  /* ... */
  assign(\u0401, 4); /* Compliant: the complete UCN is passed as one token */
  /* ... */
}
```
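To make the effect of the compliant macro observable, the following sketch reuses the same assign macro in a function that returns the variable's value. It assumes a compiler that accepts universal character names in identifiers, as C99 and later require. The functions assigned_value and demo_assign are hypothetical.

```c
#include <assert.h>

/* Same macro as the compliant solution: the parameter receives a
   complete identifier token, so ## is never involved. */
#define assign(ucn, val) ucn = val;

/* Hypothetical helper that exposes the result of the assignment. */
int assigned_value(void) {
  int \u0401 = 0;     /* identifier spelled with U+0401 */
  assign(\u0401, 4);  /* expands to an assignment plus an empty statement */
  return \u0401;
}

/* Usage example: the macro stores 4 in the variable. */
void demo_assign(void) {
  assert(assigned_value() == 4);
}
```

The macro body already ends in a semicolon, so the invocation expands to a statement followed by an empty statement. This is harmless here, but it is one reason why macro definitions usually omit the trailing semicolon.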
Risk Assessment
Creating a universal character name through token concatenation results in undefined behavior. See undefined behavior 3.
| Rule | Severity | Likelihood | Detectable | Repairable | Priority | Level |
|---|---|---|---|---|---|---|
| PRE30-C | Low | Unlikely | Yes | No | P2 | L3 |
Automated Detection
| Tool | Version | Checker | Description |
|---|---|---|---|
| Astrée | | universal-character-name-concatenation | Fully implemented |
| | | CertC-PRE30 | Fully implemented |
| CodeSonar | | LANG.PREPROC.PASTE, LANG.PREPROC.PASTEHASH | Macro uses ## operator; ## follows # operator |
| Cppcheck | | preprocessorErrorDirective | |
| Cppcheck Premium | | preprocessorErrorDirective | |
| Helix QAC | | C0905, C++0064, C++0080 | Fully implemented |
| Klocwork | | MISRA.DEFINE.SHARP | Fully implemented |
| LDRA tool suite | | 573 S | Fully implemented |
| Parasoft C/C++test | | CERT_C-PRE30-a | Avoid token concatenation that may produce universal character names |
| | | CERT C: Rule PRE30-C | Checks for universal character name from token concatenation (rule fully covered) |
| RuleChecker | | universal-character-name-concatenation | Fully checked |
| Security Reviewer - Static Reviewer | | RTOS_27 | Fully implemented |
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
Other Languages
This rule appears in the C++ Secure Coding Standard as PRE30-CPP. Do not create a universal character name through concatenation.
References
...
Bibliography
...
[ |
...
...
...
2024] | Subclause 5.1.1.2, "Translation Phases" |
...
"Translation phases," Section 6.4.3, "Universal character names," and Section 6.10.3.3, "The ## operator"