The C Standard supports universal character names that may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set. The universal character name \Unnnnnnnn designates the character whose 8-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose 4-digit short identifier is nnnn (and whose 8-digit short identifier is 0000nnnn).
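The relationship between the two forms can be checked directly. The following is a minimal sketch, assuming a C11 or later compiler (for the U'...' and u8"..." literal prefixes); the function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Code point designated by the 4-digit form \u0401. */
unsigned long short_form(void) { return U'\u0401'; }

/* The same character written with the 8-digit form \U00000401. */
unsigned long long_form(void) { return U'\U00000401'; }

/* A character beyond U+FFFF can only be written with the 8-digit form. */
unsigned long supplementary(void) { return U'\U00010401'; }

/* Length in bytes of U+0401 encoded as UTF-8 (the bytes are 0xD0 0x81). */
size_t utf8_length(void) {
  static const unsigned char s[] = u8"\u0401";
  return sizeof s - 1; /* exclude the terminating null character */
}
```

Here short_form() and long_form() return the same value, 0x0401, because \u0401 is shorthand for \U00000401.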

The C Standard, 5.1.1.2, paragraph 4 [ISO/IEC 9899:2024], says

If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.5.3), the behavior is undefined.

See also undefined behavior 3.

In general, avoid universal character names in identifiers unless absolutely necessary. The basic character set should suffice for almost every identifier.
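Token concatenation itself is not the problem; the undefined behavior arises only when the pasted result matches the syntax of a universal character name. As a minimal sketch (the COUNTER macro and total_count function are hypothetical names, not part of this rule), identifiers built from basic character set fragments can be pasted safely:

```c
/* Hypothetical helper: builds an identifier from two fragments drawn from
 * the basic character set. Neither fragment contains a backslash, so the
 * pasted token cannot match the syntax of a universal character name. */
#define COUNTER(color) color##_count

int total_count(void) {
  int red_count = 1;
  int blue_count = 2;
  COUNTER(red) += 10; /* expands to red_count += 10 */
  return COUNTER(red) + COUNTER(blue);
}
```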

Noncompliant Code Example

This code example is noncompliant because it produces a universal character name by token concatenation:

#define assign(uc1, uc2, val) uc1##uc2 = val

void func(void) {
  int \u0401;
  /* ... */
  assign(\u04, 01, 4);
  /* ... */
}

Implementation Details

This code compiles and runs with Microsoft Visual Studio 2013, assigning 4 to the variable as expected.

GCC 4.8.1 on Linux refuses to compile this code; it emits a diagnostic reading, "stray '\' in program," referring to the universal character fragment in the invocation of the assign macro.

Compliant Solution

This compliant solution uses a universal character name but does not create it by using token concatenation:

#define assign(ucn, val) ucn = val

void func(void) {
  int \u0401;
  /* ... */
  assign(\u0401, 4);
  /* ... */
}
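To exercise the compliant pattern end to end, the following sketch passes the complete universal character name to the macro as a single token and then reads the variable back. It assumes a compiler that accepts universal character names in identifiers; the SET macro and demo function are hypothetical names used only for illustration.

```c
/* Hypothetical expression-style variant of the assign macro. The universal
 * character name arrives as one complete token; nothing is pasted. */
#define SET(ucn, val) ((ucn) = (val))

int demo(void) {
  int \u0401 = 0; /* identifier spelled with a universal character name */
  SET(\u0401, 4);
  return \u0401;
}
```

Because the preprocessor never has to assemble the name from fragments, the behavior does not depend on how an implementation handles partial universal character names.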

Risk Assessment

Creating a universal character name through token concatenation results in undefined behavior. See undefined behavior 3.

| Rule | Severity | Likelihood | Detectable | Repairable | Priority | Level |
|---|---|---|---|---|---|---|
| PRE30-C | Low | Unlikely | Yes | No | P2 | L3 |

Automated Detection

| Tool | Checker | Description |
|---|---|---|
| Astrée | universal-character-name-concatenation | Fully implemented |
| Axivion Bauhaus Suite | CertC-PRE30 | Fully implemented |
| CodeSonar | LANG.PREPROC.PASTE; LANG.PREPROC.PASTEHASH | Macro uses ## operator; ## follows # operator |
| Cppcheck | preprocessorErrorDirective | |
| Cppcheck Premium | preprocessorErrorDirective | |
| Helix QAC | C0905; C++0064, C++0080 | Fully implemented |
| Klocwork | MISRA.DEFINE.SHARP | Fully implemented |
| LDRA tool suite | 573 S | Fully implemented |
| Parasoft C/C++test | CERT_C-PRE30-a | Avoid token concatenation that may produce universal character names |
| Polyspace Bug Finder | CERT C: Rule PRE30-C | Checks for universal character name from token concatenation (rule fully covered) |
| RuleChecker | universal-character-name-concatenation | Fully checked |
| Security Reviewer - Static Reviewer | RTOS_27 | Fully implemented |

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

References

...

Bibliography

...

[ISO/IEC 9899:2024] Subclause 5.1.1.2, "Translation Phases"
