...

Generally, programs should validate UTF-8 data before performing other checks. The following table lists the well-formed UTF-8 byte sequences.

Code Points         First Byte  Second Byte  Third Byte  Fourth Byte
U+0000..U+007F      00..7F
U+0080..U+07FF      C2..DF      80..BF
U+0800..U+0FFF      E0          A0..BF       80..BF
U+1000..U+CFFF      E1..EC      80..BF       80..BF
U+D000..U+D7FF      ED          80..9F       80..BF
U+E000..U+FFFF      EE..EF      80..BF       80..BF
U+10000..U+3FFFF    F0          90..BF       80..BF      80..BF
U+40000..U+FFFFF    F1..F3      80..BF       80..BF      80..BF
U+100000..U+10FFFF  F4          80..8F       80..BF      80..BF

Bits of code point  First code point  Last code point  Bytes in sequence  Byte 1    Byte 2    Byte 3    Byte 4
7                   U+0000            U+007F           1                  0xxxxxxx
11                  U+0080            U+07FF           2                  110xxxxx  10xxxxxx
16                  U+0800            U+FFFF           3                  1110xxxx  10xxxxxx  10xxxxxx
21                  U+10000           U+1FFFFF         4                  11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
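
The byte ranges for well-formed sequences can be checked directly. The following is a minimal sketch, not a definitive implementation: the names utf8_is_well_formed and in_range are hypothetical helpers introduced here for illustration, and the sketch assumes the input is an explicit-length byte buffer rather than a null-terminated string. It rejects overlong forms (lead bytes C0 and C1, or E0 and F0 with a too-small second byte), encoded surrogates (ED followed by A0..BF), values above U+10FFFF, and truncated sequences.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if b lies in the inclusive range lo..hi. */
static bool in_range(unsigned char b, unsigned char lo, unsigned char hi) {
  return b >= lo && b <= hi;
}

/*
 * Hypothetical helper (not a standard library function): returns true
 * if the len bytes at s consist only of well-formed UTF-8 sequences.
 */
bool utf8_is_well_formed(const unsigned char *s, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char b0 = s[i];
    size_t need;             /* continuation bytes expected after b0 */
    unsigned char lo = 0x80; /* allowed range for the second byte */
    unsigned char hi = 0xBF;

    if (b0 <= 0x7F) {
      need = 0;
    } else if (in_range(b0, 0xC2, 0xDF)) {
      need = 1;
    } else if (b0 == 0xE0) {
      need = 2;
      lo = 0xA0;             /* excludes overlong 3-byte forms */
    } else if (in_range(b0, 0xE1, 0xEC) || in_range(b0, 0xEE, 0xEF)) {
      need = 2;
    } else if (b0 == 0xED) {
      need = 2;
      hi = 0x9F;             /* excludes surrogates U+D800..U+DFFF */
    } else if (b0 == 0xF0) {
      need = 3;
      lo = 0x90;             /* excludes overlong 4-byte forms */
    } else if (in_range(b0, 0xF1, 0xF3)) {
      need = 3;
    } else if (b0 == 0xF4) {
      need = 3;
      hi = 0x8F;             /* excludes values above U+10FFFF */
    } else {
      return false;          /* 80..C1 and F5..FF never start a sequence */
    }

    if (need > len - i - 1) {
      return false;          /* sequence is truncated */
    }
    for (size_t k = 1; k <= need; k++) {
      unsigned char b = s[i + k];
      if (k == 1) {
        if (!in_range(b, lo, hi)) return false;
      } else if (!in_range(b, 0x80, 0xBF)) {
        return false;
      }
    }
    i += need + 1;
  }
  return true;
}
```

A caller would typically run this check on untrusted input first and reject or replace the data on failure before applying any other checks, such as path or keyword filtering.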

Although UTF-8 originated from the Plan 9 developers [Pike 1993], Plan 9's own support covers only the low 16-bit range. In general, many "Unicode" systems support only the low 16-bit range, not the full 21-bit ISO 10646 code space [ISO/IEC 10646:2012].

...

Failing to properly handle UTF-8-encoded data can result in a data integrity violation or a denial-of-service attack.

Recommendation  Severity  Likelihood  Remediation Cost  Detectable  Repairable  Priority  Level
MSC10-C         Medium    Unlikely    High              No          No          P2        L3

Automated Detection

Tool             Version  Checker       Description
LDRA tool suite  LDRA_V   176 S, 376 S  Partially implemented

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

SEI CERT C++ Coding Standard  VOID MSC10-CPP. Character encoding: UTF8-related issues
MITRE CWE                     CWE-176, Failure to handle Unicode encoding
                              CWE-116, Improper encoding or escaping of output

Bibliography

...


...
