...

Generally, programs should validate UTF-8 data before performing other checks. The following table lists the well-formed UTF-8 byte sequences.

Code Points	First Byte	Second Byte	Third Byte	Fourth Byte
U+0000..U+007F	00..7F
U+0080..U+07FF	C2..DF	80..BF
U+0800..U+0FFF	E0	A0..BF	80..BF
U+1000..U+CFFF	E1..EC	80..BF	80..BF
U+D000..U+D7FF	ED	80..9F	80..BF
U+E000..U+FFFF	EE..EF	80..BF	80..BF
U+10000..U+3FFFF	F0	90..BF	80..BF	80..BF
U+40000..U+FFFFF	F1..F3	80..BF	80..BF	80..BF
U+100000..U+10FFFF	F4	80..8F	80..BF	80..BF

Each sequence length uses the following bit layout:

Bits of code point	First code point	Last code point	Bytes in sequence	Byte 1	Byte 2	Byte 3	Byte 4
7	U+0000	U+007F	1	`0xxxxxxx`
11	U+0080	U+07FF	2	`110xxxxx`	`10xxxxxx`
16	U+0800	U+FFFF	3	`1110xxxx`	`10xxxxxx`	`10xxxxxx`
21	U+10000	U+1FFFFF	4	`11110xxx`	`10xxxxxx`	`10xxxxxx`	`10xxxxxx`
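The byte ranges for well-formed sequences can be checked directly, one lead byte at a time. The following is a minimal sketch rather than a vetted implementation; the function name `utf8_is_well_formed` and its interface are assumptions made for illustration and are not part of this recommendation.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (name and signature are illustrative assumptions).
 * Returns true if buf[0..len) consists only of well-formed UTF-8 sequences.
 */
static bool utf8_is_well_formed(const unsigned char *buf, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char b0 = buf[i];
    size_t need;             /* continuation bytes expected after b0 */
    unsigned char lo = 0x80; /* allowed range for the second byte */
    unsigned char hi = 0xBF;

    if (b0 <= 0x7F) {        /* U+0000..U+007F */
      i++;
      continue;
    } else if (b0 >= 0xC2 && b0 <= 0xDF) {
      need = 1;              /* U+0080..U+07FF */
    } else if (b0 == 0xE0) {
      need = 2; lo = 0xA0;   /* rejects overlong three-byte forms */
    } else if (b0 >= 0xE1 && b0 <= 0xEC) {
      need = 2;
    } else if (b0 == 0xED) {
      need = 2; hi = 0x9F;   /* rejects surrogates U+D800..U+DFFF */
    } else if (b0 >= 0xEE && b0 <= 0xEF) {
      need = 2;
    } else if (b0 == 0xF0) {
      need = 3; lo = 0x90;   /* rejects overlong four-byte forms */
    } else if (b0 >= 0xF1 && b0 <= 0xF3) {
      need = 3;
    } else if (b0 == 0xF4) {
      need = 3; hi = 0x8F;   /* rejects code points above U+10FFFF */
    } else {
      return false;          /* 80..C1 and F5..FF never start a sequence */
    }

    if (len - i <= need) {
      return false;          /* sequence is truncated */
    }
    if (buf[i + 1] < lo || buf[i + 1] > hi) {
      return false;
    }
    for (size_t k = 2; k <= need; k++) {
      if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) {
        return false;
      }
    }
    i += need + 1;
  }
  return true;
}
```

Under these assumptions, the overlong sequence `C0 AF` and the encoded surrogate `ED A0 80` are both rejected, while `F4 8F BF BF` (U+10FFFF) is accepted.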

Although UTF-8 originated from the Plan 9 developers [Pike 1993], Plan 9's own support covers only the low 16-bit range. In general, many "Unicode" systems support only the low 16-bit range, not the full 21-bit ISO 10646 code space [ISO/IEC 10646:2012].
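Code points above U+FFFF are exactly the ones that need the four-byte form, so a system limited to the low 16-bit range cannot represent them. As a rough illustration, the sketch below encodes a code point using the bit layouts listed earlier; the name `utf8_encode` and its interface are hypothetical, and it restricts the four-byte case to U+10FFFF, the upper limit of the well-formed byte sequences.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are illustrative assumptions).
 * Encodes code point cp into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
  if (cp <= 0x7F) {                 /* 7 bits:  0xxxxxxx */
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp <= 0x7FF) {                /* 11 bits: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp <= 0xFFFF) {               /* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return 0;                     /* surrogates are not encodable */
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
  }
  if (cp <= 0x10FFFF) {             /* 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
  }
  return 0;                         /* outside the Unicode code space */
}
```

For example, U+20AC fits in 16 bits and encodes to the three bytes `E2 82 AC`, whereas U+1F600 lies beyond the 16-bit range and encodes to the four bytes `F0 9F 98 80`.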

...

Failing to properly handle UTF-8-encoded data can result in a data integrity violation or a denial-of-service attack.

Recommendation	Severity	Likelihood	Remediation Cost	Detectable	Repairable	Priority	Level
MSC10-C	Medium	Unlikely	High	No	No	P2	L3

Automated Detection

Tool	Version	Checker	Description
LDRA tool suite		176 S, 376 S	Partially implemented

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

SEI CERT C++ Coding Standard	VOID MSC10-CPP. Character encoding: UTF8-related issues
MITRE CWE	CWE-176, Failure to handle Unicode encoding
	CWE-116, Improper encoding or escaping of output

Bibliography

[ISO/IEC 10646:2012]
[Kuhn 2006]	UTF-8 and Unicode FAQ for Unix/Linux
[Pike 1993]	"Hello World"
[Unicode 2006]
[Viega 2003]	Section 3.12, "Detecting Illegal UTF-8 Characters"
[Wheeler 2003]	Secure Programmer: Call Components Safely
[Yergeau 1998]	RFC 2279
