...

The classical US-ASCII characters (0x00 to 0x7f) encode as themselves, so files and strings that contain only ASCII values have the same encoding under both ASCII and UTF-8.
All UCS characters above 0x7f are encoded as a multibyte sequence consisting only of bytes in the range 0x80 to 0xfd. This means that no ASCII byte (including the null byte 0x00) can appear as part of another character. This property lets C string-handling functions such as strlen() and strcpy(), which treat a zero byte as the string terminator, operate on UTF-8 data unchanged.
Conversion between UTF-8 and the fixed-width UCS-2 and UCS-4 representations of characters is straightforward.
Sorting UTF-8 strings byte by byte yields the same lexicographic order as sorting the corresponding UCS-4 strings.
All 2^31 possible UCS codes can be encoded using UTF-8.

...

Implementors of UTF-8 need to consider the security aspects of how they handle invalid UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.
A particularly subtle form of this attack could be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain invalid octet sequences as characters. For example, a parser might prohibit the null character when encoded as the single-octet sequence 00, but allow the invalid two-octet sequence C0 80 and interpret it as a null character. Another example might be a parser that prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the invalid octet sequence 2F C0 AE 2E 2F, in which C0 AE is an overlong encoding of ".".

...

Recommendation	Severity	Likelihood	Remediation Cost	Priority	Level
MSC10-A	2 (medium)	1 (unlikely)	1 (high)	P2	L3

Automated Detection

The LDRA tool suite V 7.6.0 can detect violations of this recommendation.

...
