Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Edited by sciSpider (sch jbop) (X_X)@==(Q_Q)@

...

  • The classical US-ASCII characters (0 to 0x7f) encode as themselves, so files and strings that are encoded with ASCII values have the same encoding under both ASCII and UTF-8.
  • All UCS characters beyond (0x7f) are encoded as a multibyte sequence consisting only of bytes in the range of 0x80 to 0xfd. This means that no ASCII byte (including a null NULL byte) can appear as part of another character. This property supports the use of string handling functions.
  • It's easy to convert between UTF-8 and UCS-2 and UCS-4 fixed-width representations of characters.
  • The lexicographic sorting order of UCS-4 strings is preserved.
  • All possible 2^31 UCS codes can be encoded using UTF-8

...

Implementors of UTF-8 need to consider the security aspects of how they handle invalid UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.

A particularly subtle form of this attack could be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain invalid octet sequences as characters. For example, a parser might prohibit the null NULL character when encoded as the single-octet sequence 00, but allow the invalid two-octet sequence C0 80 and interpret it as a null NULL character. Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the invalid octet sequence 2F C0 AE 2E 2F.

...

Recommendation

Severity

Likelihood

Remediation Cost

Priority

Level

MSC10-A

2 ( medium )

1 ( unlikely ) 1 (

high )

P2

L3

Automated Detection

The LDRA tool suite V 7.6.0 is able to detect violations of this recommendation.

...