
...

The RFC describes the problem this way: Implementers of UTF-8 need to consider the security aspects of how they handle illegal UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet (byte) sequence that is not permitted by the UTF-8 syntax. A particularly subtle form of this attack could be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain illegal octet sequences as characters. For example, a parser might prohibit the NUL character when encoded as the single-octet sequence 00, but allow the illegal two-octet sequence C0 80 (illegal because it is longer than necessary) and interpret it as a NUL character (00). Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the illegal octet sequence 2F C0 AE 2E 2F.
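
The two examples in the RFC text can be made concrete with a small validator. The following is a minimal sketch, not a definitive implementation: the function name utf8_is_valid is hypothetical, and the sketch assumes the stricter limits of the later RFC 3629 (sequences of at most four octets, no surrogates, nothing above U+10FFFF) rather than the six-octet forms that RFC 2279 permitted. It decodes each sequence and rejects any code point that could have been encoded in fewer octets, which is what makes C0 80 and 2F C0 AE 2E 2F fail.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true only if buf[0..len) is well-formed
 * UTF-8 with no overlong (longer than necessary) sequences.
 * Assumption: RFC 3629 limits (1 to 4 octets, max U+10FFFF, no surrogates).
 */
bool utf8_is_valid(const unsigned char *buf, size_t len) {
  size_t i = 0;

  while (i < len) {
    unsigned char lead = buf[i];
    size_t need;        /* number of continuation octets expected */
    unsigned long cp;   /* code point being decoded */
    unsigned long min;  /* smallest code point allowed at this length */

    if (lead < 0x80) {                  /* single-octet (ASCII) form */
      i++;
      continue;
    } else if ((lead & 0xE0) == 0xC0) { /* 110xxxxx: two octets */
      need = 1; cp = lead & 0x1F; min = 0x80;
    } else if ((lead & 0xF0) == 0xE0) { /* 1110xxxx: three octets */
      need = 2; cp = lead & 0x0F; min = 0x800;
    } else if ((lead & 0xF8) == 0xF0) { /* 11110xxx: four octets */
      need = 3; cp = lead & 0x07; min = 0x10000;
    } else {
      return false;                     /* stray continuation or bad lead */
    }

    if (need > len - i - 1) {
      return false;                     /* sequence is truncated */
    }

    for (size_t k = 1; k <= need; k++) {
      unsigned char c = buf[i + k];
      if ((c & 0xC0) != 0x80) {
        return false;                   /* not a 10xxxxxx continuation */
      }
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < min) {
      return false;                     /* overlong, e.g. C0 80 or C0 AE */
    }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;                     /* out of range or surrogate */
    }

    i += need + 1;
  }
  return true;
}
```

With this check in place, a caller would run utf8_is_valid over the raw input first and only then apply filters such as rejecting NUL or "/../", so that the filter and any later decoder see the same characters.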

[Kuhn 06] UTF-8 and Unicode FAQ for Unix/Linux

[Viega 03] Section 3.12, "Detecting Illegal UTF-8 Characters"

[Wheeler 06] Secure Programming for Linux and Unix HOWTO
[Yergeau 98] RFC 2279 - UTF-8, a transformation format of ISO 10646
