Implementors of UTF-8 need to consider the security aspects of how they handle invalid UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.
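One defensive approach is to reject ill-formed input outright, before any other processing looks at it. The sketch below assumes the input is a byte buffer with an explicit length. `utf8_is_valid` is a hypothetical name, and the accepted byte ranges follow the well-formed UTF-8 syntax given in RFC 3629 (the same ranges appear in Table 3-7 of the Unicode Standard).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true only if buf[0..len) is well-formed
 * UTF-8 per RFC 3629. Rejects overlong forms, UTF-16 surrogates
 * (U+D800..U+DFFF), code points above U+10FFFF, stray continuation
 * octets, and sequences truncated by the end of the buffer. */
bool utf8_is_valid(const unsigned char *buf, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char b = buf[i];
    size_t need;                        /* continuation octets expected */
    unsigned char lo = 0x80, hi = 0xBF; /* allowed range for 2nd octet */

    if (b <= 0x7F) {                    /* single-octet (ASCII) */
      i++;
      continue;
    } else if (b >= 0xC2 && b <= 0xDF) {
      need = 1;
    } else if (b == 0xE0) {
      need = 2; lo = 0xA0;              /* E0 80..9F would be overlong */
    } else if (b == 0xED) {
      need = 2; hi = 0x9F;              /* ED A0..BF would be surrogates */
    } else if (b >= 0xE1 && b <= 0xEF) {
      need = 2;
    } else if (b == 0xF0) {
      need = 3; lo = 0x90;              /* F0 80..8F would be overlong */
    } else if (b >= 0xF1 && b <= 0xF3) {
      need = 3;
    } else if (b == 0xF4) {
      need = 3; hi = 0x8F;              /* F4 90..BF would exceed U+10FFFF */
    } else {
      /* 80..BF: stray continuation; C0, C1: always overlong;
       * F5..FF: never valid in UTF-8 */
      return false;
    }

    if (len - i <= need) return false;  /* sequence truncated */
    if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
    for (size_t k = 2; k <= need; k++) {
      if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) return false;
    }
    i += need + 1;
  }
  return true;
}
```

Checking the second octet against a range that depends on the lead octet is what catches overlong forms, surrogates, and out-of-range values without ever computing the code point; any security checks then run only on input that has already passed this test.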
A particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain invalid octet sequences as characters. For example, a parser might prohibit the null character when encoded as the single-octet sequence 00, but allow the invalid two-octet sequence C0 80 and interpret it as a null character. Another example is a parser that prohibits the octet sequence 2F 2E 2E 2F ("/../") yet permits the invalid octet sequence 2F C0 AE 2E 2F, in which C0 AE is an overlong encoding of 2E (".").
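The path example can be reproduced with a short sketch that pairs a byte-level check for "/../" with a deliberately unsafe decoder that ignores overlong forms. Both `contains_dotdot_bytes` and `lenient_decode` are hypothetical names; the decoder handles only one- and two-octet sequences and exists to illustrate the flaw, not for real use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte-level check run on the raw (still encoded) input:
 * reports whether the literal octets 2F 2E 2E 2F appear. */
bool contains_dotdot_bytes(const unsigned char *in, size_t len) {
  for (size_t i = 0; i + 4 <= len; i++) {
    if (memcmp(in + i, "/../", 4) == 0) return true;
  }
  return false;
}

/* Hypothetical, deliberately UNSAFE decoder. out must have room for
 * len + 1 chars. ASCII octets are copied; any two-octet lead (C0..DF)
 * is combined with the following octet without validation; everything
 * else becomes '?'. Returns the number of chars written before the
 * terminating '\0'. */
size_t lenient_decode(const unsigned char *in, size_t len, char *out) {
  size_t i = 0, n = 0;
  while (i < len) {
    unsigned char b = in[i];
    if (b < 0x80) {
      out[n++] = (char)b;
      i += 1;
    } else if ((b & 0xE0) == 0xC0 && i + 1 < len) {
      /* FLAW: no check that the result is >= 0x80, so the overlong
       * pairs C0 AE and C0 80 decode to '.' and NUL respectively. */
      unsigned cp = ((unsigned)(b & 0x1F) << 6) | (in[i + 1] & 0x3Fu);
      out[n++] = cp < 0x80 ? (char)cp : '?';
      i += 2;
    } else {
      out[n++] = '?';
      i += 1;
    }
  }
  out[n] = '\0';
  return n;
}
```

Given the octets 2F C0 AE 2E 2F, `contains_dotdot_bytes` finds no "/../", yet `lenient_decode` produces exactly "/../". The check and the decoder disagree about what the input means; rejecting ill-formed input before checking, or checking only the output of a strict decoder, removes that disagreement.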


