...
Implementors of UTF-8 need to consider the security aspects of how they handle invalid UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.
A particularly subtle form of this attack could be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain invalid octet sequences as characters. For example, a parser might prohibit the null character when encoded as the single-octet sequence 00, but allow the invalid two-octet sequence C0 80 and interpret it as a null character. Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the invalid octet sequence 2F C0 AE 2E 2F.
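One way to guard against both examples is to reject any input that is not well-formed UTF-8 before running the security-critical checks. The following is a minimal sketch, assuming the well-formedness ranges given in RFC 3629; the function name `utf8_is_valid` is hypothetical and is not taken from any library or from this guideline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (illustrative name): returns true only if the octets
 * s[0..len) form well-formed UTF-8 under the RFC 3629 syntax, that is,
 * no overlong forms, no surrogates, and nothing above U+10FFFF. */
static bool utf8_is_valid(const unsigned char *s, size_t len) {
  assert(s != NULL || len == 0);  /* precondition: a buffer must be supplied */

  size_t i = 0;
  while (i < len) {
    unsigned char b = s[i];
    size_t n;                  /* number of continuation octets expected */
    unsigned char lo = 0x80;   /* allowed range of the FIRST continuation */
    unsigned char hi = 0xBF;   /* octet; narrowed for some lead octets    */

    if (b <= 0x7F) {           /* single-octet (ASCII) character */
      i++;
      continue;
    } else if (b >= 0xC2 && b <= 0xDF) {
      n = 1;
    } else if (b == 0xE0) {
      n = 2; lo = 0xA0;        /* excludes overlong three-octet forms */
    } else if (b >= 0xE1 && b <= 0xEC) {
      n = 2;
    } else if (b == 0xED) {
      n = 2; hi = 0x9F;        /* excludes surrogates U+D800..U+DFFF */
    } else if (b == 0xEE || b == 0xEF) {
      n = 2;
    } else if (b == 0xF0) {
      n = 3; lo = 0x90;        /* excludes overlong four-octet forms */
    } else if (b >= 0xF1 && b <= 0xF3) {
      n = 3;
    } else if (b == 0xF4) {
      n = 3; hi = 0x8F;        /* excludes code points above U+10FFFF */
    } else {
      /* 80..BF: stray continuation octet; C0, C1: always overlong
       * (this rejects C0 80 and C0 AE); F5..FF: never valid. */
      return false;
    }

    if (len - i <= n) {
      return false;            /* sequence truncated at end of input */
    }
    if (s[i + 1] < lo || s[i + 1] > hi) {
      return false;
    }
    for (size_t k = 2; k <= n; k++) {
      if ((s[i + k] & 0xC0) != 0x80) {
        return false;          /* not of the form 10xxxxxx */
      }
    }
    i += n + 1;
  }
  return true;
}
```

With this sketch, `utf8_is_valid((const unsigned char *)"\xC0\x80", 2)` and `utf8_is_valid((const unsigned char *)"/\xC0\xAE./", 5)` both return `false`, while the well-formed sequence 2F 2E 2E 2F is accepted and can then be caught by the ordinary "/../" check. The sketch rejects invalid input outright instead of attempting to repair or reinterpret it, so later checks never see an octet sequence that decodes to a prohibited character.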
...
\[[ISO/IEC 10646:2003|AA. C References#ISO/IEC 10646-2003]\] Information technology - Universal Multiple-Octet Coded Character Set (UCS), First Edition. December, 2003.
\[[Kuhn 06|AA. C References#Kuhn 06]\] UTF-8 and Unicode FAQ for Unix/Linux
\[[Pike 93|AA. C References#Pike 93]\]
\[[Viega 03|AA. C References#Viega 03]\] Section 3.12, "Detecting Illegal UTF-8 Characters"
\[[Wheeler 06|AA. C References#Wheeler 06]\] Secure Programming for Linux and Unix HOWTO
\[[Yergeau 98|AA. C References#Yergeau 98]\] RFC 2279 - UTF-8, a transformation format of ISO 10646
...