...
Although UTF-8 originated from the Plan 9 developers \[[Pike 93|AA. References#PikeBibliography#Pike 93]\], Plan 9's own support only covers the low 16-bit range. In general, many "Unicode" systems only support the low 16-bit range, not the full 31-bit ISO 10646 code space \[[ISO/IEC 10646:2003(E)|AA. References#ISOBibliography#ISO/IEC 10646-2003]\].
Security-Related Issues
According to \[[Yergeau 98|AA. References#YergeauBibliography#Yergeau 98]\]:
Implementors of UTF-8 need to consider the security aspects of how they handle invalid UTF-8 sequences. It is conceivable that in some circumstances an attacker would be able to exploit an incautious UTF-8 parser by sending it an octet sequence that is not permitted by the UTF-8 syntax.
A particularly subtle form of this attack can be carried out against a parser which performs security-critical validity checks against the UTF-8 encoded form of its input, but interprets certain invalid octet sequences as characters. For example, a parser might prohibit the null character when encoded as the single-octet sequence 00, but allow the invalid two-octet sequence C0 80 and interpret it as a null character. Another example might be a parser which prohibits the octet sequence 2F 2E 2E 2F ("/../"), yet permits the invalid octet sequence 2F C0 AE 2E 2F.
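The mismatch described in the quotation can be illustrated with a minimal sketch. Both functions below are hypothetical and are not taken from any cited source: `contains_dotdot` stands in for a check performed on the encoded octets, and `lax_decode2` stands in for an incautious decoder that combines a two-octet sequence without verifying that two octets are actually needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-level filter: returns 1 if the encoded input
 * contains the octet sequence 2F 2E 2E 2F ("/../"), otherwise 0. */
int contains_dotdot(const unsigned char *buf, size_t len) {
  static const unsigned char pat[] = { 0x2F, 0x2E, 0x2E, 0x2F };
  for (size_t i = 0; i + sizeof pat <= len; i++) {
    if (memcmp(buf + i, pat, sizeof pat) == 0) {
      return 1;
    }
  }
  return 0;
}

/* Hypothetical lax decoder: merges the payload bits of a two-octet
 * sequence but never checks that the result is 0x80 or greater,
 * so overlong encodings of ASCII characters are silently accepted. */
unsigned long lax_decode2(unsigned char b0, unsigned char b1) {
  return ((unsigned long)(b0 & 0x1F) << 6) | (unsigned long)(b1 & 0x3F);
}
```

Given the octets 2F C0 AE 2E 2F, the filter reports no match, yet the lax decoder turns C0 AE into 0x2E ("."), so the decoded text is "/../". Likewise, C0 80 decodes to 0, the null character.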
...
[Corrigendum #1: UTF-8 Shortest Form|http://www.unicode.org/versions/corrigendum1.html] to the Unicode Standard \[[Unicode 06|AA. References#UnicodeBibliography#Unicode 06]\] describes modifications to Version 3.0 of The Unicode Standard necessary to define what is meant by the shortest form.
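One way to enforce the shortest-form requirement is to compare each decoded code point against the smallest value that needs the observed sequence length. The following is a minimal sketch under stated assumptions, not the text of the corrigendum or of any cited source: the function name `utf8_is_shortest_form` is hypothetical, sequences are limited to four octets (code points up to 0x10FFFF, as in current Unicode, rather than the full 31-bit ISO 10646 space), and surrogate code points are rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8 in shortest form,
 * otherwise 0. Assumes sequences of at most four octets. */
int utf8_is_shortest_form(const unsigned char *buf, size_t len) {
  size_t i = 0;
  while (i < len) {
    unsigned char b = buf[i];
    size_t n;      /* number of continuation octets expected */
    uint32_t cp;   /* code point being assembled */
    uint32_t min;  /* smallest code point that needs this length */

    if (b < 0x80)                { n = 0; cp = b;        min = 0x0; }
    else if ((b & 0xE0) == 0xC0) { n = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { n = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { n = 3; cp = b & 0x07; min = 0x10000; }
    else return 0;  /* stray continuation octet or invalid lead octet */

    if (len - i - 1 < n) return 0;  /* truncated sequence */
    for (size_t k = 1; k <= n; k++) {
      unsigned char c = buf[i + k];
      if ((c & 0xC0) != 0x80) return 0;  /* not a continuation octet */
      cp = (cp << 6) | (uint32_t)(c & 0x3F);
    }
    if (cp < min) return 0;                      /* non-shortest form */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate range */
    if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range */
    i += n + 1;
  }
  return 1;
}
```

Running such a check before any security-relevant comparison means that later filters and the decoder see the same characters. An input such as 2F C0 AE 2E 2F is rejected outright rather than being passed along for interpretation.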
...
The following function from \[[Viega 03|AA. References#ViegaBibliography#Viega 03]\] detects invalid character sequences in a string but does not reject non-minimal forms. It returns {{1}} if the string is composed only of legitimate sequences; otherwise it returns {{0}}.
...
\[[ISO/IEC 10646:2003|AA. References#ISOBibliography#ISO/IEC 10646-2003]\]
\[[ISO/IEC PDTR 24772|AA. References#ISOBibliography#ISO/IEC PDTR 24772]\] "AJN Choice of Filenames and other External Identifiers"
\[[Kuhn 06|AA. References#KuhnBibliography#Kuhn 06]\]
\[[MITRE 07|AA. References#MITREBibliography#MITRE 07]\] [CWE ID 176|http://cwe.mitre.org/data/definitions/176.html], "Failure to Handle Unicode Encoding," [CWE ID 116|http://cwe.mitre.org/data/definitions/116.html], "Improper Encoding or Escaping of Output"
\[[Pike 93|AA. References#PikeBibliography#Pike 93]\]
\[[Unicode 06|AA. References#UnicodeBibliography#Unicode 06]\]
\[[Viega 03|AA. References#ViegaBibliography#Viega 03]\] Section 3.12, "Detecting Illegal UTF-8 Characters"
\[[Wheeler 03|AA. References#WheelerBibliography#Wheeler 03]\]
\[[Yergeau 98|AA. References#YergeauBibliography#Yergeau 98]\]
...