...
- Process A performs security checks, but does not check for non-shortest UTF-8 forms.
- Process B accepts the byte sequence from process A and transforms it into UTF-16 while interpreting possible non-shortest forms.
- The UTF-16 text may contain characters that should have been filtered out by process A and can potentially be dangerous. These non-"shortest" UTF-8 attacks have been used to bypass security validations in high-profile products, such as Microsoft's IIS web server.
[Corrigendum #1: UTF-8 Shortest Form](http://www.unicode.org/versions/corrigendum1.html) to the Unicode Standard [Unicode 06] describes the modifications to Version 3.0 of The Unicode Standard necessary to define what is meant by the shortest form.
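The scenario above can be sketched in C. This is a minimal illustration, not a complete validator: the function names `utf8_decode_lenient`, `utf8_min_len`, and `utf8_is_shortest_form` are hypothetical and do not come from the original text or any library. The lenient decoder stands in for process B, and the shortest-form check is one way process A could reject such input. The sketch checks only structure and shortest form; it does not reject surrogate code points or values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical lenient decoder (stands in for process B): decodes one
 * UTF-8 sequence without checking for the shortest form.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * truncated or has an invalid lead or continuation byte.
 */
static size_t utf8_decode_lenient(const unsigned char *s, size_t n,
                                  uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    if (n == 0) {
        return 0;
    }
    if (s[0] < 0x80) {                      /* 0xxxxxxx: ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2 bytes */
        len = 2;
        c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3 bytes */
        len = 3;
        c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4 bytes */
        len = 4;
        c = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or bad lead */
    }

    if (n < len) {
        return 0;                           /* truncated sequence */
    }
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) {        /* must be 10xxxxxx */
            return 0;
        }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* Minimum number of UTF-8 bytes needed to encode a code point. */
static size_t utf8_min_len(uint32_t cp)
{
    if (cp < 0x80) {
        return 1;
    }
    if (cp < 0x800) {
        return 2;
    }
    if (cp < 0x10000) {
        return 3;
    }
    return 4;
}

/*
 * Returns 1 if the first sequence in s is structurally valid and uses
 * the shortest possible encoding for its code point, 0 otherwise.
 */
static int utf8_is_shortest_form(const unsigned char *s, size_t n)
{
    uint32_t cp;
    size_t len = utf8_decode_lenient(s, n, &cp);

    return len != 0 && len == utf8_min_len(cp);
}
```

For example, the two-byte sequence `0xC0 0xAF` decodes leniently to U+002F (`/`), a character that a filter looking only for the single byte `0x2F` would miss, while `utf8_is_shortest_form` rejects it because U+002F needs only one byte.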
Handling Invalid Inputs
UTF-8 decoders have no uniformly defined behavior upon encountering an invalid input. Below are several ways a UTF-8 decoder might behave in the event of an invalid byte sequence:
...
- [ISO/IEC 10646:2003]
- [ISO/IEC PDTR 24772] "AJN Choice of Filenames and other External Identifiers"
- [Kuhn 06]
- [MITRE 07] [CWE ID 176](http://cwe.mitre.org/data/definitions/176.html), "Failure to Handle Unicode Encoding," [CWE ID 116](http://cwe.mitre.org/data/definitions/116.html), "Improper Encoding or Escaping of Output"
- [Pike 93]
- [Unicode 06]
- [Viega 03] Section 3.12, "Detecting Illegal UTF-8 Characters"
- [Wheeler 03]
- [Yergeau 98]
...