...

  1. Process A performs security checks but does not check for non-shortest UTF-8 forms.
  2. Process B accepts the byte sequence from Process A and transforms it into UTF-16, interpreting any non-shortest forms as the characters they appear to encode.
  3. The resulting UTF-16 text may contain characters that Process A should have filtered out, and these characters can be dangerous. Such non-shortest-form UTF-8 attacks have been used to bypass security validations in high-profile products, such as Microsoft's IIS web server.
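
The check that Process A omits in the scenario above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the function name `utf8_is_well_formed` is hypothetical and does not come from any library. It rejects a sequence whose decoded code point could have been encoded in fewer bytes, and also rejects surrogates and values above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration): returns true only if
 * buf[0..len) is well-formed UTF-8. Non-shortest (overlong) forms, UTF-16
 * surrogates, and code points above U+10FFFF are all rejected.
 */
bool utf8_is_well_formed(const unsigned char *buf, size_t len) {
  /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
  static const uint32_t min_cp[5] = {0, 0x0, 0x80, 0x800, 0x10000};
  size_t i = 0;

  while (i < len) {
    unsigned char lead = buf[i];
    size_t n;
    uint32_t cp;

    if (lead < 0x80) {
      n = 1; cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      n = 2; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      n = 3; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      n = 4; cp = lead & 0x07;
    } else {
      return false; /* stray continuation byte or invalid lead byte */
    }

    if (n > len - i) {
      return false; /* sequence is truncated */
    }

    for (size_t k = 1; k < n; k++) {
      unsigned char c = buf[i + k];
      if ((c & 0xC0) != 0x80) {
        return false; /* not a continuation byte */
      }
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < min_cp[n]) {
      return false; /* non-shortest form, e.g. C0 AF for '/' */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
      return false; /* surrogate code points are not valid in UTF-8 */
    }
    if (cp > 0x10FFFF) {
      return false; /* beyond the Unicode code space */
    }
    i += n;
  }
  return true;
}
```

With this check, the two-byte sequence `C0 AF` (a non-shortest encoding of `/`) is rejected before any filter or later decoder sees it, while the single byte `2F` is accepted.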

[Corrigendum #1: UTF-8 Shortest Form|http://www.unicode.org/versions/corrigendum1.html] to the Unicode Standard \[[Unicode 06|AA. C References#Unicode 06]\] describes modifications to Version 3.0 of The Unicode Standard necessary to define what is meant by the shortest form.  

Handling Invalid Inputs

UTF-8 decoders have no uniformly defined behavior when they encounter invalid input. Below are several ways a UTF-8 decoder might behave in the event of an invalid byte sequence:

...
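
To illustrate how decoder behavior can differ on invalid input, the sketch below implements two possible policies in one decoder: stop and report an error, or substitute U+FFFD (the Unicode replacement character) and continue. All names here (`utf8_policy`, `decode_one`, `utf8_decode`) are hypothetical, chosen for this example only, and the two policies are assumptions rather than an exhaustive list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policies for this sketch. */
enum utf8_policy { UTF8_STOP_ON_ERROR, UTF8_REPLACE };

/* Decodes one sequence from s; returns bytes consumed, or 0 if invalid. */
static size_t decode_one(const unsigned char *s, size_t len, uint32_t *cp) {
  static const uint32_t min_cp[5] = {0, 0x0, 0x80, 0x800, 0x10000};
  size_t n;

  if (len == 0) {
    return 0;
  }
  if (s[0] < 0x80) {
    *cp = s[0];
    return 1;
  } else if ((s[0] & 0xE0) == 0xC0) {
    n = 2; *cp = s[0] & 0x1F;
  } else if ((s[0] & 0xF0) == 0xE0) {
    n = 3; *cp = s[0] & 0x0F;
  } else if ((s[0] & 0xF8) == 0xF0) {
    n = 4; *cp = s[0] & 0x07;
  } else {
    return 0; /* invalid lead byte or stray continuation byte */
  }
  if (n > len) {
    return 0; /* truncated sequence */
  }
  for (size_t k = 1; k < n; k++) {
    if ((s[k] & 0xC0) != 0x80) {
      return 0;
    }
    *cp = (*cp << 6) | (s[k] & 0x3F);
  }
  /* Reject non-shortest forms, surrogates, and out-of-range values. */
  if (*cp < min_cp[n] || (*cp >= 0xD800 && *cp <= 0xDFFF) || *cp > 0x10FFFF) {
    return 0;
  }
  return n;
}

/*
 * Decodes in[0..in_len) into out. Returns the number of code points written,
 * or -1 if out is too small or if invalid input is found under
 * UTF8_STOP_ON_ERROR.
 */
long utf8_decode(const unsigned char *in, size_t in_len,
                 uint32_t *out, size_t out_cap, enum utf8_policy policy) {
  size_t i = 0;
  size_t count = 0;

  while (i < in_len) {
    uint32_t cp;
    size_t n = decode_one(in + i, in_len - i, &cp);
    if (n == 0) {
      if (policy == UTF8_STOP_ON_ERROR) {
        return -1;
      }
      cp = 0xFFFD; /* replacement character */
      n = 1;       /* skip one byte and try to resynchronize */
    }
    if (count == out_cap) {
      return -1;
    }
    out[count++] = cp;
    i += n;
  }
  return (long)count;
}
```

Given the bytes `41 C0 AF 42`, the stop-on-error policy returns -1, whereas the replacement policy yields U+0041, U+FFFD, U+FFFD, U+0042. Neither policy ever emits `/` for the overlong `C0 AF`, which is the property that matters for the security checks discussed earlier.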

\[[ISO/IEC 10646:2003|AA. C References#ISO/IEC 10646-2003]\]
\[[ISO/IEC PDTR 24772|AA. C References#ISO/IEC PDTR 24772]\] "AJN Choice of Filenames and other External Identifiers"
\[[Kuhn 06|AA. C References#Kuhn 06]\]
\[[MITRE 07|AA. C References#MITRE 07]\] [CWE ID 176|http://cwe.mitre.org/data/definitions/176.html], "Failure to Handle Unicode Encoding," [CWE ID 116|http://cwe.mitre.org/data/definitions/116.html], "Improper Encoding or Escaping of Output" 
\[[Pike 93|AA. C References#Pike 93]\]
\[[Unicode 06|AA. C References#Unicode 06]\]  
\[[Viega 03|AA. C References#Viega 03]\] Section 3.12, "Detecting Illegal UTF-8 Characters"
\[[Wheeler 03|AA. C References#Wheeler 03]\]
\[[Yergeau 98|AA. C References#Yergeau 98]\]

...