...

Corrigendum #1: UTF-8 Shortest Form to the Unicode Standard [Unicode 2006] describes modifications made to version 3.0 of the Unicode Standard to forbid the interpretation of nonshortest forms.
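To illustrate what a nonshortest form is, the following minimal sketch (in Python, chosen here only for brevity) decodes the two-byte sequence 0xC0 0xAF. The helper names `naive_decode_two_byte` and `strict_rejects` are hypothetical and exist only for this illustration. A decoder that applies the bit arithmetic without a shortest-form check yields "/" (U+002F), whose only valid UTF-8 encoding is the single byte 0x2F; a conforming decoder must reject the sequence.

```python
def naive_decode_two_byte(b0: int, b1: int) -> str:
    """Hypothetical helper: decode a 2-byte UTF-8 sequence by bit
    arithmetic alone, deliberately omitting the shortest-form check."""
    return chr(((b0 & 0x1F) << 6) | (b1 & 0x3F))


def strict_rejects(data: bytes) -> bool:
    """Hypothetical helper: True if Python's strict UTF-8 decoder
    refuses the given bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# 0xC0 0xAF is a nonshortest (overlong) encoding of "/" (U+002F).
overlong_slash = bytes([0xC0, 0xAF])

# The naive decoder is fooled into producing "/".
naive_result = naive_decode_two_byte(0xC0, 0xAF)

# The strict decoder rejects the overlong sequence.
rejected = strict_rejects(overlong_slash)

# The shortest (and only valid) form of "/" is the single byte 0x2F.
shortest = "/".encode("utf-8")
```

A filter that scans raw bytes for 0x2F before decoding would miss the overlong sequence, which is why interpreting such forms is forbidden.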

Handling Invalid Inputs

...

  1. Substitute the replacement character U+FFFD for the invalid bytes, or a wildcard character such as "?" when U+FFFD is not available.
  2. Ignore the bytes (for example, delete the invalid byte before the validation process; see "Unicode Technical Report #36, 3.5 Deletion of Code Points" for more information).
  3. Interpret the bytes according to a different character encoding (often the ISO-8859-1 character map; other encodings, such as Shift_JIS, are known to trigger self-XSS and so are potentially dangerous).
  4. Fail to notice the error and decode the bytes as if they were some similar piece of valid UTF-8.
  5. Stop decoding and report an error.
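Options 1, 2, 3, and 5 above can be sketched with Python's built-in codec error handlers. This is an illustration only, assuming a decoder that exposes such handlers; the sample input and variable names are hypothetical. Option 4 is a decoder defect rather than a selectable policy, so it is not shown.

```python
# Sample input: "ab", then the invalid bytes 0xC0 0xAF, then "cd".
invalid = b"ab\xc0\xafcd"

# Option 1: substitute the replacement character U+FFFD.
replaced = invalid.decode("utf-8", errors="replace")

# Option 2: ignore (silently delete) the invalid bytes.
# Deletion can splice the surrounding text into something a
# filter was meant to block.
ignored = invalid.decode("utf-8", errors="ignore")

# Option 3: interpret the bytes under a different encoding,
# here ISO-8859-1, in which every byte maps to a character.
latin1 = invalid.decode("iso-8859-1")

# Option 5: stop decoding and report an error.
try:
    invalid.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

With this input, option 2 yields "abcd", so text on either side of the deleted bytes becomes adjacent; that is the risk described in the deletion-of-code-points discussion cited in the list.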

...

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

...

...

...

...

Failure to handle Unicode encoding

...

...

Improper encoding or escaping of output

...

...

Bibliography

...

...

...

Illegal UTF-8

...

 

...