...

Corrigendum #1: UTF-8 Shortest Form to the Unicode Standard [Unicode 2006] describes modifications made to version 3.0 of the Unicode Standard to forbid the interpretation of non-shortest forms.
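
As an illustration only, the following sketch shows a non-shortest form being rejected by a conforming decoder. It assumes Python's built-in strict UTF-8 codec; the helper name `is_rejected` is hypothetical and not part of any standard. The two-byte sequence C0 AF is a non-shortest encoding of U+002F ("/"), whose shortest form is the single byte 2F.

```python
def is_rejected(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder refuses the input (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return True
    return False


# Non-shortest (overlong) two-byte encoding of U+002F "/".
overlong_slash = b"\xc0\xaf"
# Shortest form of the same code point.
shortest_slash = b"\x2f"

print(is_rejected(overlong_slash))
print(is_rejected(shortest_slash))
```

A decoder that accepted the first sequence would let "/" slip past any filter that only looks for the byte 2F.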

Handling Invalid Inputs

...

Substitute the replacement character U+FFFD, or a wildcard character such as "?" when U+FFFD is not available.
Ignore the bytes (for example, delete the invalid byte before the validation process; see "Unicode Technical Report #36, 3.5 Deletion of Code Points" for more information).
Interpret the bytes according to a different character encoding (often the ISO-8859-1 character map; other encodings, such as Shift_JIS, are known to trigger self-XSS and so are potentially dangerous).
Fail to notice the error and decode the bytes as if they were a similar, valid UTF-8 sequence.
Stop decoding and report an error.
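
Several of the behaviors listed above can be sketched with Python's built-in codecs. This is a minimal illustration, not a recommended implementation: the function name `decode_with_strategy` and the strategy labels are hypothetical, and the ISO-8859-1 fallback is only one of the alternative encodings a decoder might choose.

```python
def decode_with_strategy(data: bytes, strategy: str) -> str:
    """Decode bytes as UTF-8, handling invalid input per the named strategy.

    The strategy labels are hypothetical names used for this sketch only.
    """
    if strategy == "replace":
        # Substitute U+FFFD for each invalid sequence.
        return data.decode("utf-8", errors="replace")
    if strategy == "ignore":
        # Silently delete the invalid bytes.
        return data.decode("utf-8", errors="ignore")
    if strategy == "latin1":
        # Reinterpret the whole input as ISO-8859-1 if it is not valid UTF-8.
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode("iso-8859-1")
    if strategy == "strict":
        # Stop decoding and report an error (raises UnicodeDecodeError).
        return data.decode("utf-8")
    raise ValueError(f"unknown strategy: {strategy}")


invalid = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

print(repr(decode_with_strategy(invalid, "replace")))
print(repr(decode_with_strategy(invalid, "ignore")))
print(repr(decode_with_strategy(invalid, "latin1")))
```

Deleting bytes is the riskiest of these when it happens before validation: with the "ignore" strategy, the input `b"<scr\xffipt>"` decodes to `<script>`, a string that a filter inspecting the raw bytes would not have matched.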