...

| UCS Code (HEX) | Binary UTF-8 Format | Legal UTF-8 Values (HEX) |
|---|---|---|
| 00-7F | 0xxxxxxx | 00-7F |
| 80-7FF | 110xxxxx 10xxxxxx | C2-DF 80-BF |
| 800-FFF | 1110xxxx 10xxxxxx 10xxxxxx | E0 A0*-BF 80-BF |
| 1000-FFFF | 1110xxxx 10xxxxxx 10xxxxxx | E1-EF 80-BF 80-BF |
| 10000-3FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | F0 90*-BF 80-BF 80-BF |
| 40000-FFFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | F1-F3 80-BF 80-BF 80-BF |
| 100000-10FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx | F4 80-8F* 80-BF 80-BF |
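The "Legal UTF-8 Values" column can be read as a lookup table: the lead byte selects a row, and each trailing byte must fall inside that row's range. The following is a minimal sketch of that reading, not a reference implementation; the names `LEGAL_SEQUENCES` and `is_legal_utf8` are hypothetical. It follows the table literally, so it does not separately exclude the surrogate range (ED A0-BF), which stricter validators such as RFC 3629 also reject.

```python
# Legal byte ranges transcribed from the table above.
# Each entry: (lead_lo, lead_hi, [(lo, hi) for each trailing byte]).
LEGAL_SEQUENCES = [
    (0x00, 0x7F, []),
    (0xC2, 0xDF, [(0x80, 0xBF)]),
    (0xE0, 0xE0, [(0xA0, 0xBF), (0x80, 0xBF)]),
    (0xE1, 0xEF, [(0x80, 0xBF), (0x80, 0xBF)]),
    (0xF0, 0xF0, [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0xF1, 0xF3, [(0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]),
    (0xF4, 0xF4, [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if every sequence in data matches a row of the table."""
    i = 0
    while i < len(data):
        lead = data[i]
        for lo, hi, trail in LEGAL_SEQUENCES:
            if lo <= lead <= hi:
                break
        else:
            # No row matched: C0, C1, F5-FF, or a stray continuation byte.
            return False
        tail = data[i + 1 : i + 1 + len(trail)]
        if len(tail) != len(trail):
            return False  # sequence is truncated
        for byte, (tlo, thi) in zip(tail, trail):
            if not tlo <= byte <= thi:
                return False  # trailing byte outside the legal range
        i += 1 + len(trail)
    return True
```

The restricted second-byte ranges in the E0, F0, and F4 rows are what rule out overlong three- and four-byte forms and code points above 10FFFF; the absence of C0 and C1 as lead bytes does the same for two-byte forms.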


Security Related Issues

The UTF-8 encoding scheme is fairly simple, but a few clarifications matter for security. One of the most important is the requirement that only the "shortest" form of each UTF-8 sequence be permitted. A naive decoder may accept encodings that are longer than necessary, which means that potentially dangerous input could be represented in multiple ways and slip past security checks that look for only one form.
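As an illustration of the shortest-form problem, consider the two bytes C0 AF, an overlong encoding of "/" (U+002F). The sketch below is illustrative only; `naive_decode_2byte` and `strict_decode` are hypothetical helper names, and the naive decoder deliberately omits the check that a real decoder needs.

```python
def naive_decode_2byte(b0: int, b1: int) -> int:
    """Decode a 2-byte sequence by bit masking alone, with no shortest-form check."""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_decode(data: bytes):
    """Decode with Python's built-in strict UTF-8 codec; return None if rejected."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


overlong_slash = bytes([0xC0, 0xAF])

# A filter scanning the raw bytes for "/" finds nothing suspicious...
assert b"/" not in overlong_slash

# ...yet the naive decoder still produces U+002F.
assert chr(naive_decode_2byte(*overlong_slash)) == "/"

# A strict decoder refuses the overlong form and accepts only the 1-byte form.
assert strict_decode(overlong_slash) is None
assert strict_decode(b"/") == "/"
```

If a path check runs on the raw bytes while a later stage decodes naively, the "/" reappears after the check has already passed, which is the kind of bypass the shortest-form rule is meant to prevent.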

...