Comment: UTF-8 covers only 21 bits, and the latest ISO/IEC 10646 defines only 21 bits, not 31 bits.

...

  • The classical US-ASCII characters (0 to 0x7f) encode as themselves, so files and strings that are encoded with ASCII values have the same encoding under both ASCII and UTF-8.
  • It is easy to convert between UTF-8 and UCS-2 and UCS-4 fixed-width representations of characters.
  • The lexicographic sorting order of UCS-4 strings is preserved.
  • All possible 2^21 UCS codes can be encoded using UTF-8.
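
The properties listed above can be checked with a short Python sketch. The helper name encode_utf8 and the sample strings are hypothetical, chosen only for illustration. The helper is a minimal hand-rolled encoder for values below 2^21 and does not reject surrogates or values above U+10FFFF, which a conforming encoder would have to do.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one code point below 2**21 as UTF-8 (illustrative helper, not a validator)."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point < 0x80:
        # 7 payload bits: ASCII encodes as itself.
        return bytes([code_point])
    if code_point < 0x800:
        # 11 payload bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 16 payload bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x200000:
        # 21 payload bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point does not fit in 21 bits")


# ASCII text has the same bytes under ASCII and UTF-8.
ascii_same = "hello".encode("ascii") == "hello".encode("utf-8")

# The sketch agrees with Python's built-in encoder on sample code points.
samples = [0x24, 0xA2, 0x20AC, 0x10348]
matches_builtin = all(encode_utf8(cp) == chr(cp).encode("utf-8") for cp in samples)

# Sorting by UTF-8 bytes gives the same order as sorting by code point.
words = ["z", "\u00e9", "\u20ac", "\U00010348", "a"]
order_preserved = sorted(words) == sorted(words, key=lambda s: s.encode("utf-8"))

print(ascii_same, matches_builtin, order_preserved)
```

The four-byte form carries 3 + 6 + 6 + 6 = 21 payload bits, which is where the 21-bit limit comes from.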

...

Although UTF-8 originated from the Plan 9 developers [Pike 1993], Plan 9's own support covers only the low 16-bit range. In general, many "Unicode" systems support only the low 16-bit range, not the full 21-bit ISO 10646 code space [ISO/IEC 10646:2003(E)].

...