...

  • The classical US-ASCII characters (0 to 0x7f) encode as themselves, so files and strings that are encoded with ASCII values have the same encoding under both ASCII and UTF-8.
  • All UCS characters beyond 0x7f are encoded as a multibyte sequence consisting only of bytes in the range 0x80 to 0xfd. This means that no ASCII byte (including a null byte) can appear as part of another character. This property allows byte-oriented string handling functions, including those that rely on a terminating null byte, to be used on UTF-8 strings.
  • It's easy to convert between UTF-8 and UCS-2 and UCS-4 fixed-width representations of characters.
  • The lexicographic sorting order of UCS-4 strings is preserved.
  • All possible 2^31 UCS codes can be encoded using UTF-8.
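
Several of the properties listed above can be checked empirically. The following is a minimal sketch in Python, not part of the original text; the helper names (`is_ascii_transparent`, `multibyte_bytes_are_high`, `order_preserved`) and the sample strings are illustrative assumptions. Python's built-in UTF-8 codec only accepts code points up to U+10FFFF, so the sketch does not exercise the full 2^31 range mentioned in the last item.

```python
# Illustrative checks of UTF-8 properties; all names here are hypothetical.

def is_ascii_transparent(text):
    """True if an ASCII-only string has identical ASCII and UTF-8 bytes."""
    return text.encode("ascii") == text.encode("utf-8")


def multibyte_bytes_are_high(ch):
    """True if a character is a single byte, or if every byte of its
    multibyte UTF-8 sequence is 0x80 or above (no ASCII byte inside)."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 1 or all(b >= 0x80 for b in encoded)


def order_preserved(strings):
    """True if sorting by code point and sorting by UTF-8 bytes agree."""
    by_code_point = sorted(strings)  # str comparison is by code point
    by_utf8_bytes = sorted(strings, key=lambda s: s.encode("utf-8"))
    return by_code_point == by_utf8_bytes


if __name__ == "__main__":
    # Sample data chosen for illustration only.
    samples = ["A", "z", "\u00e9", "\u20ac", "\U0001d11e", "na\u00efve", "zebra"]
    print(is_ascii_transparent("plain ASCII text"))
    print(all(multibyte_bytes_are_high(ch) for s in samples for ch in s))
    print(order_preserved(samples))
```

Because no byte of a multibyte sequence falls in the ASCII range, a search for an ASCII byte such as `/` or a null terminator cannot match in the middle of another character.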

...