...
Multibyte encodings are used for character sets that require more than one byte to uniquely identify each constituent character. For example, the Japanese encoding Shift-JIS (shown below) is a multibyte encoding in which a character occupies at most two bytes (one lead byte and one trailing byte).
| Byte Type | Range |
|---|---|
| single-byte | |
| lead-byte | |
| trailing-byte | |
The trailing-byte ranges overlap the ranges of both the single-byte and lead-byte characters. When a multibyte character is split across a buffer boundary, its bytes can be interpreted differently than if the character were read whole, because each byte on its own is ambiguous [Phillips 2005].
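The boundary problem can be illustrated with a minimal sketch. It makes several assumptions that are not part of the original text: the class name `SplitBoundaryDemo` and both method names are hypothetical, and UTF-8 is substituted for Shift-JIS because every Java platform is required to support it, whereas Shift-JIS support depends on the installed charsets. UTF-8 is also a variable-width encoding, so splitting a character across chunks causes the same kind of corruption. The first method decodes each chunk in isolation; the second keeps incomplete trailing bytes in the input buffer until the rest of the character arrives.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

final class SplitBoundaryDemo {

    // Problematic: each chunk is decoded on its own, so a character whose
    // bytes straddle two chunks is replaced with U+FFFD on both sides.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // One possible remedy: a single CharsetDecoder is fed chunk by chunk,
    // and bytes of an incomplete character are carried into the next chunk.
    static String decodeStreaming(byte[] data, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Extra space holds up to 3 leftover bytes of a partial UTF-8 character.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(chunkSize + 4);
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            in.put(data, off, len);
            in.flip();
            decoder.decode(in, out, false); // more input may follow
            in.compact();                   // keep undecoded trailing bytes
            out.flip();
            sb.append(out);
            out.clear();
        }
        in.flip();
        decoder.decode(in, out, true);      // signal end of input
        decoder.flush(out);
        out.flip();
        sb.append(out);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three characters, 3 bytes each in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        // A 4-byte chunk size cuts through the second and third characters.
        System.out.println(decodePerChunk(data, 4).equals(text));  // false
        System.out.println(decodeStreaming(data, 4).equals(text)); // true
    }
}
```

In practice, wrapping the input stream in an `InputStreamReader` constructed with an explicit charset achieves the same effect with less code; the explicit decoder is used here only to make the carry-over of partial bytes visible. The sketch does not handle malformed input, which a new decoder reports rather than replaces by default.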
...
Forming strings consisting of partial characters can result in unexpected behavior.
| Rule | Severity | Likelihood | Detectable | Repairable | Priority | Level |
|---|---|---|---|---|---|---|
| STR50-J | Low | Unlikely | Yes | No | P2 | L3 |
Automated Detection
Bibliography
[API 2014] | Classes |
Character Boundaries |
...