Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: remediation cost

...

Multibyte encodings are used for character sets that require more than one byte to uniquely identify each constituent character. For example, the Japanese encoding Shift-JIS (shown below) supports multibyte encoding where the maximum character length is two bytes (one leading and one trailing byte).

Byte Type

Range

single-byte

0x00 through 0x7F and 0xA0 through 0xDF

lead-byte

0x81 through 0x9F and 0xE0 through 0xFC

trailing-byte

0x40-0x7E and 0x80-0xFC

The trailing byte ranges overlap the range of both the single-byte and lead-byte characters. When a multibyte character is separated across a buffer boundary, it can be interpreted differently than if it were not separated across the buffer boundary; this difference arises because of the ambiguity of its composing bytes [Phillips 2005].

...

Forming strings consisting of partial characters can result in unexpected behavior.

Rule

Severity

Likelihood

Remediation Cost

Detectable

Repairable

Priority

Level

STR50-J

Low

low

Unlikely

unlikely

Yes

medium

No

P2

L3

Automated Detection

ToolVersionCheckerDescription
SonarQube
Include Page
SonarQube_V
SonarQube_V
S1943
 

 



Bibliography

[API 2014]

Classes Character and BreakIterator

 [Tutorials 2008]

Character Boundaries

...