...

Multibyte encodings are used for character sets that require more than one byte to uniquely identify each constituent character. For example, the Japanese encoding Shift-JIS, whose byte ranges are shown below, is a multibyte encoding in which each character occupies at most two bytes (one lead byte and one trailing byte).

Byte Type       Range
single-byte     0x00 through 0x7F and 0xA0 through 0xDF
lead-byte       0x81 through 0x9F and 0xE0 through 0xFC
trailing-byte   0x40 through 0x7E and 0x80 through 0xFC
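The ranges in the table can be checked mechanically. The following minimal sketch transcribes them into predicates. The class and method names are hypothetical and not part of any library, and the sketch tests range membership only; it does not check whether a particular lead/trail pair is actually assigned a character.

```java
// Classifies a byte value (0x00..0xFF) using the Shift-JIS ranges from the table.
// Range membership only: it does not validate specific lead/trail combinations.
class ShiftJisByteRanges {

    static boolean isSingleByte(int b) {
        return (b >= 0x00 && b <= 0x7F) || (b >= 0xA0 && b <= 0xDF);
    }

    static boolean isLeadByte(int b) {
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    }

    static boolean isTrailingByte(int b) {
        return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
    }

    public static void main(String[] args) {
        // 0x41 and 0x5C are ASCII values, 0x83 is a lead byte,
        // and 0xB1 is a single-byte half-width katakana value.
        int[] samples = { 0x41, 0x5C, 0x83, 0xB1 };
        for (int b : samples) {
            System.out.printf("0x%02X single=%b lead=%b trailing=%b%n",
                    b, isSingleByte(b), isLeadByte(b), isTrailingByte(b));
        }
    }
}
```

Every sample value passes isTrailingByte, and 0x83 is both a lead byte and a trailing byte, so a byte examined in isolation does not reveal which role it plays.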

The trailing-byte ranges overlap the ranges of both the single-byte and the lead-byte characters. When a multibyte character is split across a buffer boundary, it can be interpreted differently than if it had been kept intact, because its component bytes are ambiguous when examined separately [Phillips 2005].
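A minimal sketch of the buffer-boundary problem, assuming the runtime provides the Shift_JIS charset (standard JDK builds include it). The class and method names are hypothetical, and the sketch models exactly two chunks split at a given boundary rather than a real read loop. The bytes 0x83 0x5C encode KATAKANA LETTER SO (U+30BD), and 0x5C on its own is also a valid single byte. Decoding each chunk separately corrupts the character; reusing one CharsetDecoder and carrying unconsumed bytes over to the next chunk decodes it correctly.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

class SplitBufferDecoding {

    // Problematic: each chunk is decoded on its own, so a character whose
    // bytes straddle the boundary is decoded from partial input.
    static String decodeChunksIndependently(byte[] data, int boundary, Charset cs) {
        String first = new String(Arrays.copyOfRange(data, 0, boundary), cs);
        String second = new String(Arrays.copyOfRange(data, boundary, data.length), cs);
        return first + second;
    }

    // Safer: one CharsetDecoder is reused for both chunks. A dangling lead byte
    // is left unconsumed in the input buffer, and compact() carries it over so
    // it is decoded together with the next chunk.
    static String decodeChunksWithCarryOver(byte[] data, int boundary, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.allocate(data.length);
        CharBuffer out = CharBuffer.allocate(
                (int) Math.ceil(data.length * (double) decoder.maxCharsPerByte()) + 1);
        int[][] chunks = { { 0, boundary }, { boundary, data.length } };
        for (int i = 0; i < chunks.length; i++) {
            in.put(data, chunks[i][0], chunks[i][1] - chunks[i][0]);
            in.flip();
            boolean endOfInput = (i == chunks.length - 1);
            CoderResult result = decoder.decode(in, out, endOfInput);
            if (result.isError()) {
                throw new IllegalArgumentException("Undecodable input: " + result);
            }
            in.compact(); // move unconsumed bytes to the front for the next chunk
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    // Renders a string as U+XXXX code points so the output does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumes the runtime provides Shift_JIS (standard JDK builds include it).
        Charset sjis = Charset.forName("Shift_JIS");
        // 0x83 0x5C encodes KATAKANA LETTER SO (U+30BD); 0x5C alone is a single-byte value.
        byte[] data = { (byte) 0x83, (byte) 0x5C };
        System.out.println("independent: " + codePoints(decodeChunksIndependently(data, 1, sjis)));
        System.out.println("carry-over:  " + codePoints(decodeChunksWithCarryOver(data, 1, sjis)));
    }
}
```

For stream input, wrapping the byte stream in a java.io.InputStreamReader with an explicit charset gives the same protection, because the reader decodes with a single stateful decoder across reads.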

Supplementary Characters

The char data type is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. Characters whose code points are greater than U+FFFF are called supplementary characters. Such characters are generally rare, but some are used, for example, as part of Chinese and Japanese personal names. To support supplementary characters without changing the char primitive data type and breaking compatibility with existing Java programs, the Java platform represents each supplementary character as a pair of char values (UTF-16 code units) called surrogates. According to the Java API [API 2014] class Character documentation (Unicode Character Representations):

...
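The following sketch illustrates the surrogate representation. The class and method names are hypothetical. The supplementary ideograph U+20BB7 occupies two char values, a high surrogate U+D842 followed by a low surrogate U+DFB7, so String.length() reports 2 while codePointCount() reports 1.

```java
class SurrogatePairDemo {

    // Returns the UTF-16 code units (char values) that Java uses for a code point.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        // U+20BB7 is a supplementary CJK ideograph, outside the Basic Multilingual Plane.
        String s = new String(toUtf16(0x20BB7));

        System.out.println("length() = " + s.length()); // counts UTF-16 code units
        System.out.println("codePointCount = " + s.codePointCount(0, s.length()));
        System.out.printf("chars: U+%04X U+%04X%n", (int) s.charAt(0), (int) s.charAt(1));
        System.out.println("high surrogate? " + Character.isHighSurrogate(s.charAt(0)));
        System.out.println("low surrogate? " + Character.isLowSurrogate(s.charAt(1)));
        System.out.printf("codePointAt(0) = U+%X%n", s.codePointAt(0));
    }
}
```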

This noncompliant code example attempts to trim leading letters from a string. However, this method may fail because methods that accept only a char value cannot support supplementary characters. According to the Java API [API 2014] class Character documentation:

...
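The failure can be seen in isolation with a small sketch (class and method names are hypothetical). Character.isLetter(char) sees only the high surrogate of a supplementary letter and returns false, while the int overload receives the full code point from codePointAt() and returns true. This illustrates the failure mode only; it is not the rule's compliant solution.

```java
class IsLetterDemo {

    // Char-based check: inspects a single UTF-16 code unit.
    static boolean firstCharIsLetter(String s) {
        return Character.isLetter(s.charAt(0));
    }

    // Code-point-based check: codePointAt() combines a surrogate pair first.
    static boolean firstCodePointIsLetter(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+20BB7 is a supplementary CJK ideograph (a letter, category Lo).
        String s = new String(Character.toChars(0x20BB7));
        System.out.println("isLetter(char):      " + firstCharIsLetter(s));
        System.out.println("isLetter(codePoint): " + firstCodePointIsLetter(s));
    }
}
```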

This compliant solution works for both supplementary and combining characters [Tutorials 2008]. According to the Java API [API 2014] class java.text.BreakIterator documentation:

...
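A short sketch of how character boundaries group code units: it counts user-perceived characters with BreakIterator.getCharacterInstance() and contrasts the result with String.length() and codePointCount(). The class and method names are hypothetical. For "e" followed by COMBINING ACUTE ACCENT (U+0301) and the supplementary ideograph U+20BB7, the three counts differ because the combining sequence and the surrogate pair each form one boundary unit.

```java
import java.text.BreakIterator;

class CharacterBoundaryDemo {

    // Counts the text elements between consecutive character boundaries.
    static int countBoundaryUnits(String s) {
        BreakIterator iter = BreakIterator.getCharacterInstance();
        iter.setText(s);
        int count = 0;
        iter.first(); // the first boundary is at index 0
        while (iter.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "e" + COMBINING ACUTE ACCENT, followed by the supplementary ideograph U+20BB7.
        String s = "e\u0301" + new String(Character.toChars(0x20BB7));
        System.out.println("length():       " + s.length());
        System.out.println("codePointCount: " + s.codePointCount(0, s.length()));
        System.out.println("BreakIterator:  " + countBoundaryUnits(s));
    }
}
```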

Forming strings consisting of partial characters can result in unexpected behavior.

Rule      Severity  Likelihood  Remediation Cost  Detectable  Repairable  Priority  Level
STR50-J   Low       Unlikely    Medium            Yes         No          P2        L3

Automated Detection

Tool        Version   Checker   Description
SonarQube             S1943


Bibliography

[API 2014]        Classes Character and BreakIterator
[Tutorials 2008]  Character Boundaries

