...

The trailing byte ranges overlap the range of both the single-byte and lead-byte characters. When a multibyte character is separated across a buffer boundary, it can be interpreted differently than if it were not separated across the buffer boundary; this difference arises because of the ambiguity of its composing bytes [Phillips 2005].
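The buffer-boundary problem described above can be sketched in Java. This is an illustration under stated assumptions, not code from the guideline: the class name `SplitBufferDemo`, its method names, and the choice of UTF-8 are all hypothetical. UTF-8 is used only because every Java platform is guaranteed to provide it; the passage itself concerns legacy multibyte encodings whose byte ranges overlap, but splitting a character across two buffers corrupts the result in the same way.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

final class SplitBufferDemo {
    // "a" followed by U+00E9, which UTF-8 encodes as the two bytes C3 A9.
    static final byte[] DATA = "a\u00E9".getBytes(StandardCharsets.UTF_8);

    // Decodes each chunk on its own; a character split across the
    // boundary is replaced with U+FFFD in both halves.
    static String decodeIndependently(byte[] data, int split) {
        String first = new String(data, 0, split, StandardCharsets.UTF_8);
        String second = new String(data, split, data.length - split, StandardCharsets.UTF_8);
        return first + second;
    }

    // Feeds both chunks through a single decoder and carries the
    // incomplete trailing bytes of the first chunk into the second.
    static String decodeWithState(byte[] data, int split) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        CharBuffer out = CharBuffer.allocate(data.length);
        ByteBuffer in = ByteBuffer.allocate(data.length);

        in.put(data, 0, split);
        in.flip();
        decoder.decode(in, out, false); // more input will follow
        in.compact();                   // keep any undecoded bytes

        in.put(data, split, data.length - split);
        in.flip();
        decoder.decode(in, out, true);  // this is the last chunk
        decoder.flush(out);

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        int split = 2; // boundary falls between C3 and A9
        System.out.println(decodeIndependently(DATA, split).equals("a\u00E9")); // false
        System.out.println(decodeWithState(DATA, split).equals("a\u00E9"));     // true
    }
}
```

The stateful variant works because the decoder leaves an incomplete sequence unread in the input buffer, so the caller can prepend it to the next chunk.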

Supplementary Characters

The char data type is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. Characters whose code points are greater than U+FFFF are called supplementary characters. Such characters are generally rare, but some are used, for example, as part of Chinese and Japanese personal names. To support supplementary characters without changing the char primitive data type and causing incompatibility with previous Java programs, supplementary characters are represented by a pair of char values (16-bit code units) that are called surrogates. According to the Java API [API 2014] class Character documentation (Unicode Character Representations):

...
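To make the surrogate representation concrete, the following sketch inspects one supplementary character. The class name `SupplementaryDemo` and the choice of U+20BB7 (a CJK ideograph outside the Basic Multilingual Plane) are assumptions made for illustration; only standard `String` and `Character` methods are used.

```java
final class SupplementaryDemo {
    // U+20BB7 is greater than U+FFFF, so it is a supplementary character.
    static final int CODE_POINT = 0x20BB7;
    static final String IDEOGRAPH = new String(Character.toChars(CODE_POINT));

    // Number of 16-bit char values (code units) in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charCount(IDEOGRAPH));      // 2
        System.out.println(codePointCount(IDEOGRAPH)); // 1
        System.out.println(Character.isHighSurrogate(IDEOGRAPH.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(IDEOGRAPH.charAt(1)));  // true
        System.out.println(IDEOGRAPH.codePointAt(0) == CODE_POINT);         // true
    }
}
```

A single character therefore occupies two char positions, which is why `length()` and `charAt()` alone cannot be relied on to count or classify characters.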

This noncompliant code example attempts to trim leading letters from a string. However, it may fail because methods that accept only a char value cannot support supplementary characters. According to the Java API [API 2014] class Character documentation:

...
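The failure mode described above can be sketched as follows. This is not the guideline's own listing, which is elided here; the class `TrimLeadingLetters` and both method names are hypothetical. The first method classifies one char at a time, and the second iterates by code point using `codePointAt` and `Character.charCount`.

```java
final class TrimLeadingLetters {
    // Char-based: a surrogate char is never a letter on its own, so the
    // loop stops at a leading supplementary letter and trims nothing.
    static String trimCharBased(String s) {
        int i = 0;
        while (i < s.length() && Character.isLetter(s.charAt(i))) {
            i++;
        }
        return s.substring(i);
    }

    // Code-point-based: reads a whole code point at each step and
    // advances by one or two chars as required.
    static String trimCodePointBased(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (!Character.isLetter(cp)) {
                break;
            }
            i += Character.charCount(cp);
        }
        return s.substring(i);
    }

    public static void main(String[] args) {
        // U+20BB7 (a supplementary CJK letter) followed by digits.
        String input = "\uD842\uDFB7" + "123";
        System.out.println(trimCharBased(input).equals(input));      // true: nothing trimmed
        System.out.println(trimCodePointBased(input).equals("123")); // true
    }
}
```

For input that contains only Basic Multilingual Plane letters, both methods return the same result; they diverge only when a supplementary letter is present.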

This compliant solution works for both supplementary and combining characters [Tutorials 2008]. According to the Java API [API 2014] class java.text.BreakIterator documentation:

...
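As a rough illustration of how `BreakIterator` handles both cases, the sketch below counts user-perceived characters. It is not the guideline's compliant solution, which is elided here; the class `UserCharacterCount` and its method are hypothetical, and `Locale.ROOT` is chosen only to keep the behavior independent of the default locale.

```java
import java.text.BreakIterator;
import java.util.Locale;

final class UserCharacterCount {
    // Counts character boundaries, so a base letter plus combining mark,
    // or a surrogate pair, is counted once.
    static int count(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        it.setText(text);
        int n = 0;
        while (it.next() != BreakIterator.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String combining = "e\u0301";           // 'e' + combining acute accent
        String supplementary = "\uD842\uDFB7";  // U+20BB7 as a surrogate pair
        String text = combining + supplementary + "x";

        System.out.println(text.length()); // 5 char values
        System.out.println(count(text));   // 3 user-perceived characters
    }
}
```

Iterating by code point alone would count the combining sequence as two characters; boundary analysis treats it as one.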

| Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
|---|---|---|---|---|---|
| STR50-J | low | unlikely | medium | P2 | L3 |

Bibliography

[API 2014] Classes Character and BreakIterator

[Tutorials 2008] Character Boundaries

 
