String objects in Java are encoded in UTF-16. The Java platform is required to support other character encodings, or charsets, such as US-ASCII, ISO-8859-1, and UTF-8. Errors may occur when converting between differently encoded character data. There are two general types of encoding errors. If the byte sequence is not valid for the specified charset, the input is considered malformed. If the byte sequence cannot be mapped to an equivalent character sequence, an unmappable character has been encountered.
According to the Java API [API 2014] for the String constructor that decodes a byte array in a given charset:
The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.
Similarly, the description of the String.getBytes(Charset) method states:
This method always replaces malformed-input and unmappable-character sequences with this charset's default replacement byte array.
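As a small illustration of the replacement behavior quoted above, the following sketch encodes a string containing a character that US-ASCII cannot represent. The class name ReplacementDemo, the helper toAscii, and the sample text are assumptions chosen for illustration. The unmappable character is replaced silently with the charset's replacement byte (for US-ASCII, '?'), so the caller gets no indication that data was lost.

```java
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
  // Encodes text as US-ASCII; unmappable characters are silently replaced.
  static byte[] toAscii(String text) {
    return text.getBytes(StandardCharsets.US_ASCII);
  }

  public static void main(String[] args) {
    // U+00E9 (e with acute accent) has no US-ASCII mapping.
    byte[] encoded = toAscii("caf\u00e9");
    // The last byte is the replacement byte, not an error indication.
    System.out.println(new String(encoded, StandardCharsets.US_ASCII)); // prints "caf?"
  }
}
```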
The CharsetEncoder class is used to transform character data into a sequence of bytes in a specific charset. The input character sequence is provided in a character buffer or a series of such buffers. The output byte sequence is written to a byte buffer or a series of such buffers. The CharsetDecoder class reverses this process by transforming a sequence of bytes in a specific charset into character data. The input byte sequence is provided in a byte buffer or a series of such buffers, and the output character sequence is written to a character buffer or a series of such buffers.
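The buffer-based encoding described above can be sketched as follows. This is a minimal example, not a definitive implementation: the class name StrictEncodeDemo, the helper encodeStrict, and the sample text are hypothetical. The encoder is configured with CodingErrorAction.REPORT so that an unmappable character raises an exception instead of being replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictEncodeDemo {
  // Encodes text as US-ASCII and reports, rather than replaces,
  // any character that cannot be encoded.
  static ByteBuffer encodeStrict(String text) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    // Input comes from a character buffer; output is a new byte buffer.
    return encoder.encode(CharBuffer.wrap(text));
  }

  public static void main(String[] args) {
    try {
      ByteBuffer out = encodeStrict("caf\u00e9"); // U+00E9 is unmappable in US-ASCII
      System.out.println("Encoded " + out.remaining() + " bytes");
    } catch (CharacterCodingException e) {
      // Reached for the sample text above, because REPORT is in effect.
      System.out.println("Encoding failed: " + e);
    }
  }
}
```

Choosing REPORT trades convenience for integrity: the caller must handle the exception, but corrupted output is never produced silently.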
Special care should be taken when decoding untrusted byte data to ensure that malformed-input or unmappable-character errors do not result in defects and vulnerabilities. Encoding errors can also occur; for example, encoding a cryptographic key that contains malformed input for transmission will result in an error. Encoding and decoding errors typically result in data corruption.
Noncompliant Code Example
This noncompliant code example is similar to the one used in STR03-J. Do not represent numeric data as strings, in that it attempts to convert a byte array containing the two's-complement representation of a BigInteger value to a String. Because the byte array contains malformed-input sequences, the behavior of the String constructor is unspecified.
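The original listing is not reproduced here, so the following is only a sketch of the kind of code described. The class name NoncompliantConversion, the helper roundTrip, the sample value, and the choice of the UTF-16 charset name are assumptions. It decodes the BigInteger bytes as text and then encodes the text back, which does not reliably recover the original value.

```java
import java.io.UnsupportedEncodingException;
import java.math.BigInteger;

class NoncompliantConversion {
  // Noncompliant: treats arbitrary binary data as UTF-16 text and encodes it back.
  static BigInteger roundTrip(BigInteger x) throws UnsupportedEncodingException {
    byte[] byteArray = x.toByteArray();          // two's-complement bytes, not text
    String s = new String(byteArray, "UTF-16");  // unspecified behavior for invalid bytes
    return new BigInteger(s.getBytes("UTF-16")); // re-encoded bytes differ from the input
  }

  public static void main(String[] args) throws UnsupportedEncodingException {
    BigInteger x = new BigInteger("530500452766"); // assumed sample value (5 bytes)
    // Compare the original value with the value recovered from the string.
    System.out.println(x.equals(roundTrip(x)));
  }
}
```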
Compliant Solution
The java.nio.charset.CharsetEncoder and java.nio.charset.CharsetDecoder classes provide greater control over the process. In this compliant solution, the CharsetDecoder.decode() method is used to convert the byte array containing the two's-complement representation of a BigInteger value to a CharBuffer. Because the bytes do not represent valid UTF-16, the input is considered malformed, and a MalformedInputException is thrown.
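A minimal sketch of this approach follows; the original listing is not reproduced here, so the class name CompliantConversion, the helper decodeStrict, and the sample value are assumptions. A decoder obtained from Charset.newDecoder() reports malformed input by default, so the invalid bytes surface as an exception that the caller must handle.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class CompliantConversion {
  // Decodes bytes as UTF-16 and throws on malformed or unmappable input.
  static CharBuffer decodeStrict(byte[] bytes) throws CharacterCodingException {
    // A new decoder reports errors by default instead of replacing them.
    CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder();
    return decoder.decode(ByteBuffer.wrap(bytes));
  }

  public static void main(String[] args) {
    BigInteger x = new BigInteger("530500452766"); // assumed sample value (5 bytes)
    try {
      CharBuffer cb = decodeStrict(x.toByteArray());
      System.out.println("Decoded " + cb.length() + " chars");
    } catch (CharacterCodingException e) {
      // An odd number of bytes cannot be valid UTF-16, so decoding is rejected.
      System.out.println("Rejected input: " + e);
    }
  }
}
```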
Malformed input or unmappable character errors can result in a loss of data integrity.
CWE-838. Inappropriate Encoding for Output Context
CWE-116. Improper Encoding or Escaping of Output