Increasingly, programmers view strings as a portable means of storing and communicating arbitrary data, such as numeric values. For example, one real-world system stored the binary values of encrypted passwords as strings in a database. Noncharacter data may not be representable as a string, because in most character sets not every bit pattern represents a valid character. Consequently, programmers must not convert noncharacter data to a string.
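To make the claim about invalid bit patterns concrete, the following sketch assumes UTF-8 as the character set and uses a strict CharsetDecoder, which reports malformed input instead of silently replacing it. The class name StrictDecodeDemo and the helper isValidUtf8 are hypothetical, introduced only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictDecodeDemo {
    // Hypothetical helper: returns true only if data is entirely valid UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed or unmappable input: these bytes are not character data
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        // 0x84 is a UTF-8 continuation byte with no preceding lead byte
        byte[] binary = {0x7B, (byte) 0x84};
        System.out.println(isValidUtf8(text));   // true
        System.out.println(isValidUtf8(binary)); // false
    }
}
```

A lenient decoder (the default for the String constructors) would accept the second array but replace the invalid byte, which is exactly how data gets lost.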

Noncompliant Code Example

This noncompliant code example attempts to convert a BigInteger value to a String and then restore it to a BigInteger value. The toByteArray() method returns a byte array containing the two's-complement representation of the BigInteger, in big-endian byte order: the most significant byte is in the zeroth element. The program then uses the String(byte[] bytes) constructor to create the string from the byte array. The behavior of this constructor is unspecified when the given bytes are not valid in the default character set, which is likely for arbitrary binary data. Specifying the character set by name, as in String(byte[], String), is also unspecified for invalid bytes. The String(byte[], Charset) constructor is better defined: the Java API [API 2014] documents that it always replaces malformed-input and unmappable-character sequences with the character set's default replacement string. Either way, the original bytes are discarded, so converting the String back to a BigInteger is unlikely to reproduce the original value.

BigInteger x = new BigInteger("530500452766");
byte[] byteArray = x.toByteArray();
String s = new String(byteArray);  // Unspecified behavior if bytes are invalid in the default charset
byteArray = s.getBytes();
x = new BigInteger(byteArray);
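The following sketch reproduces the noncompliant round trip but names UTF-8 explicitly (an assumption, chosen so that the replacement behavior is documented rather than unspecified). For 530500452766, toByteArray() yields the five bytes 0x7B 0x84 0x4A 0x81 0x9E; the bytes 0x84, 0x81, and 0x9E are continuation bytes with no lead byte, so UTF-8 decoding replaces them and the restored value differs. LossyRoundTripDemo and roundTripThroughString are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class LossyRoundTripDemo {
    // Same steps as the noncompliant example, with the charset made explicit.
    static BigInteger roundTripThroughString(BigInteger x) {
        byte[] byteArray = x.toByteArray();
        // Malformed sequences are replaced with U+FFFD; the original bytes are lost
        String s = new String(byteArray, StandardCharsets.UTF_8);
        byte[] restored = s.getBytes(StandardCharsets.UTF_8);
        return new BigInteger(restored);
    }

    public static void main(String[] args) {
        BigInteger x = new BigInteger("530500452766");
        BigInteger y = roundTripThroughString(x);
        System.out.println(x + " -> " + y + " (equal: " + x.equals(y) + ")");
    }
}
```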

Compliant Solution

This compliant solution first produces a String representation of the BigInteger object and then converts the String object to a byte array. This process is then reversed. Because the text is generated by BigInteger.toString(), it consists only of decimal digits and, for negative values, a leading minus sign, so it contains valid character data in the default character set.

BigInteger x = new BigInteger("530500452766");
String s = x.toString();  // Valid character data
byte[] byteArray = s.getBytes();
String ns = new String(byteArray);  
x = new BigInteger(ns); 
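A variant of the compliant solution can name the character set explicitly so that the byte encoding does not depend on the platform default. The sketch below assumes US-ASCII, which covers every character toString() can emit; DecimalRoundTripDemo and roundTrip are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

class DecimalRoundTripDemo {
    // Assumed choice of US-ASCII: toString() emits only an optional '-' and digits 0-9.
    static BigInteger roundTrip(BigInteger x) {
        String s = x.toString();                                  // valid character data
        byte[] byteArray = s.getBytes(StandardCharsets.US_ASCII);  // one byte per character
        String ns = new String(byteArray, StandardCharsets.US_ASCII);
        return new BigInteger(ns);                                // parses the decimal text
    }

    public static void main(String[] args) {
        BigInteger[] samples = {
            new BigInteger("530500452766"),
            new BigInteger("-530500452766"),
            BigInteger.ZERO
        };
        for (BigInteger x : samples) {
            System.out.println(x + " round-trips: " + x.equals(roundTrip(x)));
        }
    }
}
```

Each sample prints true, because the bytes always come from valid characters and therefore decode back unchanged.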

Compliant Solution (Base64)

Although the default character set cannot be relied on to encode arbitrary byte data losslessly, many other techniques exist for safely converting an arbitrary byte array into a string and back. Java 8 introduced the java.util.Base64 class, which provides encoders and decoders for the Base64 encoding scheme. This compliant solution uses Base64 to safely convert a number to a string and back without corrupting the data:

BigInteger x = new BigInteger("530500452766");
byte[] byteArray = x.toByteArray();
String s = Base64.getEncoder().encodeToString(byteArray);
byteArray = Base64.getDecoder().decode(s);
x = new BigInteger(byteArray);
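The Base64 steps can be wrapped in a pair of helpers and checked against negative values and zero as well. This is a minimal sketch; Base64RoundTripDemo, encode, and decode are hypothetical names, while Base64.getEncoder().encodeToString and Base64.getDecoder().decode are the standard java.util.Base64 API.

```java
import java.math.BigInteger;
import java.util.Base64;

class Base64RoundTripDemo {
    // Hypothetical helper: encodes the two's-complement bytes as Base64 text.
    static String encode(BigInteger x) {
        return Base64.getEncoder().encodeToString(x.toByteArray());
    }

    // Hypothetical helper: decode throws IllegalArgumentException on invalid Base64.
    static BigInteger decode(String s) {
        return new BigInteger(Base64.getDecoder().decode(s));
    }

    public static void main(String[] args) {
        BigInteger[] samples = {
            new BigInteger("530500452766"),
            new BigInteger("-530500452766"),
            BigInteger.ZERO,
            BigInteger.ONE.shiftLeft(200).negate()
        };
        for (BigInteger x : samples) {
            String s = encode(x);
            System.out.println(x + " -> " + s + " -> " + decode(s));
        }
    }
}
```

For 530500452766, encode returns "e4RKgZ4=". Base64 output uses only ASCII letters, digits, '+', '/', and '=', so the resulting string is safe to store in a text column regardless of the database's character encoding.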

Risk Assessment

Encoding noncharacter data as a string is likely to result in a loss of data integrity.

Related Guidelines


CWE-838, Inappropriate Encoding for Output Context