The contracts of the read methods for InputStream and Reader classes and their subclasses are complicated with regard to filling byte or character arrays. According to the Java API [API 2014] for the class InputStream, the read(byte[] b) method and the read(byte[] b, int off, int len) method provide the following behavior:

The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown.

According to the Java API for the read(byte[] b, int off, int len) method:

An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero. 

Both read methods return as soon as they find available input data. As a result, these methods can stop reading data before the array is filled because the available data may be insufficient to fill the array.

The documentation for the analogous read methods in Reader states that they return the number of characters read, which implies that they also need not fill the char array provided as an argument.
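The same precaution can be sketched for a Reader. The following is a minimal illustration, not part of the original rule text; the class name ReaderFill and the method readChars are hypothetical names chosen for this example. It loops until the requested number of characters has been read or the end of the stream is reached, and builds the string only from the portion of the array that was actually filled.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration only.
final class ReaderFill {
  // Reads up to 'count' chars, looping because one read() call may return fewer.
  static String readChars(Reader in, int count) throws IOException {
    char[] buf = new char[count];
    int offset = 0;
    while (offset < buf.length) {
      int n = in.read(buf, offset, buf.length - offset);
      if (n == -1) {
        break; // end of stream reached before the array was filled
      }
      offset += n;
    }
    // Use only the chars that were actually read.
    return new String(buf, 0, offset);
  }
}
```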

Ignoring the result returned by the read() methods is a violation of EXP00-J. Do not ignore values returned by methods. Security issues can arise even when return values are considered because the default behavior of the read() methods lacks any guarantee that the entire buffer array is filled. Consequently, when using read() to fill an array, the program must check the return value of read() and must handle the case where the array is only partially filled. In such cases, the program may try to fill the rest of the array, or work only with the subset of the array that was filled, or throw an exception.

This rule applies only to read() methods that take an array argument. To read a single byte, use the InputStream.read() method that takes no arguments and returns an int. To read a single character, use a Reader.read() method that takes no arguments and returns the character read as an int.
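As a brief sketch of the no-argument form, the example below reads a stream one byte at a time. The class SingleByteRead and the method countBytes are hypothetical names used only for illustration. Because read() returns an int, the value can be compared with -1 before any narrowing to byte, so a data byte of 0xFF is not mistaken for the end of the stream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
final class SingleByteRead {
  // Counts the bytes in the stream using the no-argument read().
  static int countBytes(InputStream in) throws IOException {
    int count = 0;
    int b;
    // read() yields 0..255 for data and -1 at end of stream.
    while ((b = in.read()) != -1) {
      count++;
    }
    return count;
  }
}
```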

Noncompliant Code Example (1-argument read())

This noncompliant code example attempts to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String. It explicitly specifies the character encoding used to build the string, in compliance with STR04-J. Use compatible character encodings when communicating string data between JVMs.

public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  if (in.read(data) == -1) {
    throw new EOFException();
  }
  return new String(data, "UTF-8");
}

The programmer's misunderstanding of the general contract of the read() method can result in failure to read the intended data in full. It is possible that fewer than 1024 bytes exist in the stream, perhaps because the stream originates from a file with fewer than 1024 bytes. It is also possible that the stream contains 1024 bytes but fewer than 1024 bytes are immediately available, perhaps because the stream originates from a TCP socket that sent more bytes in a subsequent packet that has not arrived yet. In either case, read() reads fewer than 1024 bytes. It indicates this through its return value, but the program ignores the return value and uses the entire array to construct a string. The unfilled portion of the array still holds zero bytes, which are decoded as null characters at the end of the string.
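The short read can be reproduced without a network. The sketch below is an illustration under stated assumptions: TrickleInputStream is a hypothetical wrapper that delivers at most a fixed number of bytes per call, imitating data that arrives in small packets, and readIgnoringCount mirrors the noncompliant method with an 8-byte buffer instead of 1024 to keep the demonstration small.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical test stream: returns at most 'limit' bytes per read call.
final class TrickleInputStream extends FilterInputStream {
  private final int limit;

  TrickleInputStream(InputStream in, int limit) {
    super(in);
    this.limit = limit;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    // FilterInputStream.read(byte[]) delegates here, so both forms are limited.
    return super.read(b, off, Math.min(len, limit));
  }
}

final class ShortReadDemo {
  // Mirrors the noncompliant method: the count returned by read() is discarded.
  static String readIgnoringCount(InputStream in) throws IOException {
    byte[] data = new byte[8];
    if (in.read(data) == -1) {
      throw new EOFException();
    }
    // The whole array is decoded, including the bytes that were never filled.
    return new String(data, StandardCharsets.UTF_8);
  }
}
```

With a source holding the six bytes of "abcdef" and a limit of 3, the returned string still has length 8: the characters "abc" followed by five null characters.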

Noncompliant Code Example (3-argument read())

This noncompliant code example uses the 3-argument version of read() to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String.

public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  int offset = 0;
  if (in.read(data, offset, data.length - offset) == -1) {
    throw new EOFException();
  }
  return new String(data, "UTF-8");
}

However, this code suffers from the same flaw as the previous noncompliant code example. Again, the read() method can read fewer than 1024 bytes, either because 1024 bytes are simply not available or because the later bytes have not yet arrived in the stream. In either case, the remaining bytes in the array keep their zero values, yet the entire array is used to construct the string.

Compliant Solution (Multiple Calls to read())

This compliant solution reads all the desired bytes into its buffer, keeping track of the total number of bytes read and advancing the offset into the buffer after each call, which ensures that the required data is read in full. It also avoids splitting multibyte encoded characters across buffers by deferring construction of the result string until the data has been fully read. See IDS10-J. Do not assume every character in a string is the same size for more information.

public static String readBytes(InputStream in) throws IOException {
  int offset = 0;
  int bytesRead = 0;
  byte[] data = new byte[1024];
  while ((bytesRead = in.read(data, offset, data.length - offset))
    != -1) {
    offset += bytesRead;
    if (offset >= data.length) {
      break;
    }
  }
  String str = new String(data, 0, offset, "UTF-8");
  return str;
}
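On Java 9 and later, the same loop is available in the library as InputStream.readNBytes(byte[] b, int off, int len), which blocks until the requested number of bytes has been read, the end of the stream is detected, or an exception is thrown. The following is a minimal sketch under that version assumption; the class name ReadNBytesExample is hypothetical. The return value must still be used, because the stream may end before the array is full.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical class name; assumes Java 9 or later for readNBytes.
final class ReadNBytesExample {
  static String readBytes(InputStream in) throws IOException {
    byte[] data = new byte[1024];
    // Returns the number of bytes actually stored, which is less than
    // data.length only if the end of the stream was reached.
    int count = in.readNBytes(data, 0, data.length);
    // Decode only the filled portion of the array.
    return new String(data, 0, count, StandardCharsets.UTF_8);
  }
}
```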

Compliant Solution (readFully())

The one-argument and three-argument readFully() methods of the DataInputStream class guarantee that either all of the requested data is read or an exception is thrown. These methods throw EOFException if they detect the end of input before the required number of bytes have been read; they throw IOException if some other I/O error occurs.

public static String readBytes(FileInputStream fis)
                               throws IOException {
  byte[] data = new byte[1024];
  DataInputStream dis = new DataInputStream(fis);
  dis.readFully(data);
  String str = new String(data, "UTF-8");
  return str;
}
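Because readFully() reports a short stream by throwing, callers need to decide how to handle EOFException. The sketch below illustrates one possible treatment; the class ReadFullyDemo and the method canFill are hypothetical names used only for this example. It returns false when the stream ends before the buffer is filled.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration only.
final class ReadFullyDemo {
  // Returns true if 'size' bytes could be read, false if the stream ended early.
  static boolean canFill(InputStream in, int size) throws IOException {
    byte[] data = new byte[size];
    try {
      new DataInputStream(in).readFully(data);
      return true;
    } catch (EOFException e) {
      // Fewer than 'size' bytes were available before end of input.
      return false;
    }
  }
}
```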

Risk Assessment

Incorrect use of the read() method can result in the wrong number of bytes being read or character sequences being interpreted incorrectly.

Rule      Severity   Likelihood   Remediation Cost   Priority   Level
FIO10-J   Low        Unlikely     Medium             P2         L3

Automated Detection

Tool        Version   Checker   Description
SonarQube   9.9       S2674

 

Related Guidelines

MITRE CWE

CWE-135, Incorrect Calculation of Multi-byte String Length

Bibliography

[API 2006]

Class InputStream
Class DataInputStream

[Chess 2007]

Section 8.1, "Handling Errors with Return Codes"

[Harold 1999]

Chapter 7, "Data Streams, Reading Byte Arrays"

[Phillips 2005]

 

 


7 Comments

  1. There's a broken IDS17J link.

  2. "the InputStream and Reader families" is a little difficult to understand.
    can we rephrase this as follows?

    the InputStream and Reader and their subclasses

  3. Automated Detection:

    Sonar:

    findbugs:RR_NOT_CHECKED

    and

    findbugs:SR_NOT_CHECKED

  4. I don't quite understand what is the purpose of the 1st compliant solution (Multiple Calls to read()).

    If it reads a file <1Kb then it converts it to String using correctly the bytes read.

    If it is passed a file >1Kb then it only reads the first Kb and then breaks (as offset >= data.length). Is this the purpose? Not to read the whole file?

    It would be more useful to have an example that reads the file in chunks (of 1Kb).


    1. The compliant solution was designed to read 1024 bytes and stop, which was the same design for the noncompliant code examples.