The contracts of the read methods for the InputStream and Reader classes and their subclasses are complicated with regard to filling byte or character arrays. According to the Java API [API 2006] for the class InputStream, the read(byte[] b) method provides the following behavior:

Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. The number of bytes read is, at most, equal to the length of b.

According to the Java API for the read(byte[] b, int off, int len) method:

An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero.

Both read() methods return as soon as they find available input data. As a result, these methods can stop reading data before the array is filled because the available data may be insufficient to fill the array.
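The short-read behavior is easy to reproduce. The following is a minimal sketch, assuming a hypothetical wrapper class named TrickleInputStream (it is not part of the Java API) that hands out at most three bytes per call, much as a socket might deliver data in small packets. A single read(byte[]) call on it stores only part of the requested data, even though ten bytes are eventually available.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: limits every array read to at most maxChunk bytes,
// imitating a source whose data arrives in small pieces.
class TrickleInputStream extends FilterInputStream {
  private final int maxChunk;

  TrickleInputStream(InputStream in, int maxChunk) {
    super(in);
    this.maxChunk = maxChunk;
  }

  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    return super.read(b, off, Math.min(len, maxChunk));
  }
}

class ShortReadDemo {
  // Returns the count reported by a single read(byte[]) call
  // on a ten-byte source that delivers three bytes at a time.
  static int firstReadCount() throws IOException {
    byte[] source = new byte[10];
    try (InputStream in =
             new TrickleInputStream(new ByteArrayInputStream(source), 3)) {
      byte[] buffer = new byte[10];
      return in.read(buffer);
    }
  }
}
```

The count returned by firstReadCount() is smaller than the buffer length, so a caller that assumes a full buffer would process seven bytes that were never read.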

The analogous read methods in Reader are documented to return the number of characters read, which implies that they also need not fill the char array provided as an argument.
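The same care applies to character arrays. The following sketch, with the hypothetical names ReaderFillDemo and fill, keeps calling the three-argument Reader.read() until the array is full or the end of the stream is reached, and reports how many characters were actually stored.

```java
import java.io.IOException;
import java.io.Reader;

class ReaderFillDemo {
  // Reads until the array is full or the reader is exhausted and
  // returns the number of chars actually stored in the array.
  static int fill(Reader reader, char[] chars) throws IOException {
    int offset = 0;
    while (offset < chars.length) {
      int n = reader.read(chars, offset, chars.length - offset);
      if (n == -1) {
        break; // end of stream before the array was filled
      }
      offset += n;
    }
    return offset;
  }
}
```

A caller should use only the first fill(...) characters of the array; the rest keep their previous contents.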

Ignoring the result returned by the read() methods is a violation of EXP00-J. Do not ignore values returned by methods. Security issues can arise even when return values are considered, because the default behavior of the read() methods lacks any guarantee that the entire buffer array is filled. Consequently, when using read() to fill an array, the program must check the return value of read() and must handle the case where the array is only partially filled.

Another source of data read errors is failure to correctly handle multibyte encoded data. Multibyte encodings such as UTF-8 are used for character sets that require more than one byte to uniquely identify each constituent character. For example, the Japanese encoding Shift-JIS (shown below) supports multibyte encoding wherein the maximum character length is two bytes (one leading and one trailing byte).

Byte Type        Range
single-byte      0x00 through 0x7F and 0xA0 through 0xDF
lead-byte        0x81 through 0x9F and 0xE0 through 0xFC
trailing-byte    0x40 through 0x7E and 0x80 through 0xFC

The trailing-byte ranges overlap the ranges of both the single-byte and lead-byte characters. When a multibyte character is separated across a buffer boundary, it can be interpreted differently than if it were not separated across the buffer boundary; this difference arises from the ambiguity of its composing bytes [Phillips 2005].
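The effect of a buffer boundary falling inside a multibyte character can be sketched as follows. This example substitutes UTF-8 for Shift-JIS, on the assumption that UTF-8 is available on every Java runtime whereas a Shift-JIS charset may not be; the class and method names are hypothetical. It decodes the two bytes of one character together and then separately.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SplitCharacterDemo {
  // Latin small letter e with acute (U+00E9) is encoded in UTF-8
  // as the two bytes 0xC3 0xA9.
  static final byte[] ENCODED = "\u00E9".getBytes(StandardCharsets.UTF_8);

  // Decodes both bytes together, as one buffer.
  static String decodeWhole() {
    return new String(ENCODED, StandardCharsets.UTF_8);
  }

  // Decodes each byte on its own, as if a buffer boundary fell between them.
  static String decodeSplit() {
    String first =
        new String(Arrays.copyOfRange(ENCODED, 0, 1), StandardCharsets.UTF_8);
    String second =
        new String(Arrays.copyOfRange(ENCODED, 1, 2), StandardCharsets.UTF_8);
    return first + second;
  }
}
```

Decoded whole, the bytes yield the original one-character string. Decoded separately, neither byte is valid on its own, so the String constructor substitutes replacement characters and the original character is lost.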

A third data reading issue arises from the behavior of the String class constructor with respect to the default encoding. See guideline IDS17-J. Use compatible encodings on both sides of file or network IO for more details.

When the array is only partially filled, the program may try to fill the rest of the array, work only with the subset of the array that was filled, or throw an exception.
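As one illustration of working only with the filled subset, the following sketch (the names PartialFillDemo and readAvailable are hypothetical) performs a single read and returns a copy of just the bytes that read() reported as stored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class PartialFillDemo {
  // Performs one read and returns only the bytes that were actually stored.
  // Returns an empty array at end of stream.
  static byte[] readAvailable(InputStream in, int maxBytes) throws IOException {
    byte[] data = new byte[maxBytes];
    int n = in.read(data);
    if (n == -1) {
      return new byte[0];
    }
    return Arrays.copyOf(data, n); // discard the unfilled tail
  }
}
```

Returning raw bytes rather than a decoded string sidesteps the question of whether the read ended in the middle of a multibyte character; a caller that needs text would still have to account for that.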

This rule applies only to read() methods that take an array argument. To read a single byte, use the InputStream.read() method that takes no arguments and returns an int. To read a single character, use a Reader.read() method that takes no arguments and returns the character read as an int.
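For the single-byte case, a minimal sketch (with the hypothetical names SingleByteReadDemo and sum) shows why the no-argument read() returns an int: the value -1 signals end of stream and must remain distinguishable from the byte value 0xFF.

```java
import java.io.IOException;
import java.io.InputStream;

class SingleByteReadDemo {
  // Sums the unsigned values of all bytes in the stream.
  static long sum(InputStream in) throws IOException {
    long total = 0;
    int b; // int, not byte: -1 must stay distinguishable from 0xFF
    while ((b = in.read()) != -1) {
      total += b; // b is in the range 0 through 255 here
    }
    return total;
  }
}
```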

Noncompliant Code Example (1-argument read())

This noncompliant code example attempts to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String. It explicitly specifies the character encoding used to build the string, in compliance with STR04-J. Use compatible character encodings when communicating string data between JVMs.

public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  // The count returned by read() is checked only for end of stream
  if (in.read(data) == -1) {
    throw new EOFException();
  }
  // The whole array is decoded, whether or not read() filled it
  return new String(data, "UTF-8");
}

The programmer's misunderstanding of the general contract of the read() method can result in failure to read the intended data in full.

The stream may contain fewer than 1024 bytes, perhaps because it originates from a file with fewer than 1024 bytes. Alternatively, the stream may contain 1024 bytes but have fewer than 1024 bytes immediately available, perhaps because it originates from a TCP socket whose remaining bytes are in a subsequent packet that has not yet arrived. In either case, read() returns fewer than 1024 bytes. It indicates this through its return value, but the program ignores the return value and uses the entire array to construct a string, even though any unread bytes will fill the string with null characters.
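The null-character padding can be observed directly. The following sketch wraps the same logic as the noncompliant example in a hypothetical class, NullPaddingDemo, so that it can be run against a five-byte stream; it uses StandardCharsets.UTF_8 in place of the "UTF-8" string name, which is an assumption made only to keep the sketch short.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class NullPaddingDemo {
  // Same logic as the noncompliant example: the count returned by read()
  // is checked only for end of stream and is otherwise discarded.
  static String readBytesNoncompliant(InputStream in) throws IOException {
    byte[] data = new byte[1024];
    if (in.read(data) == -1) {
      throw new EOFException();
    }
    // All 1024 bytes are decoded, including those never written by read()
    return new String(data, StandardCharsets.UTF_8);
  }
}
```

Given a stream that holds only the five bytes of "hello", the returned string still has 1024 characters: the five that were read, followed by null characters decoded from the untouched zero bytes.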

Noncompliant Code Example (3-argument read())

This noncompliant code example uses the 3-argument version of read() to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String.

public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  int offset = 0;
  // The count returned by read() is checked only for end of stream
  if (in.read(data, offset, data.length - offset) == -1) {
    throw new EOFException();
  }
  // The whole array is decoded, whether or not read() filled it
  return new String(data, "UTF-8");
}

However, this code suffers from the same flaws as the previous noncompliant code example. Again, the read() method can return fewer than 1024 bytes, either because 1024 bytes are simply not available or because the later bytes have not yet arrived in the stream. In either case, the remaining bytes in the array keep their zero values, yet the entire array is used to construct the string.

Compliant Solution (Multiple Calls to read())

This compliant solution reads all the desired bytes into its buffer, accounting for the total number of bytes read and adjusting the remaining bytes' offset, consequently ensuring that the required data is read in full. It also avoids splitting multibyte encoded characters across buffers by deferring construction of the result string until the data has been fully read (see IDS10-J. Do not assume every character in a string is the same size for more information). It also facilitates portability across systems that use different default character encodings by specifying an explicit character encoding for the String constructor.

public static String readBytes(InputStream in) throws IOException {
  int offset = 0;
  int bytesRead = 0;
  byte[] data = new byte[1024];
  // Keep reading until end of stream or until the buffer is full
  while ((bytesRead = in.read(data, offset, data.length - offset))
    != -1) {
    offset += bytesRead;
    if (offset >= data.length) {
      break;
    }
  }
  // Decode only the bytes that were actually read
  String str = new String(data, 0, offset, "UTF-8");
  return str;
}
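On Java 9 and later, the InputStream.readNBytes(byte[] b, int off, int len) method performs an equivalent loop internally: it blocks until len bytes have been read, end of stream is detected, or an exception is thrown, and it returns the number of bytes actually read. The following is a sketch under that version assumption, with the hypothetical class name ReadNBytesDemo; it is an alternative to, not a replacement for, the loop in this compliant solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ReadNBytesDemo {
  // Reads up to 1024 bytes, stopping early only at end of stream,
  // and decodes just the bytes that were read. Requires Java 9 or later.
  static String readBytes(InputStream in) throws IOException {
    byte[] data = new byte[1024];
    int count = in.readNBytes(data, 0, data.length);
    return new String(data, 0, count, StandardCharsets.UTF_8);
  }
}
```

Unlike read(), readNBytes() returns 0 rather than -1 when the stream is already at its end, so the count can be passed directly to the String constructor.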

The size of the data byte buffer depends on the maximum number of bytes required to write an encoded character. For example, UTF-8 encoded data requires four bytes to represent any character above U+FFFF. Because Java uses the UTF-16 character encoding to represent char data, such sequences are split into two separate char values of two bytes each. Consequently, the buffer size should be four times the size of a typical byte sequence.

Compliant Solution (readFully())

The one-argument and three-argument readFully() methods of the DataInputStream class guarantee that either all of the requested data is read or an exception is thrown. These methods throw EOFException if they detect the end of input before the required number of bytes have been read; they throw IOException if some other I/O error occurs. This compliant solution also specifies an explicit character encoding to the String constructor.

public static String readBytes(FileInputStream fis)
                               throws IOException {
  byte[] data = new byte[1024];
  DataInputStream dis = new DataInputStream(fis);
  // Either fills the whole array or throws an exception
  dis.readFully(data);
  String str = new String(data, "UTF-8");
  return str;
}
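Because readFully() treats a short stream as an error, callers should expect EOFException whenever fewer bytes exist than the buffer holds. The following sketch, using the hypothetical names ReadFullyDemo and throwsOnShortStream, demonstrates that behavior with an in-memory stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class ReadFullyDemo {
  // Returns true if readFully() reports a short stream with EOFException,
  // false if it was able to fill an array of the wanted size.
  static boolean throwsOnShortStream(byte[] source, int wanted)
      throws IOException {
    DataInputStream dis =
        new DataInputStream(new ByteArrayInputStream(source));
    try {
      dis.readFully(new byte[wanted]);
      return false;
    } catch (EOFException e) {
      return true;
    }
  }
}
```

A five-byte source cannot satisfy a request for 1024 bytes, so the exception is thrown; the same source satisfies a request for exactly five bytes.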

Risk Assessment

Incorrect use of the read() method can result in the wrong number of bytes being read or character sequences being interpreted incorrectly.

Rule       Severity   Likelihood   Detectable   Repairable   Priority   Level
FIO10-J    Low        Unlikely     No           No           P1         L3

Automated Detection

Tool             Version   Checker               Description
Parasoft Jtest             CERT.FIO10.NASSIGIO   Ensure the return values of specified file I/O methods are used
SonarQube                  S2674

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this guideline on the CERT website.

Related Guidelines

MITRE CWE

CWE-135, Incorrect Calculation of Multi-byte String Length

Bibliography

[API 2006]        Class InputStream; Class DataInputStream
[Chess 2007]      Section 8.1, "Handling Errors with Return Codes"
[Harold 1999]     Chapter 7, "Data Streams, Reading Byte Arrays"
[Phillips 2005]


