The contracts of the read methods for the InputStream and Reader classes and their subclasses are complicated with regard to filling byte or character arrays. According to the Java API [API 2014] for the class InputStream, the read(byte[] b) method and the read(byte[] b, int off, int len) method provide the following behavior:
Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. This method blocks until input data is available, end of file is detected, or an exception is thrown. The number of bytes read is, at most, equal to the length of b.
According to the Java API for the read(byte[] b, int off, int len) method:
An attempt is made to read as many as len bytes, but a smaller number may be read, possibly zero.
Both read methods return as soon as they find available input data; neither guarantees that all the requested bytes will be read. As a result, these methods can stop reading data before the array is filled because the available data may be insufficient to fill the array. It is left to the programmer to check the number of bytes read and to call the read method again as required.
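The short-read behavior described above can be observed with a minimal sketch. TrickleStream is a hypothetical class written only for this illustration; it caps every read at 4 bytes to imitate data that arrives in small pieces, as it might from a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class PartialReadDemo {

  // Hypothetical stream that hands out at most 4 bytes per call,
  // imitating input that becomes available a little at a time.
  static final class TrickleStream extends InputStream {
    private final InputStream source;

    TrickleStream(byte[] bytes) {
      this.source = new ByteArrayInputStream(bytes);
    }

    @Override
    public int read() throws IOException {
      return source.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return source.read(b, off, Math.min(len, 4));
    }
  }

  // Returns the count reported by a single read(byte[]) call
  // on a stream that holds 10 bytes in total.
  static int firstReadCount() {
    byte[] buffer = new byte[10];
    try (InputStream in = new TrickleStream(new byte[10])) {
      return in.read(buffer);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("requested 10, got " + firstReadCount()); // requested 10, got 4
  }
}
```

The stream holds all 10 bytes, yet the single call reports only 4; the remaining 6 elements of the buffer are left untouched.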
The documentation for the analogous read methods in Reader states that they return the number of characters read, which implies that they also need not fill the char array provided as an argument.
Ignoring the result returned by the read() methods is a violation of EXP00-J. Do not ignore values returned by methods.
Multibyte encodings such as UTF-8 are used for character sets that require more than one byte to uniquely identify each character. For example, the Japanese encoding Shift-JIS (shown below) is a multibyte encoding in which the maximum character length is 2 bytes (one leading and one trailing byte).
| Byte Type | Range |
|---|---|
| single-byte | 0x00 through 0x7F and 0xA0 through 0xDF |
| lead-byte | 0x81 through 0x9F and 0xE0 through 0xFC |
| trailing-byte | 0x40 through 0x7E and 0x80 through 0xFC |
The trailing-byte ranges overlap the ranges of both the single-byte and lead-byte characters. Consequently, if a multibyte character is split across a buffer boundary, its bytes can be interpreted as different characters when each buffer is decoded on its own [Phillips 05].
A third issue arises from the behavior of the String class constructor. According to the String class documentation [API 06]:
The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.
Noncompliant Code Example
This noncompliant code example intends to read 1024 bytes from an InputStream and return them as a String, but it suffers from a few pitfalls. First, the general contract of the read methods does not guarantee that all 1024 bytes are read.
The other issue involves multibyte character encoding. A read can end with the leading byte of a multibyte character, leaving the trailing bytes to be read in the next iteration. If each chunk is converted to a String and concatenated to a result string str, the character is decoded incorrectly because the encoding information does not extend across buffer boundaries.
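The buffer-boundary problem can be reproduced without any stream at all. The following sketch is an illustration only; the class and method names are hypothetical, and it uses UTF-8 rather than Shift-JIS because UTF-8 is available on every Java platform. It decodes the two bytes of a single character first as a whole and then as two separately decoded chunks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SplitCharacterDemo {

  // Decodes each chunk separately and concatenates the results,
  // as a read loop that builds a String per iteration would do.
  static String decodePerChunk(byte[] bytes, int splitAt) {
    byte[] first = Arrays.copyOfRange(bytes, 0, splitAt);
    byte[] second = Arrays.copyOfRange(bytes, splitAt, bytes.length);
    return new String(first, StandardCharsets.UTF_8)
        + new String(second, StandardCharsets.UTF_8);
  }

  // Decodes the complete byte sequence in one step.
  static String decodeWhole(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // U+00E9 (e with acute accent) is encoded in UTF-8 as 0xC3 0xA9.
    byte[] encoded = "\u00e9".getBytes(StandardCharsets.UTF_8);
    System.out.println(decodeWhole(encoded).equals("\u00e9"));       // true
    System.out.println(decodePerChunk(encoded, 1).equals("\u00e9")); // false
  }
}
```

Decoding per chunk turns each orphaned byte into a replacement character, so the original character cannot be recovered from the concatenated result.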
Security issues can arise even when return values are considered because the default behavior of the read() methods lacks any guarantee that the entire buffer array is filled. Consequently, when using read() to fill an array, the program must check the return value of read() and must handle the case where the array is only partially filled. In such cases, the program may try to fill the rest of the array, or work only with the subset of the array that was filled, or throw an exception.
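As a rough sketch of the first two options for a Reader, the hypothetical helper fill() below keeps reading until the char array is full or the input ends, and its caller then works only with the portion that was filled. The class and method names are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class FillCharArrayDemo {

  // Keeps calling read() until the array is full or the input ends.
  // Returns the number of chars stored, which may be less than buf.length.
  static int fill(Reader in, char[] buf) throws IOException {
    int filled = 0;
    while (filled < buf.length) {
      int n = in.read(buf, filled, buf.length - filled);
      if (n == -1) {
        break; // end of input: the caller uses only the first 'filled' chars
      }
      filled += n;
    }
    return filled;
  }

  // Demo wrapper: returns at most 'max' leading chars of 'text'.
  static String firstChars(String text, int max) {
    char[] buf = new char[max];
    try (Reader in = new StringReader(text)) {
      int filled = fill(in, buf);
      return new String(buf, 0, filled); // only the filled subset
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(firstChars("hello", 3)); // hel
    System.out.println(firstChars("hi", 8));    // hi
  }
}
```

A caller that requires a completely filled array could instead compare the returned count with buf.length and throw an exception when they differ.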
This rule applies only to read() methods that take an array argument. To read a single byte, use the InputStream.read() method that takes no arguments and returns an int. To read a single character, use a Reader.read() method that takes no arguments and returns the character read as an int.
Noncompliant Code Example (1-argument read())
This noncompliant code example attempts to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String. It explicitly specifies the character encoding used to build the string, in compliance with STR04-J. Use compatible character encodings when communicating string data between JVMs.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  if (in.read(data) == -1) {
    throw new EOFException();
  }
  return new String(data, "UTF-8");
}
|
The programmer's misunderstanding of the general contract of the read() method can result in failure to read the intended data in full. It is possible that fewer than 1024 bytes exist in the stream, perhaps because the stream originates from a file with fewer than 1024 bytes. It is also possible that the stream contains 1024 bytes but fewer than 1024 bytes are immediately available, perhaps because the stream originates from a TCP socket that sent more bytes in a subsequent packet that has not arrived yet. In either case, read() returns fewer than 1024 bytes. It indicates this through its return value, but the program ignores the return value and uses the entire array to construct a string, even though the unread portion of the array fills the string with null characters.
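The null-character padding can be made visible with a small experiment. The sketch below repeats the noncompliant pattern under a hypothetical class name and feeds it a 3-byte in-memory stream; it is meant only to illustrate the effect and is not a pattern to copy.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class NullPaddingDemo {

  // Mirrors the noncompliant pattern: the value returned by read() is
  // compared only with -1, and the whole array is decoded regardless.
  static String readIgnoringCount(InputStream in) throws IOException {
    byte[] data = new byte[1024];
    if (in.read(data) == -1) {
      throw new EOFException();
    }
    return new String(data, StandardCharsets.UTF_8);
  }

  // Runs the pattern against a stream that contains only "abc".
  static String paddedResult() {
    byte[] input = "abc".getBytes(StandardCharsets.UTF_8);
    try (InputStream in = new ByteArrayInputStream(input)) {
      return readIgnoringCount(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String s = paddedResult();
    System.out.println(s.length());            // 1024
    System.out.println(s.startsWith("abc"));   // true
    System.out.println(s.charAt(3) == '\0');   // true
  }
}
```

Only three bytes were read, but the resulting string has 1024 characters, 1021 of which are null characters.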
Noncompliant Code Example (3-argument read())
This noncompliant code example uses the 3-argument version of read() to read 1024 bytes encoded in UTF-8 from an InputStream and return them as a String.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(InputStream in) throws IOException {
  byte[] data = new byte[1024];
  int offset = 0;
  if (in.read(data, offset, data.length - offset) == -1) {
    throw new EOFException();
  }
  return new String(data, "UTF-8");
}
|
However, this code suffers from the same flaws as the previous noncompliant code example. Again, the read() method can return fewer than 1024 bytes, either because 1024 bytes are simply not available or because the later bytes have not yet arrived in the stream. In either case, read() returns fewer than 1024 bytes and the remaining elements of the array keep their zero values, yet the entire array is used to construct the string.
Compliant Solution (Multiple Calls to read())
This compliant solution reads all the desired bytes into its buffer, accounting for the total number of bytes read and adjusting the remaining bytes' offset, consequently ensuring that the required data is read in full.
The space for the data byte buffer should be allocated according to the maximum number of bytes required to encode a character. For example, UTF-8 requires at most 3 bytes for a character in the range U+0000 through U+FFFF. Counterintuitive as it may sound, any character above U+FFFF requires 4 bytes in UTF-8, yet Java represents it as two separate char values of 2 bytes each (a surrogate pair) because a char is a UTF-16 code unit. Therefore, the buffer should provide up to four bytes for each character it is expected to hold.
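The byte counts mentioned above can be checked directly. This is a small illustrative sketch with a hypothetical class name; the sample characters are arbitrary choices from each UTF-8 length class.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthDemo {

  // Number of bytes needed to encode the string in UTF-8.
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String supplementary = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

    System.out.println(utf8Length("A"));            // 1 (U+0041)
    System.out.println(utf8Length("\u00e9"));       // 2 (U+00E9)
    System.out.println(utf8Length("\u3042"));       // 3 (U+3042)
    System.out.println(utf8Length(supplementary));  // 4 (U+1F600)
    System.out.println(supplementary.length());     // 2 char values
  }
}
```

The last two lines show the mismatch the paragraph describes: one character above U+FFFF occupies four bytes in UTF-8 but two char values in a Java String.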
This compliant solution also states the encoding explicitly when constructing the String, which facilitates portability. It also avoids splitting multibyte encoded characters across buffers by deferring construction of the result string until the data has been fully read (see IDS10-J. Do not assume every character in a string is the same size for more information).
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(InputStream in) throws IOException {
  int offset = 0;
  int bytesRead = 0;
  byte[] data = new byte[1024];
  while ((bytesRead = in.read(data, offset, data.length - offset)) != -1) {
    offset += bytesRead;
    if (offset >= data.length) {
      break;
    }
  }
  String str = new String(data, 0, offset, "UTF-8");
  return str;
}
|
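To see the read loop cope with the worst case, the sketch below runs the same loop against a hypothetical OneByteStream wrapper that delivers a single byte per call, so every multibyte character is split across reads. The wrapper, the class name, and the roundTrip() helper are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReadLoopDemo {

  // Hypothetical wrapper that delivers at most one byte per read call.
  static final class OneByteStream extends FilterInputStream {
    OneByteStream(InputStream in) {
      super(in);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, 1));
    }
  }

  // Same loop as the compliant solution: accumulate until the buffer
  // is full or the stream ends, then decode once.
  static String readBytes(InputStream in) throws IOException {
    int offset = 0;
    int bytesRead = 0;
    byte[] data = new byte[1024];
    while ((bytesRead = in.read(data, offset, data.length - offset)) != -1) {
      offset += bytesRead;
      if (offset >= data.length) {
        break;
      }
    }
    return new String(data, 0, offset, StandardCharsets.UTF_8);
  }

  // Encodes the text, reads it back one byte at a time, and decodes it.
  static String roundTrip(String text) {
    byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
    try (InputStream in = new OneByteStream(new ByteArrayInputStream(encoded))) {
      return readBytes(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("h\u00e9llo").equals("h\u00e9llo")); // true
  }
}
```

Because the string is built only after the loop finishes, the two bytes of the accented character are decoded together even though they arrived in separate reads.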
Compliant Solution (readFully())
The one-argument and three-argument readFully() methods of the DataInputStream class guarantee that either all of the requested data is read or an exception is thrown. These methods throw EOFException if they detect the end of input before the required number of bytes have been read; they throw IOException if some other I/O error occurs. How to proceed after an exception is left to the exception handler to decide.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(FileInputStream fis) throws IOException {
  byte[] data = new byte[1024];
  DataInputStream dis = new DataInputStream(fis);
  dis.readFully(data);
  String str = new String(data, "UTF-8");
  return str;
}
|
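The all-or-exception behavior of readFully() can be observed with in-memory data. In this sketch, the class name and the canFill() helper are hypothetical; a ByteArrayInputStream stands in for the file so the example needs no external resources.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadFullyDemo {

  // Returns true if readFully() filled a buffer of 'wanted' bytes, or
  // false if it threw EOFException because the input was too short.
  static boolean canFill(byte[] input, int wanted) {
    byte[] data = new byte[wanted];
    try (DataInputStream dis =
             new DataInputStream(new ByteArrayInputStream(input))) {
      dis.readFully(data);
      return true;
    } catch (EOFException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(canFill(new byte[8], 8)); // true
    System.out.println(canFill(new byte[3], 8)); // false
  }
}
```

A caller therefore never sees a partially filled array without also receiving an exception, which is the property the read() methods lack.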
Risk Assessment
Incorrect use of the read() method can result in the wrong number of bytes being read or character sequences being interpreted incorrectly.
| Rule | Severity | Likelihood | Detectable | Repairable | Priority | Level |
|---|---|---|---|---|---|---|
| FIO10-J | Low | Unlikely | No | No | P1 | L3 |
Automated Detection
| Tool | Version | Checker | Description |
|---|---|---|---|
| Parasoft Jtest | | CERT.FIO10.NASSIGIO | Ensure the return values of specified file I/O methods are used |
| SonarQube | | S2674 | |
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
References
[API 06] Class InputStream, DataInputStream
[Phillips 05]
[Harold 99] Chapter 7: Data Streams, Reading Byte Arrays
[Chess 07] 8.1 Handling Errors with Return Codes
[MITRE 09] CWE ID 135 (http://cwe.mitre.org/data/definitions/135.html), "Incorrect Calculation of Multi-Byte String Length"
Related Guidelines
Bibliography
| [API 2006] | Class InputStream, DataInputStream |
| [Chess 2007] | Section 8.1, "Handling Errors with Return Codes" |
| [Harold 1999] | Chapter 7, "Data Streams, Reading Byte Arrays" |