According to the Java API [API 06] for the class InputStream, with an array of bytes b as the parameter, the read(b) method:

Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. The number of bytes read is, at most, equal to the length of b.

Note that the read() methods return as soon as they find that some input data is available. None of them guarantees that all the requested bytes will be read. The programmer must check the number of bytes actually read and call read() again as required. Ignoring the result returned by the read() methods is a direct violation of EXP02-J. Do not ignore values returned by methods.
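The check-and-repeat pattern described above can be sketched as follows. The helper readUpTo and the test stream ChunkedStream are hypothetical names used only for illustration. ChunkedStream imitates a stream that delivers at most two bytes per call, so a single read() returns fewer bytes than requested, while the loop keeps calling read(b, off, len) until the buffer is full or read() returns -1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoopExample {
  // Hypothetical helper: fills buf from in, stopping early only at end of stream.
  // Returns the number of bytes actually stored in buf.
  static int readUpTo(InputStream in, byte[] buf) throws IOException {
    int offset = 0;
    while (offset < buf.length) {
      int n = in.read(buf, offset, buf.length - offset);
      if (n == -1) {
        break; // end of stream reached before buf was filled
      }
      offset += n; // read() may return fewer bytes than requested
    }
    return offset;
  }

  // Hypothetical test stream: returns at most 2 bytes per read call
  static class ChunkedStream extends ByteArrayInputStream {
    ChunkedStream(byte[] data) { super(data); }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
      return super.read(b, off, Math.min(len, 2));
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] buf = new byte[5];
    InputStream single = new ChunkedStream(new byte[] {1, 2, 3, 4, 5, 6});
    System.out.println(single.read(buf, 0, buf.length)); // 2: one call is not enough
    InputStream looped = new ChunkedStream(new byte[] {1, 2, 3, 4, 5, 6});
    System.out.println(readUpTo(looped, buf));           // 5: the loop fills buf
  }
}
```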

...

The trailing byte ranges overlap the ranges of both the single-byte and lead-byte characters. Consequently, if a multibyte character is split across a buffer boundary, each fragment is interpreted according to its own composing bytes rather than as the original character [Phillips 05]. Also see FIO03-J. Specify the character encoding while performing file or network IO.
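The effect of splitting a multibyte character can be sketched with UTF-8, chosen here only because it is always available in Java. In UTF-8 the trailing bytes do not overlap the single-byte range, so this illustrates the split-boundary problem rather than the overlap itself. The class and the helper decodeInTwoPieces are hypothetical. The euro sign (U+20AC) is encoded as the three bytes E2 82 AC; decoding the pieces separately and concatenating them yields replacement characters (U+FFFD) instead of the euro sign.

```java
import java.nio.charset.StandardCharsets;

public class SplitCharExample {
  // Hypothetical helper: decodes bytes[0..splitAt) and bytes[splitAt..end)
  // separately, as a loop that converts each buffer to a String would,
  // and concatenates the results
  static String decodeInTwoPieces(byte[] bytes, int splitAt) {
    String head = new String(bytes, 0, splitAt, StandardCharsets.UTF_8);
    String tail = new String(bytes, splitAt, bytes.length - splitAt, StandardCharsets.UTF_8);
    return head + tail;
  }

  public static void main(String[] args) {
    // U+20AC is encoded in UTF-8 as the three bytes E2 82 AC
    byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);

    // Boundary after the lead byte: each piece is malformed on its own
    String split = decodeInTwoPieces(euro, 1);
    // Boundary after the last byte: the whole sequence is decoded at once
    String whole = decodeInTwoPieces(euro, euro.length);

    System.out.println(whole.equals("\u20AC")); // true
    System.out.println(split.equals(whole));    // false
  }
}
```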

A third issue arises from the behavior of the String class constructor. According to the Java API [API 06] for the String class:

...

with respect to the default encoding. See FIO03-J. Specify the character encoding while performing file or network IO for more details on this issue.
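The dependence on the default encoding can be sketched by decoding the same two bytes (C3 A9, the UTF-8 encoding of U+00E9) with different charsets. The class and field names are hypothetical. With an explicit charset the result is fixed; new String(bytes) uses Charset.defaultCharset(), which varies between systems (on JDK 18 and later it is UTF-8 unless overridden, per JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetChoiceExample {
  // UTF-8 encoding of U+00E9 (small letter e with acute accent)
  static final byte[] E_ACUTE_UTF8 = {(byte) 0xC3, (byte) 0xA9};

  public static void main(String[] args) {
    // Explicit charsets: the result does not depend on the platform
    String asUtf8 = new String(E_ACUTE_UTF8, StandardCharsets.UTF_8);        // U+00E9
    String asLatin1 = new String(E_ACUTE_UTF8, StandardCharsets.ISO_8859_1); // U+00C3 U+00A9

    // No charset given: the platform default is used
    String asDefault = new String(E_ACUTE_UTF8);

    System.out.println(Charset.defaultCharset());
    System.out.println(asUtf8.length());          // 1
    System.out.println(asLatin1.length());        // 2
    System.out.println(asDefault.equals(asUtf8)); // depends on the default charset
  }
}
```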

Noncompliant Code Example

...

A second issue involves multibyte character encoding. The read() method may return data that ends with the leading byte of a multibyte character, leaving the trailing bytes to be read in the next iteration. When each chunk of bytes is converted and appended to the String buffer str, the multibyte character is corrupted, because the encoding information does not extend across buffer boundaries.
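For contrast, an InputStreamReader keeps an incomplete byte sequence in its decoder until the remaining bytes arrive, so characters survive buffer boundaries. This is a sketch of one alternative to converting each byte buffer with the String constructor, not necessarily the compliant solution this rule gives. The class, OneByteStream, and readAllChars are hypothetical; OneByteStream returns a single byte per call, so every multibyte character is split across reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderDecodeExample {
  // Hypothetical stream: returns one byte per read(byte[], int, int) call
  static class OneByteStream extends ByteArrayInputStream {
    OneByteStream(byte[] data) { super(data); }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
      return super.read(b, off, Math.min(len, 1));
    }
  }

  // Decodes the whole stream through a Reader with an explicit charset
  static String readAllChars(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
      char[] buf = new char[8];
      int n;
      while ((n = reader.read(buf, 0, buf.length)) != -1) {
        sb.append(buf, 0, n); // append only the chars actually read
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    String text = "caf\u00E9 \u20AC";
    byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
    System.out.println(readAllChars(new OneByteStream(utf8)).equals(text)); // true
  }
}
```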

Finally, the buffer str contains data decoded using the default encoding of the system, because no encoding is specified in the call to the String class constructor.

...

The space for the data byte buffer should be allocated according to the maximum number of bytes required to encode a character. For example, UTF-8 requires at most three bytes to encode any character in the range U+0000 to U+FFFF. Counterintuitively, characters above U+FFFF require four bytes. However, Java represents each such character as two separate char values of two bytes each (a surrogate pair), because it uses UTF-16 internally. Consequently, a buffer of four bytes per expected character is large enough for any UTF-8 encoded input.
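The byte counts above can be checked directly. This sketch compares a character inside the Basic Multilingual Plane (U+20AC) with one above U+FFFF (U+1F600), which Java stores as the two char values D83D DE00. The class and the helper utf8Length are hypothetical. The JDK's UTF-8 encoder reports 3.0 as its maximum bytes per char value, because a four-byte sequence always corresponds to two char values.

```java
import java.nio.charset.StandardCharsets;

public class Utf8SizeExample {
  // Hypothetical helper: number of bytes in the UTF-8 encoding of s
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    String euro = "\u20AC";       // U+20AC, inside the BMP
    String grin = "\uD83D\uDE00"; // U+1F600, above U+FFFF (surrogate pair)

    System.out.println(euro.length());                        // 1 char
    System.out.println(utf8Length(euro));                     // 3 bytes
    System.out.println(grin.length());                        // 2 chars
    System.out.println(grin.codePointCount(0, grin.length())); // 1 code point
    System.out.println(utf8Length(grin));                     // 4 bytes

    // Worst case per char value, as reported by the UTF-8 encoder
    System.out.println(StandardCharsets.UTF_8.newEncoder().maxBytesPerChar());
  }
}
```

Sizing the buffer at four bytes per expected code point therefore covers the worst case in the text above.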

This compliant solution also specifies the String encoding explicitly to facilitate portability.

...

The readFully(byte[] b) and readFully(byte[] b, int off, int len) methods of the DataInputStream class can be used to read all the requested data. An EOFException is thrown if the end of input is detected before the required number of bytes has been read, and an IOException is thrown if some other input/output error occurs. The exception handler decides how to proceed.

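```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public static String readBytes(FileInputStream fis) throws IOException
{
  byte[] data = new byte[1024];
  DataInputStream dis = new DataInputStream(fis);
  // Reads exactly data.length bytes; throws EOFException if the stream
  // ends first, or IOException on any other I/O error
  dis.readFully(data);
  // Decode with an explicit encoding rather than the platform default
  String str = new String(data, StandardCharsets.UTF_8);
  return str;
}
```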
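Because readFully() insists on filling the whole array, readBytes() with its fixed 1024-byte buffer throws EOFException for any input shorter than 1024 bytes. The sketch below shows one way an exception handler might decide how to proceed; the class and the helper tryReadFully are hypothetical, and an in-memory ByteArrayInputStream stands in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadFullyExample {
  // Hypothetical helper: returns true if data could be filled completely
  static boolean tryReadFully(byte[] input, byte[] data) throws IOException {
    DataInputStream dis = new DataInputStream(new ByteArrayInputStream(input));
    try {
      dis.readFully(data); // returns only once data is full
      return true;
    } catch (EOFException e) {
      // Input ended early; this handler chooses to report failure.
      // Part of data may already have been overwritten.
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tryReadFully(new byte[4], new byte[4])); // true
    System.out.println(tryReadFully(new byte[3], new byte[4])); // false
  }
}
```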

...