...
This noncompliant code example attempts to read 1024 bytes from a FileInputStream and to return them as a String. Unfortunately, this may not happen because of the general contract of the read() methods.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(FileInputStream in) throws IOException {
  String str = "";
  byte[] data = new byte[1024];
  while (in.read(data) > -1) {
    str += new String(data);
  }
  return str;
}
|
This noncompliant code example can fail in several different ways. First, the programmer's misunderstanding of the general contract of the read() methods can result in failure to read the intended data in full. Because the return value of read() is ignored, each iteration also converts the entire buffer, including any bytes left over from an earlier iteration. Second, the code fails to consider the interaction between characters represented with a multibyte encoding and the boundaries between the loop iterations. When the last byte read from the data stream is the leading byte of a multibyte character, the trailing bytes will not be encountered until the next iteration of the while loop. However, multibyte encoding is resolved during construction of the new String within the loop. Consequently, the multibyte encoding will be interpreted incorrectly in this case. Finally, because no character encoding is specified in the call to the String constructor, the constructor uses the system's default character encoding to interpret the bytes in the buffer. If the input used a character encoding that differs from the system's default character encoding, the resulting string can be corrupted.
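The second failure mode can be reproduced in isolation. The following is a minimal sketch, not part of the original example: the class name SplitCharacterDemo and the method decodePerChunk are hypothetical, and the chunk size is made a parameter so that a split can be forced with a tiny input. Unlike the noncompliant code, it decodes only the bytes actually read and names UTF-8 explicitly, so the only remaining defect is decoding inside the loop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original example.
class SplitCharacterDemo {

    // Decodes each chunk as soon as it is read, which is the flaw being illustrated.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] data = new byte[chunkSize];
        int n;
        while ((n = in.read(data)) > -1) {
            // A multibyte character that straddles two chunks is decoded in two halves.
            sb.append(new String(data, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u00e9"; // U+00E9, encoded in UTF-8 as the two bytes C3 A9
        byte[] encoded = original.getBytes(StandardCharsets.UTF_8);

        // Chunk size 1 forces the two bytes into separate iterations.
        String split = decodePerChunk(new ByteArrayInputStream(encoded), 1);
        // Chunk size 2 keeps the character within one chunk.
        String whole = decodePerChunk(new ByteArrayInputStream(encoded), 2);

        System.out.println(split.equals(original)); // false
        System.out.println(whole.equals(original)); // true
    }
}
```

With a chunk size of 1, each half of the character is decoded on its own and replaced with a substitution character, so the original text cannot be recovered from the result.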
Compliant Solution (Multiple calls to read)
This compliant solution reads all the desired bytes into its buffer, accounting for the total number of bytes read and adjusting the remaining bytes' offset, thus ensuring that the required data are read in full. It avoids splitting multibyte encoded characters across buffers by deferring construction of the result string until all of the desired data have been read. It also facilitates portability across systems that use different default character encodings by specifying an explicit character encoding for the String constructor.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(FileInputStream in) throws IOException {
  int offset = 0;
  int bytesRead = 0;
  byte[] data = new byte[1024];
  // Keep reading until the buffer is full or the stream ends
  while ((bytesRead = in.read(data, offset, data.length - offset)) != -1) {
    offset += bytesRead;
    if (offset >= data.length) {
      break;
    }
  }
  // Decode only the bytes actually read, using an explicit encoding
  String str = new String(data, 0, offset, "UTF-8");
  return str;
}
|
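To see why the loop is needed, it can help to simulate a stream that legitimately returns fewer bytes than requested. The sketch below is illustrative only: ShortReadDemo, TrickleInputStream, and readUpTo are hypothetical names introduced here, and the trickling stream is an artificial stand-in for a slow file, pipe, or socket.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original example.
class ShortReadDemo {

    // Hypothetical stream that hands out at most one byte per bulk read,
    // which the read() contract permits.
    static class TrickleInputStream extends FilterInputStream {
        TrickleInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Reads until the buffer is full or the stream ends; returns the byte count.
    static int readUpTo(InputStream in, byte[] data) throws IOException {
        int offset = 0;
        while (offset < data.length) {
            int n = in.read(data, offset, data.length - offset);
            if (n == -1) {
                break; // end of stream
            }
            offset += n;
        }
        return offset;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "hello".getBytes(StandardCharsets.US_ASCII);

        // A single call returns only part of the available data.
        int single = new TrickleInputStream(new ByteArrayInputStream(source)).read(new byte[16]);
        System.out.println(single); // 1

        // Looping on the return value collects everything.
        int total = readUpTo(new TrickleInputStream(new ByteArrayInputStream(source)), new byte[16]);
        System.out.println(total); // 5
    }
}
```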
The size of the data byte buffer depends on the maximum number of bytes required to write an encoded character. For example, UTF-8 encoded data requires four bytes to represent any character above U+FFFF. Because Java uses the UTF-16 character encoding to represent char data, such sequences are split into two separate char values of two bytes each. Consequently, the buffer size should be four times the size of a typical byte sequence.
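The byte counts mentioned above can be checked directly. This is a small sketch with hypothetical names (EncodedLengthDemo, utf8Length); the sample characters are chosen here for illustration and do not come from the original text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original example.
class EncodedLengthDemo {

    // Number of bytes needed to encode the string in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 lies above U+FFFF, so Java stores it as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(utf8Length("A"));            // 1
        System.out.println(utf8Length("\u00e9"));       // 2
        System.out.println(utf8Length("\u20ac"));       // 3
        System.out.println(utf8Length(supplementary));  // 4
        System.out.println(supplementary.length());     // 2 (char values)
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1
    }
}
```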
Compliant Solution (readFully)
The one-argument and three-argument readFully() methods of the DataInputStream class guarantee that they either will read all of the requested data or will throw an exception. These methods throw EOFException if they detect the end of input before the required number of bytes have been read; they throw IOException if some other input/output error occurs. This compliant solution also specifies an explicit character encoding to the String constructor.
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(FileInputStream fis) throws IOException {
  byte[] data = new byte[1024];
  DataInputStream dis = new DataInputStream(fis);
  dis.readFully(data);
  String str = new String(data, "UTF-8");
  return str;
}
|
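One consequence of this contract is that an input shorter than the buffer produces an exception and no partial result. The following sketch illustrates that behavior with an in-memory stream; the class ReadFullyDemo and the method readExactly are hypothetical names, and the requested length is made a parameter purely for demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original example.
class ReadFullyDemo {

    // Reads exactly 'length' bytes and decodes them as UTF-8,
    // or throws EOFException if the stream ends early.
    static String readExactly(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        new DataInputStream(in).readFully(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "hello".getBytes(StandardCharsets.UTF_8);

        // Enough data is available, so the call succeeds.
        System.out.println(readExactly(new ByteArrayInputStream(source), 5)); // hello

        // Only 5 of the requested 1024 bytes exist, so readFully() throws.
        try {
            readExactly(new ByteArrayInputStream(source), 1024);
        } catch (EOFException e) {
            System.out.println("Input ended before 1024 bytes were read");
        }
    }
}
```

Callers that must also accept shorter inputs would need to catch EOFException or fall back to a loop that tracks the number of bytes read.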
...