...
The trailing byte ranges overlap the ranges of both the single-byte and lead-byte characters. This causes problems when a multibyte character is split across a buffer boundary: each fragment is then interpreted on its own, according to its constituent bytes, rather than as part of the original character [Phillips 05].
Also, see FIO03-J. Specify the character encoding while performing file or network IO.
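The effect of splitting a multibyte character across a buffer boundary can be illustrated with a minimal sketch. It assumes UTF-8 rather than an encoding with overlapping trailing-byte ranges, and the class and method names (`SplitCharDemo`, `decodeWhole`, `decodeInChunks`) are hypothetical, chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SplitCharDemo {

  // Decodes the complete byte sequence in one step.
  static String decodeWhole(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Decodes two chunks independently, as code might do when it converts
  // each filled buffer to a String before all of the data has arrived.
  static String decodeInChunks(byte[] bytes, int split) {
    String first =
        new String(Arrays.copyOfRange(bytes, 0, split), StandardCharsets.UTF_8);
    String second =
        new String(Arrays.copyOfRange(bytes, split, bytes.length), StandardCharsets.UTF_8);
    return first + second;
  }

  public static void main(String[] args) {
    // 'a' is one byte in UTF-8; U+00E9 is two bytes (0xC3 0xA9).
    String original = "a\u00e9";
    byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

    // Decoding everything at once round-trips the text.
    System.out.println(decodeWhole(bytes).equals(original));

    // Splitting at index 2 separates 0xC3 from 0xA9, so neither chunk
    // contains a complete encoding of U+00E9.
    System.out.println(decodeInChunks(bytes, 2).equals(original));
  }
}
```

Decoding the whole array recovers the original text, whereas decoding the two chunks separately does not, because neither chunk holds a complete encoding of the second character.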
A third issue arises from the behavior of the String class constructor. According to the Java API [API 06] for the String class:
...
| Code Block | ||
|---|---|---|
| ||
public static String readBytes(FileInputStream fis) throws IOException { byte[] data = new byte[1024]; DataInputStream dis = new DataInputStream(fis); dis.readFully(data); String str = new String(data, "UTF-8"); return str; } |
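Where more control over invalid input is needed than the String constructor provides, the String API documentation points to `java.nio.charset.CharsetDecoder`. The following is a minimal sketch, not part of the original guideline: the class and method names (`StrictDecode`, `decodeStrict`) are hypothetical, and UTF-8 is assumed to match the code above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

  // Hypothetical helper: decodes a complete byte array as UTF-8 and
  // throws CharacterCodingException on malformed or truncated input
  // instead of substituting replacement characters.
  public static String decodeStrict(byte[] data) throws CharacterCodingException {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    return decoder.decode(ByteBuffer.wrap(data)).toString();
  }
}
```

A caller could pass the fully read `data` array to `decodeStrict` and treat a `CharacterCodingException` as a sign that the input is invalid or incomplete.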
...