Wednesday, 22 February 2012

Preserving special characters - Java I/O

I was assigned a bug in a standalone Java program that writes content to XML files. The content contains a few special characters (e.g. the vulgar fraction one fourth, '¼'), and these characters are lost during the write: the resulting XML file has question marks ('?') in their place. I suspected the problem was related to encoding, and after reading a bit more about Java I/O and encodings I found a solution.

1. First, I found the default encoding the JRE uses when writing files, using the following code:

// A writer created without an explicit charset reports the platform default encoding
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
System.out.println(out.getEncoding());

In my case the output was 'ASCII', which established that the default encoding was ASCII and therefore cannot represent Unicode characters outside that range. When a character cannot be mapped to the writer's encoding, it is replaced with '?', which explains the corrupted output.
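As a side note, the same information can be obtained without creating a writer at all. Here is a minimal sketch, assuming a standard JRE (the class name is mine):

import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    public static void main(String[] args) {
        // Canonical name of the platform default charset, e.g. US-ASCII
        System.out.println(Charset.defaultCharset());

        // The system property the default encoding is derived from
        System.out.println(System.getProperty("file.encoding"));
    }
}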

2. Here is my old code, which uses the default encoding:

BufferedWriter out = new BufferedWriter(new FileWriter(filename, true));
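To see why this loses characters: FileWriter always uses the platform default encoding, and characters that encoding cannot represent are silently replaced with '?'. The small reproduction below (the file name and text are made up for illustration) shows the effect:

import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;

public class AsciiLossDemo {
    public static void main(String[] args) throws IOException {
        // \u00bc is the vulgar fraction one fourth
        String text = "Add \u00bc cup of sugar";

        // FileWriter always uses the platform default encoding
        FileWriter writer = new FileWriter("demo.txt");
        writer.write(text);
        writer.close();

        // Read the raw bytes back; with a US-ASCII default,
        // the unmappable \u00bc has been written as '?'
        FileInputStream in = new FileInputStream("demo.txt");
        int b;
        while ((b = in.read()) != -1) {
            System.out.print((char) b);
        }
        in.close();
        System.out.println();
    }
}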

3. Here is the important part, where we can specify the encoding we need (in my case, UTF-8). The following code sets the required encoding when writing the file:

// Name the encoding explicitly; "UTF-8" is the canonical charset name ("UTF8" also works as an alias)
FileOutputStream fos = new FileOutputStream(filename, true);
Writer out = new OutputStreamWriter(fos, "UTF-8");

out.write(buffer.toString());

out.close();
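For completeness, here is a self-contained sketch of the fix. The file name, element names, and buffer contents are placeholders; the real program builds its XML content in a buffer elsewhere:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf8XmlWriteDemo {
    public static void main(String[] args) throws IOException {
        StringBuilder buffer = new StringBuilder();
        buffer.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        buffer.append("<recipe><quantity>\u00bc cup</quantity></recipe>\n");

        // OutputStreamWriter lets us name the encoding explicitly,
        // so \u00bc is written as its two-byte UTF-8 sequence instead of '?'
        Writer out = new OutputStreamWriter(new FileOutputStream("recipe.xml"), "UTF-8");
        try {
            out.write(buffer.toString());
        } finally {
            out.close();
        }
    }
}

Writing the XML declaration with encoding="UTF-8" keeps the prolog consistent with the bytes actually written, so parsers decode the ¼ correctly when the file is read back.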

Bingo! After the changes the special characters are properly written and displayed in the XML.
