I was assigned a bug in a standalone Java programme that writes content to XML files. The content contains a few special characters (e.g. the vulgar fraction one quarter, ¼), and these characters are lost during the write: the resulting XML file has question marks ('?') in their place. I suspected the problem was related to character encoding, and after reading a bit more about Java I/O and encodings I found a solution.
1. First, I found out which default encoding the JRE uses when no encoding is specified. This is the encoding FileWriter uses for every file, not just XML files. I checked it with the following code:
OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
System.out.println(out.getEncoding()); // prints the default encoding's historical name
In my case the output was 'ASCII'. That confirmed the default encoding was ASCII, which only covers 7-bit characters and cannot represent Unicode characters such as ¼, so the writer replaces them with '?'.
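The check above can be expanded into a small standalone program. This is a sketch that prints both the historical name reported by `OutputStreamWriter.getEncoding()` (e.g. "ASCII") and the canonical name from `Charset.defaultCharset()` (e.g. "US-ASCII"). The class and method names are illustrative only, and the actual output depends on the machine and JVM it runs on.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public class DefaultEncodingCheck {

    // Historical encoding name of a writer created without an explicit
    // charset, e.g. "ASCII", "UTF8" or "Cp1252".
    static String writerEncoding() {
        OutputStreamWriter out = new OutputStreamWriter(new ByteArrayOutputStream());
        return out.getEncoding();
    }

    // Canonical name of the JVM's default charset, e.g. "US-ASCII" or "UTF-8".
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("OutputStreamWriter default: " + writerEncoding());
        System.out.println("Charset.defaultCharset():   " + defaultCharsetName());
        System.out.println("file.encoding property:     " + System.getProperty("file.encoding"));
    }
}
```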
2. Here is my old code, which relies on the default encoding:
// FileWriter(String, boolean) encodes text with the platform default charset
BufferedWriter out = new BufferedWriter(new FileWriter(filename, true));
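To see why the '?' characters appear, the sketch below encodes a string containing ¼ (U+00BC) through an `OutputStreamWriter` into an in-memory buffer instead of a file. With US-ASCII, the writer replaces the unmappable character with the single byte '?'; with UTF-8, it is written as the two bytes 0xC2 0xBC. The `AsciiReplacementDemo` class, the `encode` helper and the `<qty>` element are hypothetical names chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiReplacementDemo {

    // Encodes text with the given charset and returns the bytes that
    // would have been written to a file.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            // Writing to an in-memory buffer should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String text = "<qty>\u00BC</qty>"; // U+00BC VULGAR FRACTION ONE QUARTER
        // US-ASCII cannot represent U+00BC, so the writer substitutes '?'.
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints <qty>?</qty>
        // UTF-8 keeps the character, using two bytes for it.
        System.out.println(encode("\u00BC", StandardCharsets.UTF_8).length); // prints 2
    }
}
```

The substitution happens silently: `OutputStreamWriter` replaces unmappable characters rather than throwing, so the bug only shows up when someone opens the file.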
3. Here is the important part: specifying the encoding explicitly, in my case UTF-8. The following code sets the required encoding when writing the file:
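FileOutputStream fos = new FileOutputStream(filename, true); // true = append
// Explicit UTF-8 instead of the platform default; the string "UTF8" also works,
// but the StandardCharsets constant cannot be misspelled
Writer out = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_8));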
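Here is a fuller, runnable version of the fix. It is a sketch that assumes the file is opened in append mode as in the original code, uses try-with-resources so the writer is always flushed and closed, and writes an XML declaration that names the same encoding the bytes are written in. The `Utf8XmlWriter` class, the `appendUtf8` helper and the `<qty>` element are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8XmlWriter {

    // Appends content to the file, encoding it explicitly as UTF-8
    // instead of relying on the JVM's default charset.
    static void appendUtf8(String filename, String content) throws IOException {
        try (Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(filename, true), StandardCharsets.UTF_8))) {
            out.write(content);
        } // closing the writer flushes it and closes the underlying stream
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("utf8-demo", ".xml");
        // The XML declaration should name the encoding actually used for the bytes.
        appendUtf8(file.toString(), "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        appendUtf8(file.toString(), "<qty>\u00BC</qty>\n");

        // Reading back with the same charset recovers the original character.
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(text.contains("\u00BC")); // prints true
        Files.delete(file);
    }
}
```

On Java 11 and later, `FileWriter` also has a `FileWriter(String fileName, Charset charset, boolean append)` constructor, so the old `BufferedWriter`/`FileWriter` code could be kept with the charset added as an argument.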
Bingo! After this change, the special characters appear correctly in the XML file.