Writing UTF-8 data to a file using Java
In general, data is stored in a computer in the form of bits (1 or 0). Various character encoding schemes specify the sequence of bytes that represents each character.
Unicode − A character set developed by the Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode encoding. UTF stands for Unicode Transformation Format, and Unicode provides 3 such encodings.
- UTF-8 − It comes in 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
- UTF-16 − It comes in 16-bit units (a short or char in Java). A character can be 1 or 2 units long, making UTF-16 variable width.
- UTF-32 − It comes in 32-bit units (an int in Java). It is a fixed-width format: every character is exactly 1 unit long.
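The variable width of UTF-8 and UTF-16 can be observed by encoding a few sample characters and counting the bytes. The following is a minimal sketch; the class name EncodingWidths and its helper methods are made up for illustration, and UTF_16BE is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {

    // Number of bytes the text occupies when encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes in UTF-16 (big-endian, so no byte-order mark is added).
    public static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Samples: ASCII letter, accented Latin letter, CJK ideograph, emoji (U+1F600).
        String[] samples = {"A", "\u00E9", "\u4F60", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + " -> UTF-8: " + utf8Length(s) + " bytes, UTF-16: "
                    + utf16Length(s) + " bytes");
        }
    }
}
```

The four samples take 1, 2, 3, and 4 bytes in UTF-8, and 2, 2, 2, and 4 bytes in UTF-16.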
Writing UTF data to a file
The writeUTF() method of the java.io.DataOutputStream class accepts a String value as a parameter and writes it to the current output stream using modified UTF-8 encoding, preceded by a two-byte length. The result is therefore meant to be read back with DataInputStream.readUTF() rather than opened as a plain UTF-8 text file. To write UTF data to a file this way −
- Instantiate the FileOutputStream class by passing a String value representing the path of the required file as a parameter.
- Instantiate the DataOutputStream class by passing the FileOutputStream object created above as a parameter.
- Write UTF data to the DataOutputStream object using the writeUTF() method.
- Flush the contents of the DataOutputStream object to the file (destination) using the flush() method.
Example
import java.io.DataOutputStream;
import java.io.FileOutputStream;

public class UTF8Example {
   public static void main(String args[]) throws Exception {
      //Instantiating the FileOutputStream class
      FileOutputStream fileOut = new FileOutputStream("D:\\samplefile.txt");
      //Instantiating the DataOutputStream class
      DataOutputStream outputStream = new DataOutputStream(fileOut);
      //Writing UTF data to the output stream
      outputStream.writeUTF("టుటోరియల్స్ పాయింట్ కి స్వాగతిం");
      outputStream.flush();
      //Closing the stream to release the file
      outputStream.close();
      System.out.println("Data entered into the file");
   }
}
Output
Data entered into the file
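Because writeUTF() produces a length-prefixed record in modified UTF-8, its output differs from the plain UTF-8 bytes of the same string. The sketch below compares the two in memory instead of writing to a file; the class name WriteUtfLayout and its helper methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WriteUtfLayout {

    // Encodes text with DataOutputStream.writeUTF() into an in-memory buffer.
    public static byte[] writeUtfBytes(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text);
        }
        return buffer.toByteArray();
    }

    // Decodes bytes produced by writeUTF() using the matching readUTF() call.
    public static String readUtfBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00E9llo"; // 5 characters, 6 bytes in UTF-8
        byte[] framed = writeUtfBytes(text);
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("writeUTF bytes:    " + framed.length);
        System.out.println("plain UTF-8 bytes: " + plain.length);
        System.out.println("round trip ok:     " + text.equals(readUtfBytes(framed)));
    }
}
```

For this sample, writeUTF() emits 8 bytes (a 2-byte length followed by 6 bytes of text), while the plain UTF-8 encoding is 6 bytes. If the file has to be readable as ordinary UTF-8 text, a Writer with an explicit charset is probably the better fit.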
The newBufferedWriter() method of the java.nio.file.Files class accepts a Path object representing the path of the file and a Charset object specifying the encoding in which the characters are to be written, and returns a BufferedWriter object that writes the data in that encoding.
The value for the Charset could be StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16, StandardCharsets.US_ASCII, or StandardCharsets.ISO_8859_1.
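To see how the choice of Charset changes the bytes that end up in the file, the same character can be encoded with several of these constants. This is only an illustrative sketch; the class name CharsetChoice and the encode() helper are assumptions, not part of any library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetChoice {

    // Encodes text with the given charset and returns the raw bytes.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "\u00E9"; // Latin small letter e with acute accent
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.US_ASCII
        };
        for (Charset charset : charsets) {
            // Java bytes are signed, so values above 0x7F print as negative numbers.
            System.out.println(charset.name() + ": "
                    + Arrays.toString(encode(text, charset)));
        }
    }
}
```

UTF_16 prepends a byte-order mark when encoding, and US_ASCII cannot represent this character, so String.getBytes() substitutes a question mark (byte 63).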
Therefore to write UTF-8 data to a file −
- Create/get an object of the Path class representing the required path using the get() method of the java.nio.file.Paths class.
- Create/get a BufferedWriter object that can write UTF-8 data, by passing the Path object created above and StandardCharsets.UTF_8 as parameters to the newBufferedWriter() method.
- Append the UTF-8 data to the BufferedWriter object using the append() method.
- Flush the contents of the BufferedWriter to the (destination) file using the flush() method.
Example
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UTF8Example {
   public static void main(String args[]) throws Exception {
      //Getting the Path object
      Path path = Paths.get("D:\\samplefile.txt");
      //Creating a BufferedWriter object
      BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
      //Appending the UTF-8 String to the file
      writer.append("టుటోరియల్స్ పాయింట్ కి స్వాగతిం");
      //Flushing data to the file
      writer.flush();
      //Closing the writer to release the file
      writer.close();
      System.out.println("Data entered into the file");
   }
}
Output
Data entered into the file
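Note that append() adds characters to the writer, not to the end of an existing file: by default, newBufferedWriter() creates the file or truncates it if it already exists. To keep the existing contents, open options can be passed as extra arguments. The sketch below assumes a temporary file and a hypothetical appendLine() helper.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class Utf8Append {

    // Adds one line to the end of the file, creating the file if it does not exist.
    public static void appendLine(Path path, String line) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(line);
            writer.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        Path path = Files.createTempFile("utf8-append", ".txt");
        appendLine(path, "first line");
        appendLine(path, "\u4F60\u597D"); // two CJK characters as non-ASCII sample text

        List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
        System.out.println(lines.size() + " lines in " + path);
        Files.delete(path);
    }
}
```

The try-with-resources statement closes the writer, which also flushes it, so no explicit flush() call is needed here.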
How to write a UTF-8 file in Java
In Java, the OutputStreamWriter accepts a charset to encode character streams into byte streams. We can pass StandardCharsets.UTF_8 into the OutputStreamWriter constructor to write data to a UTF-8 file.

try (FileOutputStream fos = new FileOutputStream(file);
     OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
     BufferedWriter writer = new BufferedWriter(osw)) {
    // write text through the writer here
}

Since Java 7, many file I/O and NIO writers accept a charset as an argument, which makes writing data to a UTF-8 file very easy. For example:

// Java 7
Files.write(path, lines, StandardCharsets.UTF_8);

// Java 8
Files.newBufferedWriter(path); // default UTF-8

// Java 11
new FileWriter(new File(fileName), StandardCharsets.UTF_8);
1. Write to UTF-8 file
This example shows a few ways to write some Chinese characters to a UTF-8 file.
package com.mkyong.io.howto;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class UnicodeWrite {

    public static void main(String[] args) {

        String fileName = "c:\\temp\\test.txt";
        List<String> lines = Arrays.asList("line 1", "line 2", "line 3", "你好,世界");

        writeUnicodeJava7(fileName, lines);
        //writeUnicodeJava8(fileName, lines);
        //writeUnicodeJava11(fileName, lines);
        //writeUnicodeClassic(fileName, lines);

        System.out.println("Done");
    }

    // in the old days
    public static void writeUnicodeClassic(String fileName, List<String> lines) {

        File file = new File(fileName);

        try (FileOutputStream fos = new FileOutputStream(file);
             OutputStreamWriter osw = new OutputStreamWriter(fos, StandardCharsets.UTF_8);
             BufferedWriter writer = new BufferedWriter(osw)) {

            for (String line : lines) {
                writer.append(line);
                writer.newLine();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void writeUnicodeJava7(String fileName, List<String> lines) {

        Path path = Paths.get(fileName);
        try {
            Files.write(path, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Java 8 - Files.newBufferedWriter(path) - default UTF-8
    public static void writeUnicodeJava8(String fileName, List<String> lines) {

        Path path = Paths.get(fileName);
        try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {

            for (String line : lines) {
                writer.append(line);
                writer.newLine();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Java 11 adds Charset to FileWriter
    public static void writeUnicodeJava11(String fileName, List<String> lines) {

        try (FileWriter fw = new FileWriter(new File(fileName), StandardCharsets.UTF_8);
             BufferedWriter writer = new BufferedWriter(fw)) {

            for (String line : lines) {
                writer.append(line);
                writer.newLine();
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
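For a single string rather than a list of lines, Java 11 also provides Files.writeString() and Files.readString(). They are not among the methods used in the class above; the following is a minimal sketch that assumes Java 11 or later and uses a temporary file and a hypothetical roundTrip() helper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteStringExample {

    // Writes text as UTF-8 and reads it back; requires Java 11 or later.
    public static String roundTrip(Path path, String text) throws IOException {
        Files.writeString(path, text, StandardCharsets.UTF_8);
        return Files.readString(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        Path path = Files.createTempFile("utf8-demo", ".txt");
        String text = "line 1\n\u4F60\u597D\n"; // second line holds two CJK characters

        String readBack = roundTrip(path, text);
        System.out.println("Same content read back: " + text.equals(readBack));
        Files.delete(path);
    }
}
```

Both methods also have overloads without a Charset argument, which use UTF-8.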
Reading and Writing UTF-8 Data into File
We often need to deal with UTF-8 encoded files in our applications. This may be due to localization needs or simply because some requirement involves processing user input.
Some data sources may even provide data in UTF-8 format only. In this Java tutorial, we will look at two very simple examples of writing UTF-8 content to a file and reading it back.
1. Writing UTF-8 Encoded Data into a File
Given below is a Java example that demonstrates how to write UTF-8 encoded data into a file. It passes the UTF-8 character encoding when creating the OutputStreamWriter.

File file = new File("c:\\temp\\test.txt");

try (Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(file), StandardCharsets.UTF_8))) {

    out.append("Howtodoinjava.com")
        .append("\r\n")
        .append("UTF-8 Demo")
        .append("\r\n")
        .append("क्षेत्रफल = लंबाई * चौड़ाई")
        .append("\r\n");

    out.flush();
} catch (Exception e) {
    e.printStackTrace();
}
Before running the example in Eclipse, we need to enable UTF-8 support in the IDE, which is disabled by default. The steps to enable it are described in a previous post:
Read: How to compile and run a java program written in another language
2. Reading UTF-8 Encoded Data from a File
We need to pass StandardCharsets.UTF_8 into the InputStreamReader constructor to read data from a UTF-8 encoded file.
File file = new File("c:\\temp\\test.txt");

try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {

    String str;
    while ((str = in.readLine()) != null) {
        System.out.println(str);
    }
} catch (Exception e) {
    e.printStackTrace();
}
Output
Howtodoinjava.com
UTF-8 Demo
क्षेत्रफल = लंबाई * चौड़ाई
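The same file can also be read through the NIO API with Files.newBufferedReader(), which takes the charset directly. This is a sketch rather than part of the original example; the class name Utf8Read, the readLines() helper, and the temporary file are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Utf8Read {

    // Reads every line of a UTF-8 encoded file into a list.
    public static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Temporary file with sample content, created only for this demonstration.
        Path path = Files.createTempFile("utf8-read", ".txt");
        Files.write(path, Arrays.asList("UTF-8 Demo", "\u4F60\u597D"), StandardCharsets.UTF_8);

        for (String line : readLines(path)) {
            System.out.println(line);
        }
        Files.delete(path);
    }
}
```

One difference worth knowing: a reader obtained from Files.newBufferedReader() throws an IOException when it meets bytes that are not valid in the given charset, whereas an InputStreamReader created with a Charset replaces them with a substitution character.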