Reading UTF-8 Files in Java

Read UTF-8 Encoded Data in Java

In this post, we will see how to read UTF-8 encoded data.

Sometimes we have to deal with UTF-8 encoded data in our application. This may be due to localization, or because we process data from user input.


There are multiple ways to read UTF-8 Encoded Data in Java.

Using Files’s newBufferedReader()

We can use the newBufferedReader() method of java.nio.file.Files to read UTF-8 data from a file into a String.

Please note that WriteUTF8newBufferWriter.txt was written from this example.
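The listing for this approach is missing here, so the following is only a minimal sketch of how it might look. The class name ReadUtf8WithFiles and the helper readLines() are hypothetical, and the file named in the note above is assumed to sit in the working directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the original post.
class ReadUtf8WithFiles {

    // Reads every line of the file, decoding its bytes as UTF-8.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location of the file; adjust as needed.
        Path path = Paths.get("WriteUTF8newBufferWriter.txt");
        for (String line : readLines(path)) {
            System.out.println(line);
        }
    }
}
```

The try-with-resources statement closes the reader even if reading fails part-way through.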

Using BufferedReader

We need to pass UTF-8 as the encoding when creating the InputStreamReader.

Please note that UTFDemo.txt was written from this example.
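The listing for this approach is also missing, so here is a hedged sketch of what it could look like. The class name ReadUtf8WithBufferedReader and the helper readAll() are hypothetical; UTFDemo.txt is assumed to be in the working directory.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original post.
class ReadUtf8WithBufferedReader {

    // Returns the file content with every line terminated by '\n'.
    static String readAll(String fileName) throws IOException {
        StringBuilder content = new StringBuilder();
        // The charset passed to InputStreamReader controls how bytes become characters.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name, taken from the note above.
        System.out.print(readAll("UTFDemo.txt"));
    }
}
```

Passing the StandardCharsets.UTF_8 constant rather than the string "UTF8" avoids a checked UnsupportedEncodingException and any risk of a misspelled charset name.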

Using DataInputStream’s readUTF() method

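We can use the readUTF() method of DataInputStream to read string data from a file. Note that readUTF() expects Java's modified UTF-8 format with a two-byte length prefix, as produced by DataOutputStream's writeUTF(), so it is not suitable for plain UTF-8 text files.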

Please note that WriteUTFDemo.txt was written from this example.
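Since the listing is missing, the sketch below shows one way the round trip could look, under the assumption that the file was produced with writeUTF(). The class name ReadUtfWithDataInputStream and both helper methods are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical demo class, not taken from the original post.
class ReadUtfWithDataInputStream {

    // Writes one string in the length-prefixed modified UTF-8 format.
    static void write(String fileName, String text) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(fileName))) {
            out.writeUTF(text);
        }
    }

    // Reads back one string that was written with writeUTF().
    static String read(String fileName) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed file name, taken from the note above; it is created in the working directory.
        write("WriteUTFDemo.txt", "UTF-8 demo: \u0915\u094d\u0937");
        System.out.println(read("WriteUTFDemo.txt"));
    }
}
```

A file saved by a text editor as UTF-8 has no length prefix, so readUTF() would misinterpret its first two bytes as a length. For such files, use one of the reader-based approaches above.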

That’s all about how to read UTF-8 encoded data in Java.


Reading and Writing UTF-8 Data into File

Many times we need to deal with UTF-8 encoded files in our application. This may be due to localization needs or simply because we have to process user input.

Some data sources may even provide data in UTF-8 format only. In this Java tutorial, we will look at two very simple examples: writing UTF-8 content to a file and reading it back.

1. Writing UTF-8 Encoded Data into a File

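Given below is a Java example that demonstrates how to write UTF-8 encoded data into a file. It passes the character encoding StandardCharsets.UTF_8 when creating the OutputStreamWriter.

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

File file = new File("c:\\temp\\test.txt");

// Characters are encoded as UTF-8 when they are written to the file
try (Writer out = new BufferedWriter(new OutputStreamWriter(
        new FileOutputStream(file), StandardCharsets.UTF_8))) {
    out.append("Howtodoinjava.com")
        .append("\r\n")
        .append("UTF-8 Demo")
        .append("\r\n")
        .append("क्षेत्रफल = लंबाई * चौड़ाई")
        .append("\r\n");
    out.flush();
} catch (Exception e) {
    e.printStackTrace();
}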

Before running the example in Eclipse, we need to enable support for the UTF-8 character set in the IDE, because it is disabled by default. The steps for enabling it are described in a previous post:

Read: How to compile and run a java program written in another language

2. Reading UTF-8 Encoded Data from a File

We need to pass StandardCharsets.UTF_8 into the InputStreamReader constructor to read data from a UTF-8 encoded file.

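import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

File file = new File("c:\\temp\\test.txt");

// The bytes of the file are decoded as UTF-8 while reading
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
    String str;
    while ((str = in.readLine()) != null) {
        System.out.println(str);
    }
} catch (Exception e) {
    e.printStackTrace();
}

Program output:

Howtodoinjava.com
UTF-8 Demo
क्षेत्रफल = लंबाई * चौड़ाई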


How to Read Files in Java

Throughout the tutorial, we use a file stored in the src directory; the path to the file is src/file.txt.

Store several lines of text in this file before proceeding.

Note: To follow best coding practices, handle errors properly when you use these implementations.

Reading Text Files in Java with BufferedReader

The BufferedReader class reads text from a character-input stream. It buffers the characters, using a default buffer size of 8192 characters, to make the reading process more efficient. If you want to read a file line by line, using BufferedReader is a good choice.

BufferedReader is efficient in reading large files.

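import java.io.*;

public class FileReaderWithBufferedReader {
    public static void main(String[] args) throws IOException {
        String file = "src/file.txt";
        BufferedReader bufferedReader = new BufferedReader(new FileReader(file));

        String curLine;
        while ((curLine = bufferedReader.readLine()) != null) {
            // process the line as you require
            System.out.println(curLine);
        }
        bufferedReader.close();
    }
}

The readLine() method returns null when the end of the file is reached.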

Reading UTF-8 Encoded File in Java with BufferedReader

We can use the BufferedReader class to read a UTF-8 encoded file.

This time, we pass an InputStreamReader object when creating a BufferedReader instance.

import java.io.*;

public class EncodedFileReaderWithBufferedReader {
    public static void main(String[] args) throws IOException {
        String file = "src/fileUtf8.txt";

        // try-with-resources closes the reader when the block ends
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
            String curLine;
            while ((curLine = bufferedReader.readLine()) != null) {
                // process the line as you require
                System.out.println(curLine);
            }
        }
    }
}

Using Java Files Class to Read a File

The Files class, introduced in Java 7 as part of Java NIO, consists entirely of static methods that operate on files.

Using the Files class, you can read the full content of a file into memory at once, which makes it a good choice for reading smaller files. It can also provide a BufferedReader for reading larger files line by line.

Let’s see how we can use the Files class in both these scenarios.

Reading Small Files in Java with Files Class

The readAllLines() method of the Files class reads the whole content of the file and returns it as a list of strings, one per line.

You can use the Paths.get() method to obtain the path to the file, since the Files class accepts the Path object of the file.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class SmallFileReaderWithFiles {
    public static void main(String[] args) throws IOException {
        String file = "src/file.txt";
        Path path = Paths.get(file);

        // Reads all lines at once, decoding the file as UTF-8
        List<String> lines = Files.readAllLines(path);
    }
}

You can use readAllBytes() to retrieve the data stored in the file as a byte array instead of a list of strings.

byte[] bytes = Files.readAllBytes(path); 
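The byte array holds the raw, still-encoded content, so turning it into text requires naming the charset. The following is a small sketch of that step; the class name BytesToUtf8String and the helper readAsUtf8() are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not taken from the original tutorial.
class BytesToUtf8String {

    // Reads the whole file and decodes its bytes as UTF-8.
    static String readAsUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAsUtf8(Paths.get("src/file.txt")));
    }
}
```

Calling new String(bytes) without a charset would fall back to the platform default encoding, which may not be UTF-8 on older Java versions.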

Reading Large Files in Java with Files Class

If you want to read a large file with the Files class, you can use the newBufferedReader() method to obtain an instance of the BufferedReader class and read the file line by line. When called without a charset argument, newBufferedReader() decodes the file as UTF-8.

import java.io.*;
import java.nio.file.*;

public class LargeFileReaderWithFiles {
    public static void main(String[] args) throws IOException {
        String file = "src/file.txt";
        Path path = Paths.get(file);
        BufferedReader bufferedReader = Files.newBufferedReader(path);

        String curLine;
        while ((curLine = bufferedReader.readLine()) != null) {
            System.out.println(curLine);
        }
        bufferedReader.close();
    }
}

Reading Files with Files.lines()

Java 8 added the lines() method to the Files class, which reads a file lazily as a Stream of strings, one per line. The stream keeps the file open, so it has to be closed after use.

import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

public class FileReaderWithFilesLines {
    public static void main(String[] args) throws IOException {
        String file = "src/file.txt";
        Path path = Paths.get(file);

        Stream<String> lines = Files.lines(path);
        lines.forEach(s -> System.out.println(s));
        lines.close();
    }
}
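Because Files.lines() returns a Stream, the usual stream operations can be applied while reading. As an illustration only, the sketch below names the charset explicitly and keeps the non-blank lines; the class name FilterLinesWithFilesLines and the helper nonBlankLines() are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the original tutorial.
class FilterLinesWithFilesLines {

    // Returns the non-blank lines of a UTF-8 file.
    static List<String> nonBlankLines(Path path) throws IOException {
        // try-with-resources closes the stream and the underlying file.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty())
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        nonBlankLines(Paths.get("src/file.txt")).forEach(System.out::println);
    }
}
```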

Reading Text Files in Java with Scanner

The Scanner class breaks the content of a file into parts using a given delimiter and reads it part by part. This approach is best suited for reading content that is separated by a delimiter.

For example, the Scanner class is ideal for reading a list of integers separated by white spaces or a list of strings separated by commas.

The default delimiter of the Scanner class is whitespace. But you can set the delimiter to another character or a regular expression. It also has various next methods, such as next() , nextInt() , nextLine() , and nextByte() , to convert content into different types.

import java.io.IOException;
import java.util.Scanner;
import java.io.File;

public class FileReaderWithScanner {
    public static void main(String[] args) throws IOException {
        String file = "src/file.txt";
        Scanner scanner = new Scanner(new File(file));
        scanner.useDelimiter(" ");

        while (scanner.hasNext()) {
            String next = scanner.next();
            System.out.println(next);
        }
        scanner.close();
    }
}

In the above example, we set the delimiter to a single space and use the next() method to read the next part of the content separated by that delimiter.
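For comma-separated content in a UTF-8 file, a similar approach might look like the sketch below. It relies on the Scanner(Path, String) constructor to name the charset; the class name CommaSeparatedScanner, the helper readValues(), and the file src/values.txt are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not taken from the original tutorial.
class CommaSeparatedScanner {

    // Splits the file content on commas and trims surrounding whitespace.
    static List<String> readValues(Path path) throws IOException {
        List<String> values = new ArrayList<>();
        // The second argument names the charset used to decode the file.
        try (Scanner scanner = new Scanner(path, "UTF-8")) {
            scanner.useDelimiter(",");
            while (scanner.hasNext()) {
                values.add(scanner.next().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Assumed file containing values such as: alpha, beta, gamma
        System.out.println(readValues(Paths.get("src/values.txt")));
    }
}
```

Trimming each token keeps the delimiter pattern simple; a regular expression such as "\\s*,\\s*" could be passed to useDelimiter() instead.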

Reading an Entire File

You can use the Scanner class to read the entire file at once without running a loop. You have to pass “\\Z” as the delimiter for this.

scanner.useDelimiter("\\Z");
System.out.println(scanner.next());
scanner.close();

Conclusion

As you saw in this tutorial, Java offers many ways to read text files, and you can choose among them according to the task at hand. You can use BufferedReader to read large files line by line.

If you want to read a file that has its content separated by a delimiter, use the Scanner class.

You can also use the Java NIO Files class to read both small and large files.


Reading UTF8 data from a file using Java

In general, data is stored in a computer in the form of bits (1 or 0). Various encoding schemes specify the sequence of bytes that represents each character.

Unicode (UTF) − UTF stands for Unicode Transformation Format. Unicode is developed by The Unicode Consortium. If you want to create documents that use characters from multiple character sets, you can do so using a single Unicode character encoding. Unicode provides 3 types of encodings.

  • UTF-8 − It comes in 8-bit units (bytes). A character in UTF-8 can be from 1 to 4 bytes long, making UTF-8 variable width.
  • UTF-16 − It comes in 16-bit units. A character can be 1 or 2 units long, making UTF-16 variable width.
  • UTF-32 − It comes in 32-bit units. It is a fixed-width format in which every character is exactly one unit long.
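The variable width of UTF-8 and UTF-16 can be observed from Java by encoding single characters and counting the resulting bytes. This is only an illustrative sketch; the class name EncodingWidths and the helper byteLength() are hypothetical, and UTF-32 is left out because StandardCharsets defines no constant for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the original article.
class EncodingWidths {

    // Number of bytes the string occupies in the given encoding.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041: 1 byte in UTF-8, 2 bytes in UTF-16
            "\u00e9",       // U+00E9: 2 bytes in UTF-8, 2 bytes in UTF-16
            "\u0915",       // U+0915: 3 bytes in UTF-8, 2 bytes in UTF-16
            "\uD83D\uDE00"  // U+1F600: 4 bytes in UTF-8, 4 bytes in UTF-16
        };
        for (String s : samples) {
            System.out.println(String.format("U+%04X", s.codePointAt(0))
                    + ": UTF-8=" + byteLength(s, StandardCharsets.UTF_8)
                    + " bytes, UTF-16=" + byteLength(s, StandardCharsets.UTF_16BE)
                    + " bytes");
        }
    }
}
```

UTF_16BE is used rather than UTF_16 so that no byte-order mark is added to the encoded bytes.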

Reading UTF data using DataInputStream

The readUTF() method of the java.io.DataInputStream class reads data that is in modified UTF-8 encoding into a String and returns it. It expects the length-prefixed format produced by the writeUTF() method of DataOutputStream. Therefore, to read UTF-8 data from a file −

  • Instantiate the FileInputStream class by passing a String value representing the path of the required file as a parameter.
  • Instantiate the DataInputStream class by passing the above-created FileInputStream object as a parameter.
  • Read UTF data from the DataInputStream object using the readUTF() method.

Example

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class UTF8Example {
   public static void main(String args[]) {
      StringBuffer buffer = new StringBuffer();
      // Instantiating the FileInputStream and DataInputStream classes;
      // try-with-resources closes both streams
      try (FileInputStream fileIn = new FileInputStream("D:\\test.txt");
           DataInputStream inputStream = new DataInputStream(fileIn)) {
         // Reading UTF data from the DataInputStream
         while (inputStream.available() > 0) {
            buffer.append(inputStream.readUTF());
         }
      } catch (EOFException ex) {
         System.out.println(ex.toString());
      } catch (IOException ex) {
         System.out.println(ex.toString());
      }
      System.out.println("Contents of the file: " + buffer.toString());
   }
}

Output

Contents of the file: టుటోరియల్స్ పాయింట్ కి స్వాగతిం

The newBufferedReader() method of the java.nio.file.Files class accepts an object of the class Path representing the path of the file and an object of the class Charset representing the encoding of the character data to be read. It returns a BufferedReader object that can read data in the specified encoding.

The value for the Charset can be StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16, StandardCharsets.US_ASCII, or StandardCharsets.ISO_8859_1.
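The charset passed to newBufferedReader() has to match the encoding the file was written in. The sketch below illustrates what happens otherwise: the same UTF-8 bytes are read once as UTF-8 and once as ISO-8859-1. The class name CharsetMismatchDemo and the helper firstLine() are hypothetical, and the temporary file exists only for the demonstration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not taken from the original article.
class CharsetMismatchDemo {

    // Reads the first line of the file using the given charset.
    static String firstLine(Path path, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write "caf" followed by e-acute (U+00E9) as UTF-8: the accent becomes bytes C3 A9.
        Path path = Files.createTempFile("charset-demo", ".txt");
        Files.write(path, "caf\u00e9".getBytes(StandardCharsets.UTF_8));

        // Matching charset: the two bytes decode back to the single character U+00E9.
        System.out.println(firstLine(path, StandardCharsets.UTF_8));

        // Wrong charset: each byte becomes its own character (U+00C3, U+00A9).
        System.out.println(firstLine(path, StandardCharsets.ISO_8859_1));

        Files.delete(path);
    }
}
```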

Therefore, to read UTF-8 data from a file −

  • Create/get an object of the Path class representing the required path using the get() method of the java.nio.file.Paths class.
  • Create/get a BufferedReader object that can read UTF-8 data by passing the above-created Path object and StandardCharsets.UTF_8 as parameters.
  • Using the readLine() method of the BufferedReader object, read the contents of the file.

Example

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UTF8Example {
   public static void main(String args[]) throws Exception {
      // Getting the Path object
      String filePath = "D:\\samplefile.txt";
      Path path = Paths.get(filePath);
      StringBuffer buffer = new StringBuffer();
      // Creating a BufferedReader object; try-with-resources closes it
      try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
         // Reading the UTF-8 data from the file line by line
         String line;
         while ((line = reader.readLine()) != null) {
            buffer.append(line);
         }
      }
      System.out.println("Contents of the file: " + buffer.toString());
   }
}

Output

Contents of the file: టుటోరియల్స్ పాయింట్ కి స్వాగతిం
