Character streams in Java
Character streams handle data as 16-bit Unicode characters. Use them to read and write text data only.
The abstract Reader and Writer classes are the superclasses of all character stream classes, that is, the classes used to read and write character streams. Java provides the following character stream classes, among others:
Reader | Writer |
---|---|
BufferedReader | BufferedWriter |
CharArrayReader | CharArrayWriter |
StringReader | StringWriter |
FileReader | FileWriter |
InputStreamReader | OutputStreamWriter |
Example
The following Java program reads data from a particular file using FileReader and writes it to another, using FileWriter.
```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class IOStreamsExample {
    public static void main(String[] args) throws IOException {
        File file = new File("D:/myFile.txt");
        File out = new File("D:/CopyOfmyFile.txt");

        // try-with-resources closes both streams, even if an exception is thrown
        try (FileReader reader = new FileReader(file);
             FileWriter writer = new FileWriter(out)) {
            // file.length() counts bytes, not characters, so use a fixed-size buffer
            char[] chars = new char[8192];
            int n;
            // read(char[]) returns the number of characters read, or -1 at end of stream
            while ((n = reader.read(chars)) != -1) {
                // Write only the characters that were actually read
                writer.write(chars, 0, n);
            }
        }
        System.out.println("Data successfully written in the specified file");
    }
}
```
Output
Data successfully written in the specified file
Character Streams
The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set. In Western locales, the local character set is usually an 8-bit superset of ASCII.
For most applications, I/O with character streams is no more complicated than I/O with byte streams. Input and output done with stream classes automatically translates to and from the local character set. A program that uses character streams in place of byte streams automatically adapts to the local character set and is ready for internationalization all without extra effort by the programmer.
If internationalization isn’t a priority, you can simply use the character stream classes without paying much attention to character set issues. Later, if internationalization becomes a priority, your program can be adapted without extensive recoding. See the Internationalization trail for more information.
Using Character Streams
All character stream classes are descended from Reader and Writer. As with byte streams, there are character stream classes that specialize in file I/O: FileReader and FileWriter. The CopyCharacters example illustrates these classes.
```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        FileReader inputStream = null;
        FileWriter outputStream = null;

        try {
            inputStream = new FileReader("xanadu.txt");
            outputStream = new FileWriter("characteroutput.txt");

            int c;
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
```
CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter for input and output in place of FileInputStream and FileOutputStream. Notice that both CopyBytes and CopyCharacters use an int variable to read to and write from. However, in CopyCharacters, the int variable holds a character value in its low-order 16 bits; in CopyBytes, the int variable holds a byte value in its low-order 8 bits.
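As a small illustration of why the variable is an int, the sketch below reads from a StringReader (chosen only so that the example needs no file on disk) and casts each value back to char. The class name ReadAsInt and the method drain are hypothetical names used for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAsInt {
    // Reads every remaining character from the reader and returns them as a String.
    static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        // read() returns a value in the range 0..65535 for a character, or -1 at
        // end of stream, which is why the variable must be an int and not a char.
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = new StringReader("Kubla Khan")) {
            System.out.println(drain(in));
        }
    }
}
```

If the variable were declared as char, the end-of-stream value -1 could not be distinguished from the legitimate character U+FFFF.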
Character Streams that Use Byte Streams
Character streams are often "wrappers" for byte streams. The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes. FileReader, for example, uses FileInputStream, while FileWriter uses FileOutputStream.
There are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter. Use them to create character streams when there are no prepackaged character stream classes that meet your needs. The sockets lesson in the networking trail shows how to create character streams from the byte streams provided by socket classes.
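The following is a minimal sketch of the bridge idea, not the networking trail's own example. It assumes the bytes are UTF-8 text and uses a ByteArrayInputStream as a stand-in for a socket or other byte source; BridgeExample and firstLine are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class BridgeExample {
    // Wraps any byte stream in a character stream and returns its first line.
    // The UTF-8 charset is an assumption about the data, not a requirement.
    static String firstLine(InputStream bytes) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(bytes, StandardCharsets.UTF_8))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a socket's input stream or any other byte source.
        byte[] data = "h\u00e9llo\nworld\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(new ByteArrayInputStream(data)));
    }
}
```

Passing the charset explicitly keeps the result independent of the platform default encoding.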
Line-Oriented I/O
Character I/O usually occurs in bigger units than single characters. One common unit is the line: a string of characters with a line terminator at the end. A line terminator can be a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). Supporting all possible line terminators allows programs to read text files created on any of the widely used operating systems.
Let’s modify the CopyCharacters example to use line-oriented I/O. To do this, we have to use two classes we haven’t seen before, BufferedReader and PrintWriter. We’ll explore these classes in greater depth in Buffered I/O and Formatting. Right now, we’re just interested in their support for line-oriented I/O.
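To make the three terminator styles concrete, here is a short sketch that feeds a string containing all of them to BufferedReader.readLine. The class LineTerminators and its lines method are hypothetical names for this illustration, and a StringReader stands in for a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class LineTerminators {
    // Splits text into lines using BufferedReader.readLine, which accepts
    // "\r\n", "\r", and "\n" as line terminators.
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // One terminator of each style: CR LF, then CR, then LF.
        System.out.println(lines("one\r\ntwo\rthree\nfour")); // prints [one, two, three, four]
    }
}
```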
The CopyLines example invokes BufferedReader.readLine and PrintWriter.println to do input and output one line at a time.
```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.IOException;

public class CopyLines {
    public static void main(String[] args) throws IOException {
        BufferedReader inputStream = null;
        PrintWriter outputStream = null;

        try {
            inputStream = new BufferedReader(new FileReader("xanadu.txt"));
            outputStream = new PrintWriter(new FileWriter("characteroutput.txt"));

            String l;
            while ((l = inputStream.readLine()) != null) {
                outputStream.println(l);
            }
        } finally {
            if (inputStream != null) {
                inputStream.close();
            }
            if (outputStream != null) {
                outputStream.close();
            }
        }
    }
}
```
Invoking readLine returns a line of text without its line terminator. CopyLines outputs each line using println, which appends the line terminator for the current operating system. This might not be the same line terminator that was used in the input file.
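The terminator that println appends can be observed without touching the file system. The sketch below writes into a StringWriter and compares the result against System.lineSeparator(); PrintlnSeparator and copyLine are hypothetical names.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintlnSeparator {
    // Writes one line with println and returns exactly what was written.
    static String copyLine(String line) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println(line);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String written = copyLine("In Xanadu did Kubla Khan");
        // The terminator is the platform's, e.g. "\r\n" on Windows and "\n" on Linux.
        System.out.println(written.endsWith(System.lineSeparator())); // prints true
    }
}
```

A program that must preserve the input's terminators exactly would need to copy characters rather than lines.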
There are many ways to structure text input and output beyond characters and lines. For more information, see Scanning and Formatting.
Character streams in Java
Version 1.1 of the Java Development Kit introduced support for character streams to the java.io package.
Prior to JDK 1.1, the standard I/O facilities supported only byte streams, via the InputStream and OutputStream classes and their subclasses. Character streams are like byte streams, but they contain 16-bit Unicode characters rather than eight-bit bytes. They are implemented by the Reader and Writer classes and their subclasses. Readers and Writers support essentially the same operations as InputStreams and OutputStreams, except that where byte-stream methods operate on bytes or byte arrays, character-stream methods operate on characters, character arrays, or strings.
Most of the functionality available for byte streams is also provided for character streams. This is reflected in the name of each character-stream class, whose prefix is usually shared with the name of the corresponding byte-stream class. For example, there is a PushbackReader class that provides the same functionality for character streams that is provided by PushbackInputStream for byte streams.
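As one illustration of that parallel, the sketch below uses PushbackReader the way PushbackInputStream is typically used for bytes: it reads ahead by one character and pushes it back when it does not belong to the current token. The class PushbackExample and the method readDigits are hypothetical names.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackExample {
    // Reads a run of decimal digits, leaving the first non-digit unread.
    static String readDigits(PushbackReader in) throws IOException {
        StringBuilder digits = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isDigit(c)) {
                in.unread(c); // push the character back for the next reader
                break;
            }
            digits.append((char) c);
        }
        return digits.toString();
    }

    public static void main(String[] args) throws IOException {
        try (PushbackReader in = new PushbackReader(new StringReader("123abc"))) {
            System.out.println(readDigits(in));   // prints 123
            System.out.println((char) in.read()); // prints a
        }
    }
}
```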
Why use character streams?
The primary advantage of character streams is that they make it easy to write programs that are not dependent upon a specific character encoding, and are therefore easy to internationalize.
Java stores strings in Unicode, an international standard character encoding that is capable of representing most of the world’s written languages. Typical user-readable text files, however, use encodings that are not necessarily related to Unicode, or even to ASCII, and there are many such encodings. Character streams hide the complexity of dealing with these encodings by providing two classes that serve as bridges between byte streams and character streams. The InputStreamReader class implements a character-input stream that reads bytes from a byte-input stream and converts them to characters according to a specified encoding. Similarly, the OutputStreamWriter class implements a character-output stream that converts characters into bytes according to a specified encoding and writes them to a byte-output stream.
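To show what "according to a specified encoding" means in practice, the sketch below pushes the same four-character string through OutputStreamWriter with three different charsets and counts the bytes produced. It uses the Charset-based constructor from later Java versions rather than the encoding-name constructor of JDK 1.1; EncodingSizes and encode are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Encodes text through an OutputStreamWriter and returns the bytes produced.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // three ASCII letters followed by U+00E9
        System.out.println(encode(text, StandardCharsets.ISO_8859_1).length); // prints 4
        System.out.println(encode(text, StandardCharsets.UTF_8).length);      // prints 5
        System.out.println(encode(text, StandardCharsets.UTF_16BE).length);   // prints 8
    }
}
```

The program's view of the text is the same four characters in every case; only the bridge class knows how many bytes each one becomes.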
A second advantage of character streams is that they are potentially much more efficient than byte streams. The implementations of many of Java’s original byte streams are oriented around byte-at-a-time read and write operations. The character-stream classes, in contrast, are oriented around buffer-at-a-time read and write operations. This difference, in combination with a more efficient locking scheme, allows the character stream classes to make up for the added overhead of encoding conversion in many cases.
API overview
The character-stream classes have been designed to parallel the existing byte-stream classes in the java.io package. As noted above, the name of each character-stream class ends in Reader or Writer, as appropriate, while its prefix is usually shared with the corresponding byte-stream class, if any. The following table summarizes the new classes. Four of them extend another class in the table rather than Reader or Writer directly: LineNumberReader extends BufferedReader, FileReader extends InputStreamReader, PushbackReader extends FilterReader, and FileWriter extends OutputStreamWriter.
Character-stream class | Description | Corresponding byte class |
---|---|---|
Reader | Abstract class for character-input streams | InputStream |
BufferedReader | Buffers input, parses lines | BufferedInputStream |
LineNumberReader | Keeps track of line numbers | LineNumberInputStream |
CharArrayReader | Reads from a character array | ByteArrayInputStream |
InputStreamReader | Translates a byte stream into a character stream | (none) |
FileReader | Translates bytes from a file into a character stream | FileInputStream |
FilterReader | Abstract class for filtered character input | FilterInputStream |
PushbackReader | Allows characters to be pushed back | PushbackInputStream |
PipedReader | Reads from a PipedWriter | PipedInputStream |
StringReader | Reads from a String | StringBufferInputStream |
Writer | Abstract class for character-output streams | OutputStream |
BufferedWriter | Buffers output, uses platform’s line separator | BufferedOutputStream |
CharArrayWriter | Writes to a character array | ByteArrayOutputStream |
FilterWriter | Abstract class for filtered character output | FilterOutputStream |
OutputStreamWriter | Translates a character stream into a byte stream | (none) |
FileWriter | Translates a character stream into a byte file | FileOutputStream |
PrintWriter | Prints values and objects to a Writer | PrintStream |
PipedWriter | Writes to a PipedReader | PipedOutputStream |
StringWriter | Writes to a String | (none) |
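A brief sketch combining three of the classes in the table: StringReader supplies the input, LineNumberReader tracks line numbers, and StringWriter collects the output. The class NumberedLines and the method number are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.StringWriter;

class NumberedLines {
    // Prefixes each line of the text with its 1-based line number.
    static String number(String text) throws IOException {
        StringWriter out = new StringWriter();
        try (LineNumberReader in = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                // After readLine returns, getLineNumber() is the number of lines read so far.
                out.write(in.getLineNumber() + ": " + line + "\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(number("alpha\nbeta\ngamma"));
    }
}
```

Because LineNumberReader is a BufferedReader, readLine is inherited and no separate buffering wrapper is needed.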
Related changes
PrintStream
The PrintStream class has been modified to use the platform’s default character encoding and the platform’s default line terminator. Thus each PrintStream incorporates an OutputStreamWriter, and it passes all characters through this writer to produce bytes for output. The println methods use the platform’s default line terminator, which is defined by the system property line.separator and is not necessarily a single newline character ('\n'). Bytes and byte arrays written via the existing write methods are not passed through the writer.
The primary motivation for changing the PrintStream class is that it will make System.out and System.err more useful to people writing Java programs on platforms where the local encoding is something other than ASCII. PrintStream is, in other words, provided primarily for use in debugging and for compatibility with existing code. Code that produces textual output should use the new PrintWriter class, which allows the character encoding to be specified or the default encoding to be accepted. For convenience, the PrintWriter class provides constructors that take an OutputStream object and create an intermediate OutputStreamWriter object that uses the default encoding.
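One way to specify the encoding for a PrintWriter is to hand it an OutputStreamWriter built with an explicit charset. The sketch below assumes UTF-8 and uses a ByteArrayOutputStream as a stand-in for a file or socket; Utf8Printer and utf8Writer are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class Utf8Printer {
    // Builds a PrintWriter that encodes as UTF-8 regardless of the platform default.
    // The second argument enables automatic flushing on println, printf, and format.
    static PrintWriter utf8Writer(OutputStream out) {
        return new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in byte destination
        try (PrintWriter out = utf8Writer(sink)) {
            out.print("\u00e9"); // a single character, U+00E9
        }
        System.out.println(sink.size()); // prints 2: U+00E9 takes two bytes in UTF-8
    }
}
```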
Other classes
The following constructors and methods have been deprecated because they do not properly convert between bytes and characters:
Return type | Constructor or method |
---|---|
String | DataInputStream.readLine() |
InputStream | Runtime.getLocalizedInputStream(InputStream) |
OutputStream | Runtime.getLocalizedOutputStream(OutputStream) |
(constructor) | StreamTokenizer(InputStream) |
(constructor) | String(byte ascii[], int hibyte, int offset, int count) |
(constructor) | String(byte ascii[], int hibyte) |
void | String.getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) |
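For three of the deprecated items, the sketch below shows charset-aware alternatives as they might be written today. It uses the Charset-based overloads from later Java versions, and the choice of US-ASCII is an assumption made only for the example; DeprecatedReplacements and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class DeprecatedReplacements {
    // Instead of DataInputStream.readLine(): decode the bytes through a Reader.
    static String readLine(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        return reader.readLine();
    }

    // Instead of String(byte[], int hibyte): name the charset used for decoding.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Instead of String.getBytes(int, int, byte[], int): name the charset used for encoding.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode("first\nsecond\n");
        System.out.println(readLine(new ByteArrayInputStream(bytes))); // prints first
        System.out.println(decode(encode("round trip")));              // prints round trip
    }
}
```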
Finally, the following constructor and methods have been added: