Interface CharSequence
A CharSequence is a readable sequence of char values. This interface provides uniform, read-only access to many different kinds of char sequences. A char value represents a character in the Basic Multilingual Plane (BMP) or a surrogate. Refer to Unicode Character Representation for details.
This interface does not refine the general contracts of the equals and hashCode methods. The result of testing two objects that implement CharSequence for equality is therefore, in general, undefined. Each object may be implemented by a different class, and there is no guarantee that each class will be capable of testing its instances for equality with those of the other. It is therefore inappropriate to use arbitrary CharSequence instances as elements in a set or as keys in a map.
Method Summary
Method Details
length
Returns the length of this character sequence. The length is the number of 16-bit char s in the sequence.
charAt
Returns the char value at the specified index. An index ranges from zero to length() — 1 . The first char value of the sequence is at index zero, the next at index one, and so on, as for array indexing. If the char value specified by the index is a surrogate, the surrogate value is returned.
isEmpty
subSequence
Returns a CharSequence that is a subsequence of this sequence. The subsequence starts with the char value at the specified index and ends with the char value at index end — 1 . The length (in char s) of the returned sequence is end — start , so if start == end then an empty sequence is returned.
toString
Returns a string containing the characters in this sequence in the same order as this sequence. The length of the string will be the length of this sequence.
chars
Returns a stream of int zero-extending the char values from this sequence. Any char which maps to a surrogate code point is passed through uninterpreted. The stream binds to this sequence when the terminal stream operation commences (specifically, for mutable sequences the spliterator for the stream is late-binding). If the sequence is modified during that operation then the result is undefined.
codePoints
Returns a stream of code point values from this sequence. Any surrogate pairs encountered in the sequence are combined as if by Character.toCodePoint and the result is passed to the stream. Any other code units, including ordinary BMP characters, unpaired surrogates, and undefined code units, are zero-extended to int values which are then passed to the stream. The stream binds to this sequence when the terminal stream operation commences (specifically, for mutable sequences the spliterator for the stream is late-binding). If the sequence is modified during that operation then the result is undefined.
compare
Compares two CharSequence instances lexicographically. Returns a negative value, zero, or a positive value if the first sequence is lexicographically less than, equal to, or greater than the second, respectively. The lexicographical ordering of CharSequence is defined as follows. Consider a CharSequence cs of length len to be a sequence of char values, cs[0] to cs[len-1]. Suppose k is the lowest index at which the corresponding char values from each sequence differ. The lexicographic ordering of the sequences is determined by a numeric comparison of the char values cs1[k] with cs2[k]. If there is no such index k, the shorter sequence is considered lexicographically less than the other. If the sequences have the same length, the sequences are considered lexicographically equal.
Report a bug or suggest an enhancement
For further API reference and developer documentation see the Java SE Documentation, which contains more detailed, developer-targeted descriptions with conceptual overviews, definitions of terms, workarounds, and working code examples. Other versions.
Java is a trademark or registered trademark of Oracle and/or its affiliates in the US and other countries.
Copyright © 1993, 2023, Oracle and/or its affiliates, 500 Oracle Parkway, Redwood Shores, CA 94065 USA.
All rights reserved. Use is subject to license terms and the documentation redistribution policy.
Class Character
The Character class wraps a value of the primitive type char in an object. An object of class Character contains a single field whose type is char .
In addition, this class provides a large number of static methods for determining a character’s category (lowercase letter, digit, etc.) and for converting characters from uppercase to lowercase and vice versa.
Unicode Conformance
The fields and methods of class Character are defined in terms of character information from the Unicode Standard, specifically the UnicodeData file that is part of the Unicode Character Database. This file specifies properties including name and category for every assigned Unicode code point or character range. The file is available from the Unicode Consortium at http://www.unicode.org.
Character information is based on the Unicode Standard, version 15.0.
The Java platform has supported different versions of the Unicode Standard over time. Upgrades to newer versions of the Unicode Standard occurred in the following Java releases, each indicating the new version:
Java release | Unicode version |
---|---|
Java SE 20 | Unicode 15.0 |
Java SE 19 | Unicode 14.0 |
Java SE 15 | Unicode 13.0 |
Java SE 13 | Unicode 12.1 |
Java SE 12 | Unicode 11.0 |
Java SE 11 | Unicode 10.0 |
Java SE 9 | Unicode 8.0 |
Java SE 8 | Unicode 6.2 |
Java SE 7 | Unicode 6.0 |
Java SE 5.0 | Unicode 4.0 |
Java SE 1.4 | Unicode 3.0 |
JDK 1.1 | Unicode 2.0 |
JDK 1.0.2 | Unicode 1.1.5 |
Variations from these base Unicode versions, such as recognized appendixes, are documented elsewhere.
Unicode Character Representations
The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter(‘\uD840’) returns false , even though this specific value if followed by any low-surrogate value in a string would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.
This is a value-based class; programmers should treat instances that are equal as interchangeable and should not use instances for synchronization, or unpredictable behavior may occur. For example, in a future release, synchronization may fail.
Class Character
The Character class wraps a value of the primitive type char in an object. An object of class Character contains a single field whose type is char .
In addition, this class provides a large number of static methods for determining a character’s category (lowercase letter, digit, etc.) and for converting characters from uppercase to lowercase and vice versa.
Unicode Conformance
The fields and methods of class Character are defined in terms of character information from the Unicode Standard, specifically the UnicodeData file that is part of the Unicode Character Database. This file specifies properties including name and category for every assigned Unicode code point or character range. The file is available from the Unicode Consortium at http://www.unicode.org.
Character information is based on the Unicode Standard, version 13.0.
The Java platform has supported different versions of the Unicode Standard over time. Upgrades to newer versions of the Unicode Standard occurred in the following Java releases, each indicating the new version:
Java release | Unicode version |
---|---|
Java SE 15 | Unicode 13.0 |
Java SE 13 | Unicode 12.1 |
Java SE 12 | Unicode 11.0 |
Java SE 11 | Unicode 10.0 |
Java SE 9 | Unicode 8.0 |
Java SE 8 | Unicode 6.2 |
Java SE 7 | Unicode 6.0 |
Java SE 5.0 | Unicode 4.0 |
Java SE 1.4 | Unicode 3.0 |
JDK 1.1 | Unicode 2.0 |
JDK 1.0.2 | Unicode 1.1.5 |
Variations from these base Unicode versions, such as recognized appendixes, are documented elsewhere.
Unicode Character Representations
The char data type (and therefore the value that a Character object encapsulates) are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter(‘\uD840’) returns false , even though this specific value if followed by any low-surrogate value in a string would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.
This is a value-based class; programmers should treat instances that are equal as interchangeable and should not use instances for synchronization, or unpredictable behavior may occur. For example, in a future release, synchronization may fail.