Java regexp space symbol

Whitespace Matching Regex — Java

The Java API for regular expressions states that \s will match whitespace. So the regex \\s\\s should match two spaces.

    Pattern whitespace = Pattern.compile("\\s\\s");
    matcher = whitespace.matcher(modLine);
    while (matcher.find()) matcher.replaceAll(" ");

The aim is to replace every instance of two consecutive whitespace characters with a single space. However, this does not actually work. Am I gravely misunderstanding regexes or the term "whitespace"?

String has a replaceAll function that will save you a few lines of code. download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
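A minimal sketch of why the loop in the question appears to do nothing, assuming modLine is an ordinary String: Matcher.replaceAll() returns a new string instead of changing its input, so the result has to be assigned. The class and method names here are hypothetical, and the pattern is \s{2,} rather than \s\s so that runs of three or more characters also collapse to one space.

```java
import java.util.regex.Pattern;

class CollapseSpaces {
    // Compiled once; matches any run of two or more whitespace characters.
    private static final Pattern RUN = Pattern.compile("\\s{2,}");

    // Returns a copy of the line with each run replaced by a single space.
    static String collapse(String line) {
        return RUN.matcher(line).replaceAll(" ");
    }

    public static void main(String[] args) {
        String modLine = "a  b    c";
        // Strings are immutable, so keep the value that comes back.
        modLine = collapse(modLine);
        System.out.println(modLine); // a b c
    }
}
```

The one-liner modLine = modLine.replaceAll("\\s{2,}", " ") should behave the same way, at the cost of recompiling the pattern on each call.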

It isn’t your misunderstanding, but Java’s. Try splitting a string like "abc \xA0 def \x85 xyz" to see what I mean: there are only three fields there.
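A small sketch of that experiment, with hypothetical class and method names. By default Java's \s covers only [ \t\n\x0B\f\r], so the no-break space and NEL survive as fields of their own. The second call assumes Java 7 or later, where the embedded flag (?U) (Pattern.UNICODE_CHARACTER_CLASS) is documented to make \s follow the Unicode White_Space property.

```java
class SplitFields {
    // Counts the pieces produced by splitting text on the given regex.
    static int countFields(String text, String separatorRegex) {
        return text.split(separatorRegex).length;
    }

    public static void main(String[] args) {
        // U+00A0 is NO-BREAK SPACE, U+0085 is NEXT LINE (NEL).
        String text = "abc \u00A0 def \u0085 xyz";
        System.out.println(countFields(text, "\\s+"));     // 5
        System.out.println(countFields(text, "(?U)\\s+")); // 3
    }
}
```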

I’ve been wondering for over an hour why my \\s split is not splitting over whitespace. Thanks a million!


You can’t use \s in Java to match white space on its own native character set, because Java doesn’t support the Unicode white space property — even though doing so is strictly required to meet UTS#18’s RL1.2! What it does have is not standards-conforming, alas.

Unicode defines 26 code points as \p{White_Space}: 20 of them are various sorts of \pZ GeneralCategory=Separator, and the remaining 6 are \p{Cc} GeneralCategory=Control.

White space is a pretty stable property, and those same ones have been around virtually forever. Even so, Java has no property that conforms to The Unicode Standard for these, so you instead have to use code like this:

    String whitespace_chars = ""        /* dummy empty string for homogeneity */
                            + "\\u0009" // CHARACTER TABULATION
                            + "\\u000A" // LINE FEED (LF)
                            + "\\u000B" // LINE TABULATION
                            + "\\u000C" // FORM FEED (FF)
                            + "\\u000D" // CARRIAGE RETURN (CR)
                            + "\\u0020" // SPACE
                            + "\\u0085" // NEXT LINE (NEL)
                            + "\\u00A0" // NO-BREAK SPACE
                            + "\\u1680" // OGHAM SPACE MARK
                            + "\\u180E" // MONGOLIAN VOWEL SEPARATOR
                            + "\\u2000" // EN QUAD
                            + "\\u2001" // EM QUAD
                            + "\\u2002" // EN SPACE
                            + "\\u2003" // EM SPACE
                            + "\\u2004" // THREE-PER-EM SPACE
                            + "\\u2005" // FOUR-PER-EM SPACE
                            + "\\u2006" // SIX-PER-EM SPACE
                            + "\\u2007" // FIGURE SPACE
                            + "\\u2008" // PUNCTUATION SPACE
                            + "\\u2009" // THIN SPACE
                            + "\\u200A" // HAIR SPACE
                            + "\\u2028" // LINE SEPARATOR
                            + "\\u2029" // PARAGRAPH SEPARATOR
                            + "\\u202F" // NARROW NO-BREAK SPACE
                            + "\\u205F" // MEDIUM MATHEMATICAL SPACE
                            + "\\u3000" // IDEOGRAPHIC SPACE
                            ;
    /* A \s that actually works for Java’s native character set: Unicode */
    String whitespace_charclass = "[" + whitespace_chars + "]";
    /* A \S that actually works for Java’s native character set: Unicode */
    String not_whitespace_charclass = "[^" + whitespace_chars + "]";

Now you can use whitespace_charclass + "+" as the pattern in your replaceAll.
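Roughly, that usage might look like the sketch below. The class name, the normalize() helper and the sample string are hypothetical, and the character class lists the same 26 code points as above, written as ranges only to keep the example short.

```java
class UnicodeWhitespace {
    // Same code points as whitespace_chars above, compressed into ranges.
    static final String WHITESPACE_CHARCLASS =
        "[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000-\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // Replaces each run of listed characters with one plain space.
    static String normalize(String text) {
        return text.replaceAll(WHITESPACE_CHARCLASS + "+", " ");
    }

    public static void main(String[] args) {
        // NO-BREAK SPACE and NEL are both covered by the class.
        String text = "abc \u00A0 def \u0085 xyz";
        System.out.println(normalize(text)); // abc def xyz
    }
}
```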

Sorry ’bout all that. Java’s regexes just don’t work very well on its own native character set, and so you really have to jump through exotic hoops to make them work.

And if you think white space is bad, you should see what you have to do to get \w and \b to finally behave properly!

Yes, it’s possible, and yes, it’s a mind-numbing mess. That’s being charitable, even. The easiest way to get a standards-conforming regex library for Java is to JNI over to ICU’s stuff. That’s what Google does for Android, because OraSun’s doesn’t measure up.

If you don’t want to do that but still want to stick with Java, I have a front-end regex rewriting library I wrote that “fixes” Java’s patterns, at least to get them to conform to the requirements of RL1.2a in UTS#18, Unicode Regular Expressions.


Regex Whitespace in Java


A Regular Expression or regex is a combination of special characters that creates a search pattern that can be used to search for certain characters in Strings. In the following example, we will see how we can use various regex characters to find whitespaces in a string.

Find Whitespace Using Regular Expressions in Java

To use the regex search pattern and see if the given string matches the regex, we use the static method matches() of the class Pattern . The method matches() takes two arguments: the first is the regular expression, and the second is the string we want to match.

The most common regex patterns for finding whitespace are \s and \s+ . The difference between them is that \s matches a single whitespace character, while \s+ matches one or more consecutive whitespace characters.

In the program below, we call Pattern.matches() with the regex \s+ and a string of three whitespace characters, and store the result in whitespaceMatcher1 . Printing it outputs true , meaning that the pattern matches the whole string.

In whitespaceMatcher2 , we use \s to match a single whitespace character, which returns true for the string " " . Note that regular expressions are case-sensitive: \S matches any non-whitespace character and is the opposite of \s .

Next, we use the regex [\\t\\p{Zs}] , which matches a tab or any character in the Unicode space-separator category (Zs). It is similar to \s , though not identical, and returns true for a single space.

\u0020 is the Unicode escape for the space character and returns true when a string consisting of a single space is passed.

The last regex, \p{Zs} , matches any Unicode space separator, so it also matches a single space.

    import java.util.regex.Pattern;

    public class RegWhiteSpace {
        public static void main(String[] args) {
            // Pattern.matches() is true only if the whole string matches.
            boolean whitespaceMatcher1 = Pattern.matches("\\s+", "   ");
            boolean whitespaceMatcher2 = Pattern.matches("\\s", " ");
            boolean whitespaceMatcher3 = Pattern.matches("[\\t\\p{Zs}]", " ");
            boolean whitespaceMatcher4 = Pattern.matches("\\u0020", " ");
            boolean whitespaceMatcher5 = Pattern.matches("\\p{Zs}", " ");

            System.out.println("\\s+ ----------> " + whitespaceMatcher1);
            System.out.println("\\s -----------> " + whitespaceMatcher2);
            System.out.println("[\\t\\p{Zs}] --> " + whitespaceMatcher3);
            System.out.println("\\u0020 ------->" + whitespaceMatcher4);
            System.out.println("\\p{Zs} ------->" + whitespaceMatcher5);
        }
    }
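Pattern.matches() succeeds only when the entire input matches the regex, which is why every test string above consists solely of whitespace. To check whether a longer string contains whitespace anywhere, Matcher.find() is the usual tool. The sketch below contrasts the two; the class name, method names and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

class ContainsWhitespace {
    private static final Pattern WS = Pattern.compile("\\s");

    // True only if the whole string is exactly one whitespace character.
    static boolean isSingleWhitespace(String s) {
        return Pattern.matches("\\s", s);
    }

    // True if at least one whitespace character occurs anywhere in s.
    static boolean containsWhitespace(String s) {
        return WS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isSingleWhitespace("hello world")); // false
        System.out.println(containsWhitespace("hello world")); // true
    }
}
```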


How to Use Regex Whitespace in Java

Regex or Regular Expression is a set of special characters that combine to form a pattern to search characters in strings. In computer programming and software engineering, learning regex will be very helpful in finding information in any text. All kinds of text search, formatting, and text replacement operations can be carried out using regular expressions.

This tutorial will guide you about using the regex whitespace in Java.

What is Regex in Java?

A Regular Expression or Regex might be as simple as a single character or a complex pattern. It can be created with a string of text and symbols in a specific order. Most of the characters in a regex are letters and typographic symbols. Regex is case-sensitive, so keep that in mind while creating and using it.

How to Use Regex Whitespace in Java?

Java has no built-in syntax for regular expressions, but they are available through the “java.util.regex” package. It includes classes such as “Pattern”, which is used for defining a regex pattern, and “Matcher”, which is used to search with the pattern.

There are two methods to use regex whitespace in Java as follows:

    • Using Pattern.matches() method (use predefined regex)
    • Using Pattern and Matcher class (create user-defined regex to match)

Let’s see how these methods work with regex for whitespace in Java.

Method 1: Use Predefined Regex Whitespace with Pattern.matches() Method in Java

To find whitespace in a string, there are three common regexes in Java:

• \s: It matches a single whitespace character.
• \s+: It matches one or more consecutive whitespace characters.
• \u0020: It is the Unicode escape for the space character, used as a regex to find a space in a text.

We can use these regexes in the static method “matches()” of the “Pattern” class, which belongs to the “java.util.regex” package. A call has the form Pattern.matches("\\s", " ").

The specified method takes two arguments: the regular expression and the string to match. In the call above, the first argument “\s” is the regular expression for white space, and the second argument “ ” is a string containing a single space. It returns a boolean value: true only if the entire string matches the regex, and false otherwise.

Example 1: Use “\s” WhiteSpace Regex

Here, we use the “\s” regex in the Pattern.matches() method and pass a string with no space as the second argument. The method checks the string against the regex and returns a boolean value, which we store in the “match” variable and print using the “System.out.println()” method. The value returned by “Pattern.matches()” is “false” because the passed string has no space.
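A minimal sketch of this example. The original input string is not shown, so "LinuxHint" is a hypothetical stand-in, as are the class and helper names.

```java
import java.util.regex.Pattern;

class MatchSingleWhitespace {
    // True only if s is exactly one whitespace character.
    static boolean isSingleWhitespace(String s) {
        return Pattern.matches("\\s", s);
    }

    public static void main(String[] args) {
        // Hypothetical input that contains no whitespace at all.
        boolean match = isSingleWhitespace("LinuxHint");
        System.out.println(match); // false
    }
}
```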

Now we will see some other examples that match whitespace with other regexes.

Example 2: Use “\s+” WhiteSpace Regex

In this example, we pass the “\s+” regex to the “matches()” method to find multiple spaces, store the returned result in the match variable, and print it. As the second argument consists only of spaces, the resultant value is displayed as “true”.
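This example could be sketched as follows, with hypothetical names and inputs. Because matches() tests the whole string, a string that merely contains spaces among other characters yields false.

```java
import java.util.regex.Pattern;

class MatchWhitespaceRun {
    // True if s is non-empty and made up entirely of whitespace.
    static boolean isAllWhitespace(String s) {
        return Pattern.matches("\\s+", s);
    }

    public static void main(String[] args) {
        System.out.println(isAllWhitespace("   ")); // true
        System.out.println(isAllWhitespace(" a ")); // false
    }
}
```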

Example 3: Use “\u0020” WhiteSpace Regex

Here, we show how a Unicode escape is used as a regex in Java. For this purpose, we use “\u0020”, the Unicode escape for the space character. The Pattern.matches() method returns “true” when the passed string is a single space.
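A short sketch of the Unicode-escape variant, again with hypothetical names. The doubled backslash passes the escape through to the regex engine; only the space character itself matches, so a tab does not.

```java
import java.util.regex.Pattern;

class MatchUnicodeSpace {
    // True only if s is exactly the space character U+0020.
    static boolean isSpace(String s) {
        return Pattern.matches("\\u0020", s);
    }

    public static void main(String[] args) {
        System.out.println(isSpace(" "));  // true
        System.out.println(isSpace("\t")); // false
    }
}
```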

Let’s move to the other method to use regex in Java.

Method 2: Use User-defined Regex Whitespace With Pattern and Matcher class

The “Pattern” class is used to define or create a pattern, while the “Matcher” class is utilized to search according to the given pattern. The pattern for a regex can be created with the help of the “compile()” method of the Pattern class. In its simplest form, it takes one parameter: the pattern you want to compile.

A Matcher is obtained by calling the “matcher()” method on the compiled Pattern. It takes the string to be searched as its argument.

In addition to the predefined regexes for whitespace discussed above, the following can be used:

• [\\t\\p{Zs}]: It matches a tab or any Unicode space separator.
• \\p{Zs}: It matches any Unicode space separator.

Now, let’s check out some examples.

Example 1: Use “[\\t\\p{Zs}]” WhiteSpace Regex

In this example, we find out the number of whitespaces by counting them. First, we create a String “s” and print it on the console.

Next, we define the pattern “[\\t\\p{Zs}]”, which acts as a whitespace regex in Java. It matches a tab or any Unicode space separator, so it is similar (though not identical) to “\s”. After compiling the pattern, the variable “regexPattern” holds the result.

We then call the “matcher()” method and pass it the String “s”, and create an integer variable “count” initialized to “0”.

We count the whitespaces in the string with a “while” loop. Each time the matcher’s find() method locates another match, the loop increments count.

Lastly, we print the value of count to show how many spaces were found in the string.
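The steps above can be sketched roughly as follows. The variable names regexPattern and count follow the text; the class name, the helper method and the sample string are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountWhitespace {
    // Counts tabs and Unicode space separators in s.
    static int count(String s) {
        Pattern regexPattern = Pattern.compile("[\\t\\p{Zs}]");
        Matcher matcher = regexPattern.matcher(s);
        int count = 0;
        // Each successful find() advances to the next match.
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "one two\tthree";
        System.out.println(s);
        System.out.println(count(s)); // 2
    }
}
```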

Example 2: Use “\p{Zs}” WhiteSpace Regex

Now, we find the whitespaces in the string by using another pattern, “\\p{Zs}”. This pattern works similarly to the “\s” and “\s+” regexes, except that it matches only Unicode space separators and therefore not tabs or line breaks.

We call the “matcher()” method and pass the String “s” as the argument. As in the above example, we use a “while” loop to count the spaces in the string and print the count.

The output indicates that our String “Welcome to Linux Hint” contains three whitespaces.
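A sketch of this second count, using a hypothetical helper that takes the regex as a parameter so the two patterns can be compared. The tab in the second sample is counted by \s but not by \p{Zs}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountByPattern {
    // Counts non-overlapping matches of regex in s.
    static int count(String regex, String s) {
        Matcher matcher = Pattern.compile(regex).matcher(s);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Welcome to Linux Hint";
        System.out.println(count("\\p{Zs}", s));        // 3
        // A tab is whitespace for \s but is not a space separator.
        System.out.println(count("\\p{Zs}", "a\tb c")); // 1
        System.out.println(count("\\s", "a\tb c"));     // 2
    }
}
```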

We have compiled the easiest methods that can help you use regex whitespace in Java.

Conclusion

There are several regular expressions for whitespace, such as “\s”, “\s+”, “\u0020”, “[\\t\\p{Zs}]”, and “\\p{Zs}”. These regexes can be passed to the matches() method of the Pattern class, or compiled with the Pattern class and matched using the Matcher class. The most commonly used whitespace regexes are \s and \s+. In this tutorial, we covered these methods of using regex whitespace in Java.

About the author

Farah Batool

I completed my master’s degree in computer science. I am an academic researcher and love to learn and write about new technologies. I am passionate about writing and sharing my experience with the world.
