- How to match whitespace in python using regular expressions
- What are whitespace characters?
- Algorithm
- Syntax
- Example 1: How to match whitespace in python
- Output
- Code Explanation
- Example
- Output
- Example
- Output
- Code Explanation
- Conclusion
- How To Remove Spaces from a String In Python
- Remove Leading and Trailing Spaces Using the strip() Method
- Remove All Spaces Using the replace() Method
- Remove Duplicate Spaces and Newline Characters Using the join() and split() Methods
- Remove All Spaces and Newline Characters Using the translate() Method
- Remove Whitespace Characters Using Regex
- Conclusion
How to match whitespace in python using regular expressions
Regular expressions, often called regex, are sequences of characters that define a search pattern for letters, words, or other character patterns in text. You can use them to match and extract any pattern from a string. They are especially useful for search-and-replace operations, where the most common task is finding a substring that matches a pattern and substituting something else.
What are whitespace characters?
Whitespace refers to any character or sequence of characters that represents horizontal or vertical space, such as a space, tab, or newline. In Python regular expressions, the metacharacter \s matches whitespace characters.
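To make the definition concrete, the following minimal sketch lists the whitespace characters that \s finds in a small sample. The variable names `sample` and `matches` are illustrative and do not come from the tutorial.

```python
import re

# A sample containing several kinds of whitespace between ordinary letters:
# space, tab, newline, carriage return, form feed, vertical tab.
sample = "a b\tc\nd\re\ff\vg"

# \s matches each whitespace character individually.
matches = re.findall(r"\s", sample)
print(matches)  # [' ', '\t', '\n', '\r', '\x0c', '\x0b']
```

The form feed and vertical tab appear as '\x0c' and '\x0b' because that is how Python displays them in a list.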
Algorithm
- Import the re module.
- Initialize a string.
- Use the metacharacter \s to match whitespace.
- Call the findall method with the \s pattern and the string as arguments.
- Print the result to see the matched whitespace characters.
Syntax
result = re.findall(r'[\s]', str)
re.findall() returns all non-overlapping matches of the pattern in the string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.
regx = re.compile(r'\s')
re.compile() compiles a regular expression into a regex object, so the same pattern can be matched against several target strings without being rewritten each time.
result = regx.findall(str)
The re module provides several functions for finding matches in a string:
- findall: returns a list of all matches.
- split: returns a list with the string split at each match.
- sub: replaces one or more matches with a string.
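As a rough illustration of how findall, split, and sub behave with a whitespace pattern, the sketch below compiles \s+ (one or more whitespace characters) once and reuses it. The sample text and the names `ws_run`, `runs`, `words`, and `joined` are assumptions made for this example.

```python
import re

text = "The  Psychology\tof Money."

# Compile once; \s+ matches a run of one or more whitespace characters.
ws_run = re.compile(r"\s+")

runs = ws_run.findall(text)     # every run of whitespace
words = ws_run.split(text)      # the string split at each run
joined = ws_run.sub("_", text)  # each run replaced with an underscore

print(runs)    # ['  ', '\t', ' ']
print(words)   # ['The', 'Psychology', 'of', 'Money.']
print(joined)  # The_Psychology_of_Money.
```

Using \s+ rather than \s treats consecutive whitespace characters as a single match, which is usually what you want when splitting or replacing.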
Example 1: How to match whitespace in python
# Import the re module
import re

# Initialize the string to search
str = 'The Psychology of Money.'

# Store the list returned by findall in the variable result
result = re.findall(r'[\s]', str)

# Print the string, the number of matches, and the matches themselves
print('The given string is:', str)
print('It has', len(result), 'whitespace characters')
print(result)
Output
The string in the above code contains 3 whitespace characters. Running the code with Python 3 prints:
The given string is: The Psychology of Money.
It has 3 whitespace characters
[' ', ' ', ' ']
Code Explanation
We import the re module to work with regular expressions. Next, we initialize the variable str with the string in which we want to match whitespace. The metacharacter \s matches whitespace in Python regular expressions.
The variable result stores the return value of re.findall(), which searches the entire string for every occurrence of the pattern. It takes two arguments here: the pattern [\s] and the string str. The final step prints the result.
Example
# Import the re module
import re

# Initialize the string to search
str = "Honesty is the best policy."

# Store the list returned by findall in the variable result
result = re.findall(r'[\s]', str)

# Print the string, the number of matches, and the matches themselves
print('The given string is:', str)
print('It has', len(result), 'whitespace characters')
print(result)
Output
The string in the above code contains 4 whitespace characters. Running the code with Python 3 prints:
The given string is: Honesty is the best policy.
It has 4 whitespace characters
[' ', ' ', ' ', ' ']
Example
# Import the re module
import re

# Initialize the string to search
str = 'Honesty is the best policy'

# Compile a regex object that matches any single whitespace character
regx = re.compile(r'\s')

# Store the list returned by the compiled object's findall method
result = regx.findall(str)

# Print the string, the number of matches, and the matches themselves
print('The given string is:', str)
print('It has', len(result), 'whitespace characters')
print(result)
Output
Running the code with Python 3 prints:
The given string is: Honesty is the best policy
It has 4 whitespace characters
[' ', ' ', ' ', ' ']
Code Explanation
We import the re module to work with regular expressions, then initialize str with the string whose whitespace we want to match. The metacharacter \s matches whitespace in Python regular expressions.
This example first compiles the pattern into a regex object with re.compile() and stores it in regx. Calling regx.findall(str) then needs only the string as an argument, because the pattern is already part of the compiled object. The variable result holds the list of whitespace characters found in the string.
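One pitfall worth illustrating: \W (uppercase) is sometimes used in place of \s, but it matches any non-word character, so it also picks up punctuation. The sketch below compares the two on a sentence that ends with a period. The variable names are illustrative.

```python
import re

text = "Honesty is the best policy."

whitespace = re.findall(r"\s", text)  # whitespace characters only
non_word = re.findall(r"\W", text)    # anything that is not a letter, digit, or underscore

print(len(whitespace))  # 4
print(len(non_word))    # 5 (the four spaces plus the final period)
print(non_word[-1])     # .
```

The two patterns return the same result only when the string contains no punctuation, so \s is the safer choice for counting or matching whitespace.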
Conclusion
Regular expressions are specialized text strings that define a search pattern. They are sequences of characters representing certain letters, words, or character combinations in text. In Python, the re module is used for working with regular expressions, and the metacharacter \s matches whitespace.
The most commonly used functions in the re module are findall(), search(), split(), and sub(). Anchors, character sets, and modifiers are the key building blocks of a regular expression.
How To Remove Spaces from a String In Python
This tutorial provides examples of various methods you can use to remove whitespace from a string in Python.
A Python String is immutable, so you can’t change its value. Any method that manipulates a string value returns a new string.
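A short sketch of what immutability means in practice: the original string is left untouched, and the method call produces a new value that you must capture yourself. The names `original` and `stripped` are illustrative.

```python
original = "  padded  "

# strip() does not modify `original`; it returns a new string.
stripped = original.strip()

print(repr(original))  # '  padded  '
print(repr(stripped))  # 'padded'
```

To keep the result, assign it to a variable, either a new one or the same name (for example, `original = original.strip()`).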
The examples in this tutorial use the Python interactive console in the command line to demonstrate different methods that remove spaces. The examples use the following string:
s = ' Hello World From DigitalOcean \t\n\r\tHi There '
print(s)
Output
 Hello World From DigitalOcean
    Hi There
This string has different types of whitespace and newline characters, such as space ( ), tab ( \t ), newline ( \n ), and carriage return ( \r ).
Remove Leading and Trailing Spaces Using the strip() Method
The Python String strip() method removes leading and trailing characters from a string. By default, it removes whitespace characters, including spaces, tabs, and newlines.
Declare the string variable:
s = ' Hello World From DigitalOcean \t\n\r\tHi There '
Use the strip() method to remove the leading and trailing whitespace:
s.strip()
Output
'Hello World From DigitalOcean \t\n\r\tHi There'
If you want to remove only the leading spaces or trailing spaces, then you can use the lstrip() and rstrip() methods.
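As a brief sketch of the one-sided variants applied to the tutorial's string, with the results stored in illustrative variables `left` and `right`:

```python
s = ' Hello World From DigitalOcean \t\n\r\tHi There '

left = s.lstrip()   # removes leading whitespace only
right = s.rstrip()  # removes trailing whitespace only

print(repr(left))   # 'Hello World From DigitalOcean \t\n\r\tHi There '
print(repr(right))  # ' Hello World From DigitalOcean \t\n\r\tHi There'

# All three methods also accept a string of characters to strip instead of whitespace.
print("xxhixx".strip("x"))  # hi
```

In each case, the whitespace in the middle of the string is left untouched.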
Remove All Spaces Using the replace() Method
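You can use the replace() method to remove all the space characters from the string, including those between words. Other whitespace characters, such as tabs and newlines, are not affected.
Declare the string variable:
s = ' Hello World From DigitalOcean \t\n\r\tHi There '
Use the replace() method to replace spaces with an empty string:
s.replace(" ", "")
Output
'HelloWorldFromDigitalOcean\t\n\r\tHiThere'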
Remove Duplicate Spaces and Newline Characters Using the join() and split() Methods
You can remove all of the duplicate whitespace and newline characters by using the join() method with the split() method. In this example, the split() method breaks up the string into a list, using the default separator of any whitespace character. Then, the join() method joins the list back into one string with a single space (" ") between each word.
Declare the string variable:
s = ' Hello World From DigitalOcean \t\n\r\tHi There '
Use the join() and split() methods together to remove duplicate spaces and newline characters:
" ".join(s.split())
Output
'Hello World From DigitalOcean Hi There'
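The technique depends on calling split() with no arguments. The following sketch, using a made-up string `messy`, contrasts the default behavior with splitting on a literal space:

```python
messy = "a  b \t c"

default_split = messy.split()    # splits on any run of whitespace
space_split = messy.split(" ")   # splits on each single space character

print(default_split)  # ['a', 'b', 'c']
print(space_split)    # ['a', '', 'b', '\t', 'c']

normalized = " ".join(default_split)
print(normalized)     # a b c
```

Splitting on " " leaves empty strings and stray tabs in the list, so joining that list would not collapse the duplicates.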
Remove All Spaces and Newline Characters Using the translate() Method
You can remove all of the whitespace and newline characters using the translate() method. The translate() method replaces specified characters with characters defined in a dictionary or mapping table. The following example uses a custom dictionary with the string.whitespace string constant, which contains all the whitespace characters. The custom dictionary replaces all the characters in string.whitespace with None.
Import the string module so that you can use string.whitespace:
import string
Declare the string variable:
s = ' Hello World From DigitalOcean \t\n\r\tHi There '
Use the translate() method to remove all whitespace characters:
s.translate({ord(c): None for c in string.whitespace})
Output
'HelloWorldFromDigitalOceanHiThere'
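As an alternative sketch, the same mapping can be built with str.maketrans(), whose third argument lists characters to delete. This is not the tutorial's method; it is an equivalent standard-library approach. The names `table` and `result` are illustrative.

```python
import string

s = ' Hello World From DigitalOcean \t\n\r\tHi There '

# The third argument to str.maketrans() names the characters to map to None.
table = str.maketrans("", "", string.whitespace)
result = s.translate(table)

print(result)  # HelloWorldFromDigitalOceanHiThere
```

Building the table once and reusing it can be convenient when you need to clean many strings.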
Remove Whitespace Characters Using Regex
You can also use a regular expression to match whitespace characters and remove them using the re.sub() function.
This example uses the following file, regexspaces.py, to show some ways you can use regex to remove whitespace characters:
import re

s = ' Hello World From DigitalOcean \t\n\r\tHi There '

print('Remove all spaces using regex:\n', re.sub(r"\s+", "", s), sep='')  # \s matches all white spaces
print('Remove leading spaces using regex:\n', re.sub(r"^\s+", "", s), sep='')  # ^ matches start
print('Remove trailing spaces using regex:\n', re.sub(r"\s+$", "", s), sep='')  # $ matches end
print('Remove leading and trailing spaces using regex:\n', re.sub(r"^\s+|\s+$", "", s), sep='')  # | for OR condition
Run the file from the command line:
python3 regexspaces.py
You get output like the following. The tab, newline, and carriage-return characters in the middle of the string are still printed, so the exact layout depends on your terminal:
Remove all spaces using regex:
HelloWorldFromDigitalOceanHiThere
Remove leading spaces using regex:
Hello World From DigitalOcean
    Hi There
Remove trailing spaces using regex:
 Hello World From DigitalOcean
    Hi There
Remove leading and trailing spaces using regex:
Hello World From DigitalOcean
    Hi There
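The same re.sub() function can also collapse whitespace rather than delete it. The sketch below is an addition to the tutorial's file, not part of it: it replaces each run of whitespace with a single space and then strips the ends. The name `collapsed` is illustrative.

```python
import re

s = ' Hello World From DigitalOcean \t\n\r\tHi There '

# Replace every run of whitespace with one space, then trim both ends.
collapsed = re.sub(r"\s+", " ", s).strip()

print(collapsed)  # Hello World From DigitalOcean Hi There
```

For this string, the result matches what the join() and split() approach produces.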
Conclusion
In this tutorial, you learned some of the methods you can use to remove whitespace characters from strings in Python. Continue your learning about Python strings.