Python Regex Flags
Python's re module lets you pass optional flags when using regular expression patterns with match() , search() , and split() , among others.
Most re module functions accept an optional flags argument that enables special features and syntax variations.
For example, suppose you want to search for a word inside a string using regex. You can extend this regex's capability by passing the re.I flag as an argument to the search method to enable case-insensitive searching.
You will learn how to use all regex flags available in Python with short and clear examples.
First, refer to the table below for the available regex flags.
Flag | long syntax | Meaning |
---|---|---|
re.A | re.ASCII | Perform ASCII-only matching instead of full Unicode matching |
re.I | re.IGNORECASE | Perform case-insensitive matching |
re.M | re.MULTILINE | Affects the metacharacters ^ (caret) and $ (dollar). When this flag is specified, ^ matches at the beginning of the string and at the beginning of each line (immediately after each \n ), and $ matches at the end of the string and at the end of each line (immediately before each \n ) |
re.S | re.DOTALL | Make the DOT ( . ) special character match any character at all, including a newline. Without this flag, DOT( . ) will match anything except a newline |
re.X | re.VERBOSE | Allow whitespace and comments in the regex. This flag makes a regex more readable by ignoring whitespace in the pattern and treating text after # as a comment |
re.L | re.LOCALE | Make \w , \W , \b , \B and case-insensitive matching dependent on the current locale. Use only with bytes patterns |
Python regex flags
To specify more than one flag, use the | operator to combine them. For example, a case-insensitive search in a multiline string with a verbose pattern:
re.findall(pattern, string, flags=re.I|re.M|re.X)
Now let’s see how to use each option flag in Python regex.
IGNORECASE flag
First, let's look at the re.I flag, which stands for "ignore case". Pass this flag to a regex method as an argument to perform case-insensitive matching. You can specify it in two ways: the short form re.I or the long form re.IGNORECASE .
    import re

    target_str = "KELLy is a Python developer at a PYnative. kelly loves ML and AI"

    # Without using re.I
    result = re.findall(r"kelly", target_str)
    print(result)
    # Output ['kelly']

    # with re.I
    result = re.findall(r"kelly", target_str, re.I)
    print(result)
    # Output ['KELLy', 'kelly']

    # with re.IGNORECASE
    result = re.findall(r"kelly", target_str, re.IGNORECASE)
    print(result)
    # Output ['KELLy', 'kelly']
Notice that the word "kelly" occurs two times in this string: first in mixed case ("KELLy") at the beginning of the first sentence, and second in all lowercase.
In the first re.findall() call, we got only one occurrence because, by default, matching is case sensitive.
In the second and third re.findall() calls, we got two occurrences because re.I (or re.IGNORECASE ) switches off case sensitivity, so the regex finds every occurrence of the word regardless of whether its letters are uppercase or lowercase.
DOTALL flag
Now, let's see the re.S flag's role. You can specify this flag in two ways: re.S or re.DOTALL .
By default, the dot ( . ) metacharacter inside a regular expression pattern represents any character, be it a letter, digit, symbol, or punctuation mark, except the newline character \n .
The re.S flag removes this exception by enabling the dot ( . ) metacharacter to match any possible character, including the newline character, hence its name DOTALL.
This can prove to be pretty useful in some scenarios, especially when the target string spans multiple lines.
Now let's use the re.search() method with and without the re.S flag.
    import re

    # string with newline character
    target_str = "ML\nand AI"

    # Match any character
    result = re.search(r".+", target_str)
    print("Without using re.S flag:", result.group())
    # Output 'ML'

    # With re.S flag
    result = re.search(r".+", target_str, re.S)
    print("With re.S flag:", result.group())
    # Output 'ML\nand AI'

    # With re.DOTALL flag
    result = re.search(r".+", target_str, re.DOTALL)
    print("With re.DOTALL flag:", result.group())
    # Output 'ML\nand AI'
In the first call to re.search() , the dot did not match the \n , so matching stopped at the end of the first line. After adding the re.S flag in the next call, the dot matched every character and the pattern consumed the entire string.
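A typical use of this flag is capturing a block that spans several lines. The following is a minimal sketch; the sample text and the `<note>` delimiters are invented for illustration.

```python
import re

# Hypothetical text with a block that spans two lines
text = "<note>first line\nsecond line</note>"

pattern = r"<note>(.*?)</note>"

# Without re.S the dot cannot cross the newline, so there is no match
print(re.search(pattern, text))  # None

# With re.S the dot also matches "\n", so the whole block is captured
match = re.search(pattern, text, re.S)
print(repr(match.group(1)))  # 'first line\nsecond line'
```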
VERBOSE flag
The re.X flag stands for verbose. This flag allows more flexibility and better formatting when writing complex regex patterns for match() , search() , or other regex methods.
You can specify this flag in two ways: re.X or re.VERBOSE .
The verbose flag allows us to do the following inside the regex pattern:
- Use spacing, indentation, and a clean layout for longer and more intricate patterns.
- Add comments right inside the pattern for later reference using the hash sign (#).
When to use
Use it whenever a pattern starts to look complicated. Even a short pattern like the one below becomes easier to read once you add indentation and comments using re.X or re.VERBOSE , and the benefit grows with the complexity of the pattern.
    import re

    target_str = "Jessa is a Python developer, and her salary is 8000"

    # re.X to add indentation and comments in the regex
    result = re.search(r"""(^\w{5})   # match 5-letter word at the start
                           .+(\d{4}$) # match 4-digit number at the end
                        """, target_str, re.X)

    # Five-letter word
    print(result.group(1))
    # Output 'Jessa'

    # 4-digit number
    print(result.group(2))
    # Output 8000
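One consequence of verbose mode is that a literal space in the pattern is ignored too. The sketch below (with an illustrative target string) shows the effect and one common way around it, which is to put the space in a character class.

```python
import re

target = "salary is 8000"

# The literal space inside the pattern is ignored under re.X, so this
# pattern behaves like "is\d+" and finds nothing in the target string.
no_match = re.search(r"is \d+", target, re.X)
print(no_match)  # None

# Putting the space in a character class keeps it significant.
match = re.search(r"is[ ]\d+", target, re.X)
print(match.group())  # is 8000
```

Escaping the space with a backslash or using \s are alternative ways to match whitespace in a verbose pattern.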
MULTILINE flag
You can specify this flag in two ways: re.M or re.MULTILINE .
The re.M flag is used as an argument inside the regex method to perform a match inside a multiline block of text.
Note: This flag affects the metacharacters ^ and $ . By default:
- The caret ( ^ ) matches only at the beginning of the string
- The dollar ( $ ) matches only at the end of the string (or just before a newline at the very end of the string)
When this flag is specified, ^ matches at the beginning of the string and at the start of each line (right after each \n ), and $ matches at the end of the string and at the end of each line (right before each \n ).
Now let’s see the examples.
    import re

    target_str = "Joy lucky number is 75\nTom lucky number is 25"

    # find 3-letter word at the start of each line
    # Without re.M or re.MULTILINE flag
    result = re.findall(r"^\w{3}", target_str)
    print(result)
    # Output ['Joy']

    # find 2-digit number at the end of each line
    # Without re.M or re.MULTILINE flag
    result = re.findall(r"\d{2}$", target_str)
    print(result)
    # Output ['25']

    # With re.MULTILINE
    # find 3-letter word at the start of each line
    result = re.findall(r"^\w{3}", target_str, re.MULTILINE)
    print(result)
    # Output ['Joy', 'Tom']

    # With re.M
    # find 2-digit number at the end of each line
    result = re.findall(r"\d{2}$", target_str, re.M)
    print(result)
    # Output ['75', '25']
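Because ^ and $ anchor to every line under re.M , a single pattern can describe a whole line. The following sketch reuses the same target string; the variable name pairs is just illustrative.

```python
import re

target_str = "Joy lucky number is 75\nTom lucky number is 25"

# With re.M, ^ and $ anchor to each line, so one pattern can
# describe a whole line and findall returns one tuple per line.
pairs = re.findall(r"^(\w+) .* (\d+)$", target_str, re.M)
print(pairs)  # [('Joy', '75'), ('Tom', '25')]
```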
ASCII flag
You can specify this flag in two ways: re.A or re.ASCII .
This flag makes \w , \W , \b , \B , \d , \D , \s and \S perform ASCII-only matching instead of full Unicode matching. It is only meaningful for Unicode (str) patterns and is ignored for bytes patterns.
    import re

    # string with ASCII and Unicode characters
    target_str = "虎太郎 and Jessa are friends"

    # Without re.A or re.ASCII
    # regex to match all 3-letter words
    result = re.findall(r"\b\w{3}\b", target_str)
    print(result)
    # Output ['虎太郎', 'and', 'are']

    # With re.A or re.ASCII
    # regex to match only 3-letter ASCII words
    result = re.findall(r"\b\w{3}\b", target_str, re.A)
    print(result)
    # Output ['and', 'are']
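The same difference applies to \d , which by default also matches decimal digits from other scripts. The sketch below uses an illustrative string containing two Arabic-Indic digits, written as escape sequences so that the code itself stays ASCII.

```python
import re

# "42" in ASCII digits followed by the Arabic-Indic digits four and two
text = "42 and \u0664\u0662"

# Default (Unicode) matching: both digit runs are found
unicode_digits = re.findall(r"\d+", text)

# ASCII-only matching: only the ASCII digits 0-9 count as \d
ascii_digits = re.findall(r"\d+", text, re.A)

print(len(unicode_digits), len(ascii_digits))  # 2 1
print(ascii_digits)  # ['42']
```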
Case Insensitive Regex in Python
Regular expressions match a particular string within a text in Python. They form a search pattern and check if this search pattern is present in the text or not.
In this article, we will study case-insensitive regex in Python and explain the different ways of performing case-insensitive searches in a text.
Case Insensitive Regex in Python
Search patterns are made up of a sequence of characters and can be specified using regex rules. However, to work with regular expressions in Python, you first need to import the re module.
Case insensitive means that the text should be considered equal in lowercase and uppercase. We need to apply case-insensitive searches in our daily lives very often.
One such example is whenever we search for some commodity, say, a Bag . The information about the Bags will be displayed on the screen.
If we search for bag in lowercase letters, or use mixed case such as bAG , the same results should be displayed. In such scenarios we need to treat letters of different case as equal.
Regular expressions let us do this by matching patterns case-insensitively within a text.
So, let us discuss how to extract a search pattern from a text using regular expressions.
Match a String Using the Case Insensitive re.IGNORECASE Flag in Python
We can use Python's search() or match() functions to find whether our search pattern is present in the text and to get its exact position; sub() works on the same kind of match but replaces it. The search() and match() functions take three parameters:
- The pattern to be searched.
- The text in which the pattern is to be searched.
- A flag .
The flag parameter is optional; it is used to enable several regex features in Python.
The re.IGNORECASE flag enables case-insensitive searching within a text. It treats the characters [A-Z] the same as [a-z] in a string.
Let us have an example of using the re.IGNORECASE as a flag in our code.
    import re

    re.search('the', 'ThE', re.IGNORECASE)
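To see what was matched and where, you can inspect the returned match object. This is a minimal sketch with an illustrative input string; group() returns the matched text and span() returns its start and end positions.

```python
import re

match = re.search('the', 'Over ThE hill', re.IGNORECASE)
if match:
    print(match.group())  # ThE
    print(match.span())   # (5, 8)
```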
Similarly, you can pass re.IGNORECASE to the match() function or the sub() function to work with a case-insensitive string in the text. For sub() , pass it as the keyword argument flags=re.IGNORECASE , because the fourth positional argument of sub() is count , not flags .
If you want to find all occurrences of a string in a text, use Python's re.findall() function. It returns all the matched strings that are present in the text.
As before, pass the re.IGNORECASE flag in the arguments to find the strings regardless of case.
Let us see how to extract all the string occurrences within a text.
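As a short sketch of a case-insensitive replacement with sub() (the sample sentence is made up for illustration), the flag is passed by keyword:

```python
import re

text = "The sun and the moon"

# In re.sub() the fourth positional parameter is count, so the flag
# is passed by keyword to make sure it is treated as a flag.
result = re.sub("the", "a", text, flags=re.IGNORECASE)
print(result)  # a sun and a moon
```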
    import re

    re.findall('the', 'The sources informed the police of tHe thieves.', re.IGNORECASE)
The re.IGNORECASE flag used above can also be written in its short form, re.I . Both names refer to the same flag.
Let us see it with an example.
    import re

    re.findall('the', 'The sources informed the police of tHe thieves.', re.I)
All these methods are present inside the re module in Python. Therefore, the re module must be imported into the program before using them.
Match a String Using the Case Insensitive Marker (?i) in Python
When you do not want to add the flag parameter to search() or any other function, you can use the case-insensitive marker, denoted by (?i) .
It is written inside the pattern string, before the search pattern, so no extra flag parameter is needed.
Below is the code to use the case-insensitive marker (?i) with the search() method.
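    import re

    # `table` was not defined in the original snippet; this sample value is assumed
    table = "The Table is round"
    re.search('(?i)TABLE', table)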
You can also search for the pattern within a much larger string and find all occurrences of it using the findall() method in Python.
Below is the code snippet to use the case-insensitive marker (?i) with the findall() method in Python.
    import re

    text = "Let it rain, let it snow, let it do!"
    re.findall('(?i)LEt', text)
The above code snippet returns all the occurrences of the search pattern within the text, regardless of case. Place the (?i) marker at the very start of the pattern; since Python 3.11, a global inline flag placed anywhere else raises an error.
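If only part of a pattern should ignore case, Python 3.6 and later also accept a scoped form of the marker, (?i:...) . The sketch below is an illustration using the same sample text, not part of the original examples; the name matches is arbitrary.

```python
import re

text = "Let it rain, let it snow, let it do!"

# (?i:...) applies case-insensitivity only to the group it wraps;
# the " it" that follows must still match in lowercase.
matches = re.findall(r"(?i:LET) it", text)
print(matches)  # ['Let it', 'let it', 'let it']
```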
Conclusion
This article has discussed regular expressions and how to use them to find case-insensitive search patterns within a text. We have used two ways.
The first is the re.IGNORECASE flag, which is passed as an argument to searching functions such as search() , match() , and findall() . You can also use its short form, re.I .
The second method uses the case-insensitive marker (?i) , placed before the search pattern inside the pattern string.
With either method, we can find case-insensitive patterns in our text.