How to Replace String in pandas DataFrame
You can replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), or apply() with a lambda function. In this article, I will explain how to replace strings in DataFrame columns with multiple examples.
- Replace a string with another string in pandas.
- Replace a string pattern with another string using a regular expression.
1. Quick Examples to Replace String in DataFrame
If you are in a hurry, below are some quick examples of how to replace a string in a pandas DataFrame.
# Below are some quick examples.

# Replace string using DataFrame.replace() method.
df2 = df.replace('Py', 'Python with ', regex=True)

# Replace patterns in multiple columns using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)

# Replace pattern of string using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')

# Replace string using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py': 'Python with', 'Language': 'Lang'}, regex=True))
Now, let's create a pandas DataFrame with a few rows and columns, execute these examples, and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
df = pd.DataFrame(technologies)
print(df)
# Output:
         Courses    Fee Duration
0          Spark  22000   30days
1        PySpark  25000   50days
2          Spark  23000   30days
3  Java Language  24000   60days
4        PySpark  26000   35days
5   PHP Language  27000   30days
2. pandas Replace String Example
You can replace a string in a pandas DataFrame column with another string by using the DataFrame.replace() method. This method replaces the specified value with another value and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True.
# Replace string using DataFrame.replace() method.
df2 = df.replace('PySpark', 'Python with Spark')
print(df2)
This yields the output below. The example replaces the string PySpark with Python with Spark.
# Output:
             Courses    Fee Duration
0              Spark  22000   30days
1  Python with Spark  25000   50days
2              Spark  23000   30days
3      Java Language  24000   60days
4  Python with Spark  26000   35days
5       PHP Language  27000   30days
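As a minimal sketch of the in-place variant mentioned above, assuming a trimmed copy of the sample data, the following modifies df directly. It also illustrates that, without regex=True, only cells whose entire value equals the search string are replaced.

```python
import pandas as pd

# Small sample frame, trimmed from the article's DataFrame.
df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Fee': [22000, 25000, 24000],
})

# inplace=True modifies df itself instead of building a new DataFrame.
df.replace('PySpark', 'Python with Spark', inplace=True)

# Without regex=True only whole-cell matches are replaced,
# so searching for 'Py' alone leaves the data unchanged.
unchanged = df.replace('Py', 'Python with ')

print(df)
print(unchanged)
```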
3. Replace Multiple Strings
Now let's see how to replace strings in multiple columns. In this example, I will also show how to replace part of a string by using the regex=True param. To update multiple string columns, pass dicts of column-name/value pairs. The example below replaces Py with Python with in the Courses column and days with Days in the Duration column.
# Replace patterns in multiple columns using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)
# Output:
             Courses    Fee Duration
0              Spark  22000  30 Days
1  Python with Spark  25000  50 Days
2              Spark  23000  30 Days
3      Java Language  24000  60 Days
4  Python with Spark  26000  35 Days
5       PHP Language  27000  30 Days
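When whole values rather than substrings need to change, a nested dict is another option. The sketch below assumes a trimmed copy of the sample data; the replacement text 'Apache Spark' is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Duration': ['30days', '50days', '60days'],
})

# Outer keys are column names; inner dicts map old value -> new value.
# No regex is used, so each old value must match the whole cell.
mapping = {
    'Courses': {'Spark': 'Apache Spark', 'PySpark': 'Python with Spark'},
    'Duration': {'30days': '30 Days'},
}
df2 = df.replace(mapping)
print(df2)
```

Values that are not listed in the inner dicts, such as Java Language and 50days here, are left as they are.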
4. Replace Pattern of String Using Regular Expression
Using a regular expression, you can replace the matching part of a string with another string in a pandas DataFrame. The example below finds the string Language and replaces it with Lang.
# Replace pattern of string using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')
print(df2)
# Output:
     Courses    Fee Duration
0      Spark  22000   30days
1    PySpark  25000   50days
2      Spark  23000   30days
3  Java Lang  24000   60days
4    PySpark  26000   35days
5   PHP Lang  27000   30days
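Because the pattern is a regular expression, anchors can narrow where a match is allowed. The following is a small sketch under that assumption; the 'Language Basics' row is hypothetical and added only to show a non-matching case.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language", "Language Basics"],
})

# '$' anchors the match to the end of the cell, so only a trailing
# 'Language' is shortened; 'Language Basics' is left untouched.
df2 = df.replace(regex=r'Language$', value='Lang')
print(df2)
```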
5. Using str.replace() on DataFrame
Alternatively, use Series.str.replace() to replace part of a string in a single column. DataFrame.replace() looks for exact matches of the whole cell value unless you pass a regex pattern with the param regex=True, whereas str.replace() replaces every occurrence of the substring within each value.
# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
print(df)
This yields the same output as above. Note that this replaces the values in the Courses column of the existing DataFrame object.
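To make the difference between whole-value and substring matching concrete, here is a short sketch on a standalone Series. The 'C++ Language' value is hypothetical and included only to show literal matching with regex=False.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Java Language", "C++ Language"])

# Series.replace() without regex compares whole values,
# and no value is exactly 'Language'.
whole = courses.replace('Language', 'Lang')

# Series.str.replace() works on substrings inside each value.
partial = courses.str.replace('Language', 'Lang', regex=False)

# regex=False treats '+' literally rather than as a regex quantifier.
literal = courses.str.replace('C++', 'Cpp', regex=False)

print(whole.tolist())
print(partial.tolist())
print(literal.tolist())
```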
6. Replace String Using apply() function with lambda
In this section, you can find out how to replace strings using DataFrame.apply() with a lambda expression. The apply() method applies a function along one axis of the DataFrame; with the default axis=0, the function receives each column as a Series.
# Replace string using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py': 'Python with', 'Language': 'Lang'}, regex=True))
print(df2)
# Output:
            Courses    Fee Duration
0             Spark  22000   30days
1  Python withSpark  25000   50days
2             Spark  23000   30days
3         Java Lang  24000   60days
4  Python withSpark  26000   35days
5          PHP Lang  27000   30days
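Since apply() simply hands each column to Series.replace(), the same dict can also be passed to DataFrame.replace() in a single call. The sketch below, assuming a trimmed copy of the sample data, runs both forms side by side so the results can be compared.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Java Language"],
    'Fee': [22000, 25000, 24000],
})

patterns = {'Py': 'Python with', 'Language': 'Lang'}

# apply() hands each column to the lambda as a Series (axis=0 is the default).
via_apply = df.apply(lambda col: col.replace(patterns, regex=True))

# The same patterns applied to the whole DataFrame in one call.
direct = df.replace(patterns, regex=True)

print(via_apply)
print(direct)
```

The apply() form becomes more useful when the lambda needs per-column logic; for a plain replacement the direct call is shorter.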
7. Complete Example of Replace String in DataFrame
# Create a pandas DataFrame.
import pandas as pd
import numpy as np
technologies = {
    'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
    'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
}
df = pd.DataFrame(technologies)
print(df)

# Replace string using DataFrame.replace() method.
df2 = df.replace('Py', 'Python with ', regex=True)
print(df2)

# Replace patterns in multiple columns using regular expression.
df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                 {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
print(df2)

# Replace pattern of string using regular expression.
df2 = df.replace(regex=['Language'], value='Lang')
print(df2)

# By using str.replace()
df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
print(df)

# Replace string using apply() function with lambda.
df2 = df.apply(lambda x: x.replace({'Py': 'Python with', 'Language': 'Lang'}, regex=True))
print(df2)
8. Conclusion
In this article, you have learned how to replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), and apply() with a lambda function, with some examples.
pandas.Series.str.replace
Replace each occurrence of pattern/regex in the Series/Index.
Equivalent to str.replace() or re.sub(), depending on the regex value.
Parameters
pat str or compiled regex
String can be a character sequence or regular expression.
repl str or callable
Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().
n int, default -1 (all)
Number of replacements to make from start.
case bool, default None
Determines if replace is case sensitive:
- If True, case sensitive (the default if pat is a string)
- Set to False for case insensitive
- Cannot be set if pat is a compiled regex.
flags int, default 0 (no flags)
Regex module flags, e.g. re.IGNORECASE. Cannot be set if pat is a compiled regex.
regex bool, default False
Determines if the passed-in pattern is a regular expression:
- If True, assumes the passed-in pattern is a regular expression.
- If False, treats the pattern as a literal string.
- Cannot be set to False if pat is a compiled regex or repl is a callable.
Returns
Series or Index of object
A copy of the object with all matching occurrences of pat replaced by repl.
Raises
ValueError
- if regex is False and repl is a callable or pat is a compiled regex
- if pat is a compiled regex and case or flags is set
Notes
When pat is a compiled regex, all flags should be included in the compiled regex. Use of case, flags, or regex=False with a compiled regex will raise an error.
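The error conditions above can be checked with a small sketch. The helper try_replace is hypothetical and exists only to capture the exception type; it is not part of pandas.

```python
import re
import pandas as pd

ser = pd.Series(['foo', 'fuz'])
compiled = re.compile(r'FUZ', flags=re.IGNORECASE)

def try_replace(**kwargs):
    """Return the replaced values, or the name of the exception raised."""
    try:
        return ser.str.replace(compiled, 'bar', **kwargs).tolist()
    except ValueError as exc:
        return type(exc).__name__

# Flags are carried by the compiled pattern itself, so this call is valid.
ok = try_replace(regex=True)

# A compiled pattern combined with regex=False is rejected.
bad_regex = try_replace(regex=False)

# Setting case on top of a compiled pattern is rejected as well.
bad_case = try_replace(regex=True, case=False)

print(ok, bad_regex, bad_case)
```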
Examples
When pat is a string and regex is True, the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). NaN value(s) in the Series are left as is:
>>> import numpy as np
>>> import pandas as pd
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
0    bao
1    baz
2    NaN
dtype: object
When pat is a string and regex is False, every pat is replaced with repl as with str.replace():
>>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
0    bao
1    fuz
2    NaN
dtype: object
When repl is a callable, it is called on every match of pat using re.sub(). The callable should expect one positional argument (a regex match object) and return a string.
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr, regex=True)
0    <re.Match object; span=(0, 1), match='f'>oo
1    <re.Match object; span=(0, 1), match='f'>uz
2                                            NaN
dtype: object
Reverse every lowercase alphabetic word:
>>> repl = lambda m: m.group(0)[::-1]
>>> ser = pd.Series(['foo 123', 'bar baz', np.nan])
>>> ser.str.replace(r'[a-z]+', repl, regex=True)
0    oof 123
1    rab zab
2        NaN
dtype: object
Using regex groups (extract second group and swap case):
>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ser = pd.Series(['One Two Three', 'Foo Bar Baz'])
>>> ser.str.replace(pat, repl, regex=True)
0    tWO
1    bAR
dtype: object
Using a compiled regex with flags:
>>> import re
>>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE)
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar', regex=True)
0    foo
1    bar
2    NaN
dtype: object
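The n and case parameters described earlier are not covered by the examples above. The following sketch, using made-up sample strings, shows one way they might be combined with a string pattern.

```python
import pandas as pd

ser = pd.Series(['Spark spark SPARK', 'PySpark'])

# case=False makes the match case-insensitive (used here with regex=True).
any_case = ser.str.replace('spark', 'X', case=False, regex=True)

# n=1 limits the replacement to the first match in each string.
first_only = ser.str.replace('spark', 'X', n=1, case=False, regex=True)

print(any_case.tolist())
print(first_only.tolist())
```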