How to Replace String in pandas DataFrame

You can replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), or DataFrame.apply() with a lambda function. In this article, I will explain how to replace strings in DataFrame columns with multiple examples.

  • Replace a string with another string in pandas.
  • Replace a pattern of string with another string using regular expression.

1. Quick Examples to Replace String in DataFrame

If you are in a hurry, below are some quick examples of how to replace a string in a pandas DataFrame.

    # Below are some quick examples.

    # Replace string using DataFrame.replace() method.
    df2 = df.replace('Py', 'Python with ', regex=True)

    # Replace multiple strings in different columns using regular expressions.
    df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                     {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)

    # Replace pattern of string using regular expression.
    df2 = df.replace(regex=['Language'], value='Lang')

    # By using str.replace()
    df['Courses'] = df['Courses'].str.replace('Language', 'Lang')

    # Replace string using apply() function with lambda.
    df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))

Now, let’s create a pandas DataFrame with a few rows and columns, then execute these examples and validate the results. Our DataFrame contains the columns Courses, Fee, and Duration.

    # Create a pandas DataFrame.
    import pandas as pd

    technologies = {
        'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
        'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
        'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Output:
             Courses    Fee Duration
    0          Spark  22000   30days
    1        PySpark  25000   50days
    2          Spark  23000   30days
    3  Java Language  24000   60days
    4        PySpark  26000   35days
    5   PHP Language  27000   30days

2. pandas Replace String Example

You can replace a string in a pandas DataFrame column with another string by using the DataFrame.replace() method. Without regex=True, this method matches whole cell values, replaces them with the specified value, and returns a new DataFrame. To update the existing DataFrame instead, pass inplace=True.

    # Replace string using DataFrame.replace() method.
    df2 = df.replace('PySpark', 'Python with Spark')
    print(df2)

This yields the output below. The example replaces every cell whose value is exactly PySpark with Python with Spark.

    # Output:
                 Courses    Fee Duration
    0              Spark  22000   30days
    1  Python with Spark  25000   50days
    2              Spark  23000   30days
    3      Java Language  24000   60days
    4  Python with Spark  26000   35days
    5       PHP Language  27000   30days
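The difference between whole-value matching, regex matching, and inplace=True is easiest to see side by side. This is a small sketch that rebuilds the sample DataFrame from above; the variable names exact, partial, and returned are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Without regex=True, only cells whose whole value equals 'Spark' change;
# 'PySpark' is left alone.
exact = df.replace("Spark", "Apache Spark")

# With regex=True, 'Spark' is a pattern, so it is also replaced inside 'PySpark'.
partial = df.replace("Spark", "Apache Spark", regex=True)

# inplace=True modifies df itself and returns None instead of a new DataFrame.
returned = df.replace("PySpark", "Python with Spark", inplace=True)

print(exact["Courses"].tolist())
print(partial["Courses"].tolist())
print(returned, df["Courses"].tolist())
```

Because exact and partial are computed before the inplace call, they are independent copies and are not affected by it.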

3. Replace Multiple Strings

Now let’s see how to replace multiple strings in one or more columns. This example also shows how to replace part of a string by using the regex=True parameter. To update strings in several columns at once, pass two dicts keyed by column name: the first maps each column to the pattern to find, and the second maps it to the replacement. The example below replaces Py with "Python with " in the Courses column and days with " Days" in the Duration column.

    # Replace multiple strings in different columns using regular expressions.
    df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                     {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
    print(df2)

    # Output:
                 Courses    Fee Duration
    0              Spark  22000  30 Days
    1  Python with Spark  25000  50 Days
    2              Spark  23000  30 Days
    3      Java Language  24000  60 Days
    4  Python with Spark  26000  35 Days
    5       PHP Language  27000  30 Days
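DataFrame.replace() also accepts a nested dict of the form {column: {pattern: replacement}}, which lets you apply several patterns to each column in one call. The sketch below assumes the same sample data; the extra pattern that drops the " Language" suffix is an illustrative addition, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Outer keys are column names; inner keys are regex patterns applied only to
# that column. Columns not listed (Fee) are left untouched.
df2 = df.replace(
    {
        "Courses": {r"^Py": "Python with ", r" Language$": ""},
        "Duration": {r"days$": " Days"},
    },
    regex=True,
)
print(df2)
```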

4. Replace Pattern of String Using Regular Expression

Using a regular expression, you can replace the matching part of a string with another string in a pandas DataFrame. The example below finds the string Language and replaces it with Lang.

    # Replace pattern of string using regular expression.
    df2 = df.replace(regex=['Language'], value='Lang')
    print(df2)

    # Output:
         Courses    Fee Duration
    0      Spark  22000   30days
    1    PySpark  25000   50days
    2      Spark  23000   30days
    3  Java Lang  24000   60days
    4    PySpark  26000   35days
    5   PHP Lang  27000   30days
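Because the pattern is a real regular expression, the replacement string can also refer back to capture groups with \1, as with re.sub(). A minimal sketch on the Duration column, assuming the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# (\d+) captures the number and \1 puts it back in the replacement.
# The ^ and $ anchors restrict the match to values made only of digits + 'days',
# so the Courses strings are not changed.
df2 = df.replace(regex=r"^(\d+)days$", value=r"\1 Days")
print(df2["Duration"].tolist())
```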

5. Using str.replace() on DataFrame

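Alternatively, use Series.str.replace() to replace a substring within each value of a column. Unlike DataFrame.replace(), which matches whole cell values unless you pass regex=True, str.replace() replaces every occurrence of the pattern inside each string. Whether the pattern is read as a literal string or as a regular expression depends on the regex parameter, which defaults to False starting with pandas 2.0.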

    # By using str.replace()
    df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
    print(df)

This yields the same output as above. Note that this example assigns the result back to the Courses column, so it modifies the existing DataFrame object.
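Series.str.replace() also takes case and regex arguments that change how the pattern is interpreted. The following sketch uses small standalone Series (made up for illustration) to show a case-insensitive regex and the difference between a literal and a regex "." pattern.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Java Language", "PHP Language"])

# regex=True treats pat as a regular expression; case=False makes it
# case-insensitive, so the lowercase pattern still matches ' Language'.
short = courses.str.replace(r"\s*language$", "", case=False, regex=True)

versions = pd.Series(["v1.2", "v1x2"])
# regex=False: '.' is a literal dot, so only 'v1.2' changes.
literal = versions.str.replace(".", "_", regex=False)
# regex=True: '.' matches any character, so every character is replaced.
as_regex = versions.str.replace(".", "_", regex=True)

print(short.tolist(), literal.tolist(), as_regex.tolist())
```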

6. Replace String Using apply() function with lambda

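In this section, you will learn how to replace a string by using DataFrame.apply() with a lambda expression. The apply() method applies a function along an axis of the DataFrame. With the default axis=0, the function is called once per column and receives that column as a Series, so the lambda below calls Series.replace() on every column. Because the previous section updated df in place, the Courses column already contains Java Lang and PHP Lang.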

    # Replace String using apply() function with lambda.
    df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
    print(df2)

    # Output:
                 Courses    Fee Duration
    0              Spark  22000   30days
    1  Python with Spark  25000   50days
    2              Spark  23000   30days
    3          Java Lang  24000   60days
    4  Python with Spark  26000   35days
    5           PHP Lang  27000   30days
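If you prefer Series.str.replace() inside apply(), keep in mind that the .str accessor only works on string columns, so calling it on the integer Fee column would raise an AttributeError. One way to handle this, sketched below under the assumption that the text columns are known in advance, is to apply the lambda only to those columns of a copy.

```python
import pandas as pd

df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
    "Fee": [22000, 25000, 23000, 24000, 26000, 27000],
    "Duration": ["30days", "50days", "30days", "60days", "35days", "30days"],
})

# Hypothetical list of string columns; Fee is excluded because .str would fail on it.
text_cols = ["Courses", "Duration"]

df2 = df.copy()
# Each column is passed to the lambda as a Series; both literal replacements
# run on every text column.
df2[text_cols] = df2[text_cols].apply(
    lambda col: col.str.replace("Language", "Lang", regex=False)
                   .str.replace("days", " Days", regex=False)
)
print(df2)
```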

7. Complete Example of Replace String in DataFrame

    # Create a pandas DataFrame.
    import pandas as pd

    technologies = {
        'Courses': ["Spark", "PySpark", "Spark", "Java Language", "PySpark", "PHP Language"],
        'Fee': [22000, 25000, 23000, 24000, 26000, 27000],
        'Duration': ['30days', '50days', '30days', '60days', '35days', '30days']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Replace string using DataFrame.replace() method.
    df2 = df.replace('Py', 'Python with ', regex=True)
    print(df2)

    # Replace multiple strings in different columns using regular expressions.
    df2 = df.replace({'Courses': 'Py', 'Duration': 'days'},
                     {'Courses': 'Python with ', 'Duration': ' Days'}, regex=True)
    print(df2)

    # Replace pattern of string using regular expression.
    df2 = df.replace(regex=['Language'], value='Lang')
    print(df2)

    # By using str.replace() (updates df in place).
    df['Courses'] = df['Courses'].str.replace('Language', 'Lang')
    print(df)

    # Replace string using apply() function with lambda.
    df2 = df.apply(lambda x: x.replace({'Py': 'Python with ', 'Language': 'Lang'}, regex=True))
    print(df2)

8. Conclusion

In this article, you have learned how to replace a string in a pandas DataFrame column by using DataFrame.replace(), Series.str.replace(), and DataFrame.apply() with a lambda function, with examples.


pandas.Series.str.replace

Replace each occurrence of pattern/regex in the Series/Index.

Equivalent to str.replace() or re.sub(), depending on the regex value.

Parameters

pat str or compiled regex

String can be a character sequence or regular expression.

repl str or callable

Replacement string or a callable. The callable is passed the regex match object and must return a replacement string to be used. See re.sub().

n int, default -1 (all)

Number of replacements to make from start.

case bool, default None

Determines if replace is case sensitive:

  • If True, case sensitive (the default if pat is a string)
  • Set to False for case insensitive
  • Cannot be set if pat is a compiled regex.

flags int, default 0 (no flags)

Regex module flags, e.g. re.IGNORECASE. Cannot be set if pat is a compiled regex.

regex bool, default False

Determines if the passed-in pattern is a regular expression:

  • If True, assumes the passed-in pattern is a regular expression.
  • If False, treats the pattern as a literal string
  • Cannot be set to False if pat is a compiled regex or repl is a callable.
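The n, case, and flags parameters can be tried on a small Series. This is an illustrative sketch; the sample values are made up for the example.

```python
import re

import pandas as pd

s = pd.Series(["aaa", "Foo foo", "FUZ"])

# n=1 limits the replacement to the first occurrence in each string.
first_only = s.str.replace("a", "b", n=1, regex=False)

# case=False makes a string pattern case-insensitive.
no_case = s.str.replace("foo", "bar", case=False, regex=True)

# flags passes re module flags through; here it has the same effect as case=False.
with_flags = s.str.replace("fuz", "bar", flags=re.IGNORECASE, regex=True)

print(first_only.tolist(), no_case.tolist(), with_flags.tolist())
```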

Returns

Series or Index of object

A copy of the object with all matching occurrences of pat replaced by repl.

Raises

ValueError

  • if regex is False and repl is a callable or pat is a compiled regex
  • if pat is a compiled regex and case or flags is set

Notes

When pat is a compiled regex, all flags should be included in the compiled regex. Use of case, flags, or regex=False with a compiled regex will raise an error.
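The following sketch shows these rules in practice: passing regex=False or case together with a compiled pattern raises ValueError, while keeping the flag inside re.compile() and passing regex=True works. The sample Series and the errors list are hypothetical.

```python
import re

import pandas as pd

s = pd.Series(["foo", "FUZ"])
pat = re.compile("fuz", flags=re.IGNORECASE)

# Both argument sets break the rules for compiled patterns, so pandas raises ValueError.
errors = []
for kwargs in ({"regex": False}, {"regex": True, "case": False}):
    try:
        s.str.replace(pat, "bar", **kwargs)
    except ValueError as err:
        errors.append(str(err))

# Supported form: the flag lives in the compiled pattern and regex=True is passed.
fixed = s.str.replace(pat, "bar", regex=True)

print(errors)
print(fixed.tolist())
```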

Examples

When pat is a string and regex is True, the given pat is compiled as a regex. When repl is a string, it replaces matching regex patterns as with re.sub(). NaN value(s) in the Series are left as is:

>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f.', 'ba', regex=True)
0    bao
1    baz
2    NaN
dtype: object

When pat is a string and regex is False, every pat is replaced with repl as with str.replace():

>>> pd.Series(['f.o', 'fuz', np.nan]).str.replace('f.', 'ba', regex=False)
0    bao
1    fuz
2    NaN
dtype: object

When repl is a callable, it is called on every pat using re.sub(). The callable should expect one positional argument (a regex object) and return a string.

>>> pd.Series(['foo', 'fuz', np.nan]).str.replace('f', repr, regex=True)
0    <re.Match object; span=(0, 1), match='f'>oo
1    <re.Match object; span=(0, 1), match='f'>uz
2                                            NaN
dtype: object

Reverse every lowercase alphabetic word:

>>> repl = lambda m: m.group(0)[::-1]
>>> ser = pd.Series(['foo 123', 'bar baz', np.nan])
>>> ser.str.replace(r'[a-z]+', repl, regex=True)
0    oof 123
1    rab zab
2        NaN
dtype: object

Using regex groups (extract second group and swap case):

>>> pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
>>> repl = lambda m: m.group('two').swapcase()
>>> ser = pd.Series(['One Two Three', 'Foo Bar Baz'])
>>> ser.str.replace(pat, repl, regex=True)
0    tWO
1    bAR
dtype: object

Using a compiled regex with flags:

>>> import re
>>> regex_pat = re.compile(r'FUZ', flags=re.IGNORECASE)
>>> pd.Series(['foo', 'fuz', np.nan]).str.replace(regex_pat, 'bar', regex=True)
0    foo
1    bar
2    NaN
dtype: object
