Python dataframe map function

pandas map() Function – Examples

The pandas Series.map() function substitutes each value in a Series with another value, which may be derived from a function, a dict, or another Series. Since each DataFrame column is a Series, you can call map() on a column and assign the result back to the DataFrame.

pandas Series is a one-dimensional array-like object containing a sequence of values. Each of these values is associated with a label called index. We can create a Series by using an array-like object (e.g., list) or a dictionary.
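As a minimal sketch of the two ways of creating a Series mentioned above (the labels "Spark" and "PySpark" are made up for illustration):

```python
import pandas as pd

# From a list: pandas assigns a default integer index (0, 1, 2, ...).
fees = pd.Series([22000, 25000, 23000])

# From a dict: the keys become the index labels.
durations = pd.Series({"Spark": "30days", "PySpark": "50days"})

print(fees)
print(durations)
```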

  • Series.map() is a Series method. DataFrame has its own element-wise map() only from pandas 2.1.0 onward, where it replaces DataFrame.applymap().
  • map() accepts a dict, a Series, or a callable.
  • You can use it to transform a specific column of a DataFrame, because each column in a DataFrame is a Series.
  • When passed a dictionary or Series, map() looks up each element among the keys (or index labels) of that dictionary or Series. Elements with no match become NaN in the output.
  • Series.map() operates on one element at a time.
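The three kinds of argument listed above can be compared side by side. This is a small sketch on a toy Series, not taken from the original article:

```python
import pandas as pd

s = pd.Series(["a", "b", "c"])

# 1. dict: elements without a matching key become NaN.
by_dict = s.map({"a": 1, "b": 2})

# 2. Series: the index of the lookup Series plays the role of the dict keys.
lookup = pd.Series([10, 20, 30], index=["a", "b", "c"])
by_series = s.map(lookup)

# 3. callable: called once for each element.
by_func = s.map(str.upper)

print(by_dict, by_series, by_func, sep="\n")
```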

1. Syntax of pandas map()

The following is the syntax of the pandas map() function. This accepts arg and na_action as parameters and returns a Series.

    # Syntax of Series.map()
    Series.map(arg, na_action=None)

  • arg – Accepts function, dict, or Series
  • na_action – Accepts 'ignore' or None. Defaults to None.

Let’s create a DataFrame and use it with map() function to update the DataFrame column.

    # Create a pandas DataFrame.
    import pandas as pd
    import numpy as np

    technologies = {
        'Fee': [22000, 25000, 23000, np.nan, 26000],
        'Duration': ['30days', '50days', '30days', '35days', '40days']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Output:
           Fee Duration
    0  22000.0   30days
    1  25000.0   50days
    2  23000.0   30days
    3      NaN   35days
    4  26000.0   40days

2. Series.map() Example

Series.map() works on a single column of a pandas DataFrame at a time, because every column in a DataFrame is a Series. For example, df['Fee'] returns a Series object. Let's apply map() to one of the DataFrame columns and assign the result back to the DataFrame.

    # Using a lambda function
    df['Fee'] = df['Fee'].map(lambda x: x - (x*10/100))
    print(df)

This yields the output below. The example subtracts 10% from each value in the Fee column.

    # Output:
           Fee Duration
    0  19800.0   30days
    1  22500.0   50days
    2  20700.0   30days
    3      NaN   35days
    4  23400.0   40days

You can also pass a named function instead of a lambda. The example below divides each Fee value by 100; the function can be passed to map() directly, without wrapping it in a lambda.

    # Using a custom function
    def fun1(x):
        return x/100

    df['Fee'] = df['Fee'].map(fun1)

3. Handling NaN by using na_action param

The na_action param controls how NaN values are handled. With the default, None, NaN values are passed to the mapping function like any other value, which may produce incorrect results. With 'ignore', NaN values are propagated to the output unchanged, without being passed to the function.

    # Let's add the currency to the Fee (shown without overwriting the numeric column)
    print(df.assign(Fee=df['Fee'].map('{} RS'.format)))

This yields the output below. Notice that the value of the Fee column at index 3 is 'nan RS', which doesn't make sense.

    # Output:
            Fee Duration
    0  198.0 RS   30days
    1  225.0 RS   50days
    2  207.0 RS   30days
    3    nan RS   35days
    4  234.0 RS   40days

Now let's use na_action='ignore'. With this setting, map() leaves NaN values untouched instead of passing them to the function.

    # Use the na_action param on the numeric Fee column
    df['Fee'] = df['Fee'].map('{} RS'.format, na_action='ignore')
    print(df)

    # Output:
            Fee Duration
    0  198.0 RS   30days
    1  225.0 RS   50days
    2  207.0 RS   30days
    3       NaN   35days
    4  234.0 RS   40days
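To see what 'ignore' does in practice, the sketch below records every value the mapping function receives. The helper name label and the list seen are hypothetical, introduced only for this illustration:

```python
import numpy as np
import pandas as pd

fee = pd.Series([198.0, np.nan, 234.0])

seen = []  # every value the function is actually called with

def label(x):
    seen.append(x)
    return f"{x} RS"

# NaN is skipped: it is never passed to label() and stays NaN in the result.
result = fee.map(label, na_action="ignore")
print(result)
print(seen)
```

With the default na_action=None, the function would also be called with NaN and would return the string 'nan RS' for that row.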

4. Using map() with Dictionary

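Alternatively, you can pass a dictionary as the mapping. Each value in the column is looked up among the dictionary keys, and values with no matching key become NaN.

    # Using a dictionary for mapping
    # '35days' has no key here, so it maps to NaN
    dict_map = {'30days': '35 Days', '50days': '55 Days', '40days': '45 Days'}
    updateSer = df['Duration'].map(dict_map)
    df['Duration'] = updateSer
    print(df)

    # Output:
            Fee Duration
    0  198.0 RS  35 Days
    1  225.0 RS  55 Days
    2  207.0 RS  35 Days
    3       NaN      NaN
    4  234.0 RS  45 Days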
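If NaN is not what you want for unmatched values, one option is a dict subclass that defines __missing__, such as collections.defaultdict; pandas then uses its default instead of NaN. A minimal sketch, with 'Unknown' as an arbitrary placeholder:

```python
from collections import defaultdict

import pandas as pd

duration = pd.Series(["30days", "50days", "35days"])

# defaultdict supplies "Unknown" for any key that is not in the mapping.
dict_map = defaultdict(
    lambda: "Unknown",
    {"30days": "35 Days", "50days": "55 Days"},
)

result = duration.map(dict_map)
print(result)
```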

5. Complete Example of pandas map() Function

    # Create a pandas DataFrame.
    import pandas as pd
    import numpy as np

    technologies = {
        'Fee': [22000, 25000, 23000, np.nan, 26000],
        'Duration': ['30days', '50days', '30days', '35days', '40days']
    }
    df = pd.DataFrame(technologies)
    print(df)

    # Using a lambda function
    df['Fee'] = df['Fee'].map(lambda x: x - (x*10/100))
    print(df)

    # Using a custom function
    def fun1(x):
        return x/100

    df['Fee'] = df['Fee'].map(fun1)
    print(df)

    # Add the currency to the Fee; NaN becomes 'nan RS' (not assigned back)
    print(df['Fee'].map('{} RS'.format))

    # Same mapping, but NaN values are left untouched
    df['Fee'] = df['Fee'].map('{} RS'.format, na_action='ignore')
    print(df)

    # Using a dictionary for mapping ('35days' has no key, so it maps to NaN)
    dict_map = {'30days': '35 Days', '50days': '55 Days', '40days': '45 Days'}
    updateSer = df['Duration'].map(dict_map)
    df['Duration'] = updateSer
    print(df)

Conclusion

In this article, I have explained that map() is a Series method that substitutes each value in a Series with another value and returns a new Series. Since a DataFrame is a collection of Series, you can use map() on a column to update the DataFrame.


pandas.DataFrame.map

New in version 2.1.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:

func : callable
    Python function, returns a single value from a single value.

na_action : {None, 'ignore'}, default None
    If 'ignore', propagate NaN values, without passing them to func.

**kwargs
    Additional keyword arguments to pass as keyword arguments to func.

See also:

DataFrame.apply
    Apply a function along input axis of DataFrame.

DataFrame.replace
    Replace values given in to_replace with value.

Series.map
    Apply a function elementwise on a Series.

Examples:

>>> df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
>>> df
       0      1
0  1.000  2.120
1  3.356  4.567

>>> df.map(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5

Like Series.map, NA values can be ignored:

>>> df_copy = df.copy()
>>> df_copy.iloc[0, 0] = pd.NA
>>> df_copy.map(lambda x: len(str(x)), na_action='ignore')
     0  1
0  NaN  4
1  5.0  5

Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

>>> df.map(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489

But it’s better to avoid map in that case.

>>> df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
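
Because DataFrame.map only exists from pandas 2.1.0 onward, code that must also run on older versions may need a fallback to applymap. The following is one possible sketch; the helper name elementwise is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])

def elementwise(frame, func):
    """Hypothetical helper: prefer DataFrame.map, fall back to applymap."""
    if hasattr(pd.DataFrame, "map"):   # pandas >= 2.1.0
        return frame.map(func)
    return frame.applymap(func)        # older pandas versions

lengths = elementwise(df, lambda x: len(str(x)))
print(lengths)
```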


How to use the Pandas map() function

Learn how to use the Pandas map() function to create a new dataframe column or series based on a mapping using a dictionary or a custom function.


The Pandas map() function can be used to map the values of a series to another set of values or run a custom function. It runs at the series level, rather than across a whole dataframe, and is a very useful method for engineering new features based on the values of other columns.

In this simple tutorial, we will look at how to use the map() function to map values in a series to another set of values, both using a custom function and using a mapping from a Python dictionary.

Create a Pandas dataframe

To get started, import the Pandas library using the import pandas as pd naming convention, then create a Pandas dataframe containing some dummy data. We'll create a tiny dataframe containing the scientific names of some fish species and their lengths.

    import pandas as pd

    df = pd.DataFrame(
        [('Pterophyllum altum', 'Pterophyllum', 12.5),
         ('Coptodon snyderae', 'Coptodon', 8.2),
         ('Astronotus ocellatus', 'Astronotus', 31.2),
         ('Corydoras aeneus', 'Corydoras', 5.3),
         ('Xenomystus nigri', 'Xenomystus', 5.3)],
        columns=['species', 'genus', 'length_cm']
    )
    df

       species               genus         length_cm
    0  Pterophyllum altum    Pterophyllum  12.5
    1  Coptodon snyderae     Coptodon      8.2
    2  Astronotus ocellatus  Astronotus    31.2
    3  Corydoras aeneus      Corydoras     5.3
    4  Xenomystus nigri      Xenomystus    5.3

Use map() with a dictionary to map values in a column to new values

First, we’ll look at how to use the map() function to map the values in a Pandas column or series to the values in a Python dictionary. We’ll create a dictionary called mappings that contains the genus as the key and the family as the value. Then we’ll use the map() function to map the values in the genus column to the values in the mappings dictionary and save the results to a new column called family .

    mappings = {
        'Pterophyllum': 'Cichlidae',
        'Coptodon': 'Cichlidae',
        'Astronotus': 'Cichlidae',
        'Corydoras': 'Callichthyidae',
    }
    df['family'] = df['genus'].map(mappings)
    df

When the map() function finds a match for the column value in the dictionary it will pass the dictionary value back so it’s stored in the new column. If no matching value is found in the dictionary, the map() function returns a NaN value. You can use the Pandas fillna() function to handle any such values present.

       species               genus         length_cm  family
    0  Pterophyllum altum    Pterophyllum  12.5       Cichlidae
    1  Coptodon snyderae     Coptodon      8.2        Cichlidae
    2  Astronotus ocellatus  Astronotus    31.2       Cichlidae
    3  Corydoras aeneus      Corydoras     5.3        Callichthyidae
    4  Xenomystus nigri      Xenomystus    5.3        NaN
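
The NaN in the last row can be replaced by chaining fillna() after map(). A minimal sketch on a shortened copy of the genus column, where 'Unknown' is just a placeholder label chosen for this example:

```python
import pandas as pd

genus = pd.Series(["Pterophyllum", "Corydoras", "Xenomystus"])
mappings = {"Pterophyllum": "Cichlidae", "Corydoras": "Callichthyidae"}

# "Xenomystus" has no key in mappings, so map() yields NaN for it;
# fillna() then swaps that NaN for the placeholder.
family = genus.map(mappings).fillna("Unknown")
print(family)
```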

Use map() with a function to map values in a column to new values

The other way to use the Pandas map() function is to map values in a column to new values using a custom function. This allows you to use some more complex logic to select how a Pandas column value is mapped to some other value.

We'll first create a little custom function called get_size_label() that takes the value from the length_cm column and returns a string label for the size of the fish. We'll then use the map() function to apply this function to each value in the length_cm column and create a new column called size with the size label for each fish.

    def get_size_label(length_cm):
        # Fish under 10 cm are small, under 20 cm medium, otherwise large
        if length_cm < 10:
            return 'small'
        elif length_cm < 20:
            return 'medium'
        else:
            return 'large'

    df['size'] = df['length_cm'].map(get_size_label)
    df

       species               genus         length_cm  family          size
    0  Pterophyllum altum    Pterophyllum  12.5       Cichlidae       medium
    1  Coptodon snyderae     Coptodon      8.2        Cichlidae       small
    2  Astronotus ocellatus  Astronotus    31.2       Cichlidae       large
    3  Corydoras aeneus      Corydoras     5.3        Callichthyidae  small
    4  Xenomystus nigri      Xenomystus    5.3        NaN             small
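
For simple threshold-based labels like these, a vectorized alternative is pd.cut(), which bins a numeric column in one call. The sketch below is not from the original tutorial and assumes all lengths are non-negative, since values below the first bin edge would become NaN:

```python
import pandas as pd

length_cm = pd.Series([12.5, 8.2, 31.2, 5.3, 5.3])

size = pd.cut(
    length_cm,
    bins=[0, 10, 20, float("inf")],
    labels=["small", "medium", "large"],
    right=False,  # intervals are [0, 10), [10, 20), [20, inf)
)
print(size)
```

Unlike the map() version, the result is a categorical column rather than plain strings.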

Matt Clarke, Sunday, January 08, 2023
