Python apply with a condition

pandas.DataFrame.apply

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters

func function

Function to apply to each column or row.

axis {0 or 'index', 1 or 'columns'}, default 0

Axis along which the function is applied:

  • 0 or 'index' : apply function to each column.
  • 1 or 'columns' : apply function to each row.

raw bool, default False

Determines if row or column is passed as a Series or ndarray object:

  • False : passes each row or column as a Series to the function.
  • True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.
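The difference that raw makes can be checked directly. The following is a minimal sketch, assuming a small DataFrame built only for illustration; the variable names are not part of the pandas documentation.

```python
import numpy as np
import pandas as pd

# Illustrative data, same shape as the examples further down the page.
df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# raw=False (the default): each column reaches the function as a Series.
got_series = df.apply(lambda col: isinstance(col, pd.Series))

# raw=True: each column reaches the function as a NumPy ndarray.
got_ndarray = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

# A NumPy reduction returns the same totals either way.
totals = df.apply(np.sum, raw=True)

print(got_series.all(), got_ndarray.all())
print(totals)
```

Functions that rely on Series features such as .index or label-based lookup will not work with raw=True, so it is mainly useful for plain NumPy reductions.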

result_type {'expand', 'reduce', 'broadcast', None}, default None

These only act when axis=1 (columns):

  • 'expand' : list-like results will be turned into columns.
  • 'reduce' : returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
  • 'broadcast' : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

args tuple

Positional arguments to pass to func in addition to the array/series.

**kwargs

Additional keyword arguments to pass as keyword arguments to func.
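As a rough illustration of how args and **kwargs reach the applied function, the sketch below uses a hypothetical helper, scaled_total, that is not part of pandas. Each row is passed as the first argument, followed by the extra positional and keyword arguments.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def scaled_total(row, factor, offset=0):
    """Hypothetical helper: sum a row, scale it, then shift it."""
    return row.sum() * factor + offset


# args supplies `factor`; the extra keyword argument supplies `offset`.
result = df.apply(scaled_total, axis=1, args=(2,), offset=1)
print(result)
```

Each row sums to 13, so every entry of result is 13 * 2 + 1 = 27.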

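Returns

Series or DataFrame

Result of applying func along the given axis of the DataFrame.

See also

DataFrame.applymap : For elementwise operations (renamed DataFrame.map in pandas 2.1).
DataFrame.aggregate : Only perform aggregating type operations.
DataFrame.transform : Only perform transforming type operations.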

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a numpy universal function (in this case the same as np.sqrt(df) ):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a DataFrame

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
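Because axis=1 hands each row to the function as a Series, a condition can look at several columns at once. The sketch below is one possible way to do this; the column names and the label function are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data with two columns to test in a single condition.
df = pd.DataFrame({"points": [18, 22, 14], "assists": [5, 7, 12]})


def label(row):
    """Return 'Good' when either column clears its threshold."""
    if row["points"] >= 20 or row["assists"] >= 10:
        return "Good"
    return "Bad"


# axis=1: `row` is a Series indexed by the column names.
df["status"] = df.apply(label, axis=1)
print(df)
```

Row-wise apply calls the Python function once per row, so on large frames a vectorized boolean expression is usually faster.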


Pandas: How to Use Apply and Lambda Together

You can use the following basic syntax to apply a lambda function to a pandas DataFrame:

df['col'] = df['col'].apply(lambda x: 'value1' if x < 20 else 'value2')

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4]})

#view DataFrame
print(df)

  team  points  assists
0    A      18        5
1    B      22        7
2    C      19        7
3    D      14        9
4    E      14       12
5    F      11        9
6    G      20        9
7    H      28        4

Example 1: Use Apply and Lambda to Create a New Column

The following code shows how to use apply and lambda to create a new column whose values depend on the values of an existing column:

#create new column called 'status'
df['status'] = df['points'].apply(lambda x: 'Bad' if x < 20 else 'Good')

#view updated DataFrame
print(df)

  team  points  assists status
0    A      18        5    Bad
1    B      22        7   Good
2    C      19        7    Bad
3    D      14        9    Bad
4    E      14       12    Bad
5    F      11        9    Bad
6    G      20        9   Good
7    H      28        4   Good

In this example, we created a new column called status that takes the following values:

  • 'Bad' if the value in the points column is less than 20.
  • 'Good' if the value in the points column is greater than or equal to 20.
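For comparison, the same rule can be written without a lambda. This is a minimal sketch that swaps in numpy.where, which evaluates the condition on the whole column at once; it is an alternative technique, not the approach used in the tutorial.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "points": [18, 22, 19, 14, 14, 11, 20, 28],
})

# np.where(condition, value_if_true, value_if_false), applied elementwise.
df["status"] = np.where(df["points"] < 20, "Bad", "Good")
print(df)
```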

Example 2: Use Apply and Lambda to Modify an Existing Column

The following code shows how to use apply and lambda to modify an existing column in the DataFrame:

#modify existing 'points' column
df['points'] = df['points'].apply(lambda x: x/2 if x < 20 else x*2)

#view updated DataFrame
print(df)

  team  points  assists
0    A     9.0        5
1    B    44.0        7
2    C     9.5        7
3    D     7.0        9
4    E     7.0       12
5    F     5.5        9
6    G    40.0        9
7    H    56.0        4

In this example, we modified the values in the existing points column using the following rule in the lambda function:

  • If the value is less than 20, divide the value by 2.
  • If the value is greater than or equal to 20, multiply the value by 2.

Using this lambda function, we were able to modify the values in the existing points column.
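The halve-or-double rule can also be expressed with Series.where, which keeps values where a condition holds and substitutes another value elsewhere. The sketch below assumes the same points values as the example and compares the two approaches.

```python
import pandas as pd

points = pd.Series([18, 22, 19, 14, 14, 11, 20, 28])

# Element-by-element version, as in the tutorial.
via_apply = points.apply(lambda x: x / 2 if x < 20 else x * 2)

# Vectorized version: keep the doubled value where points >= 20,
# otherwise fall back to the halved value.
via_where = (points * 2).where(points >= 20, points / 2)

print(via_where)
```

Both versions produce the same numbers; the second avoids calling a Python function for every element.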


5 ways to apply an IF condition in Pandas DataFrame

Data to Fish

In this guide, you’ll see 5 different ways to apply an IF condition in Pandas DataFrame.

Specifically, you’ll see how to apply an IF condition for:

  1. Set of numbers
  2. Set of numbers and lambda
  3. Strings
  4. Strings and lambda
  5. OR condition

Applying an IF condition in Pandas DataFrame

Let’s now review the following 5 cases:

(1) IF condition – Set of numbers

Suppose that you created a DataFrame in Python that has 10 numbers (from 1 to 10). You then want to apply the following IF conditions:

  • If the number is equal or lower than 4, then assign the value of ‘True’
  • Otherwise, if the number is greater than 4, then assign the value of ‘False’

This is the general structure that you may use to create the IF condition:

df.loc[df['column name'] condition, 'new column name'] = 'value if condition is met'

For our example, the Python code would look like this:

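import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)

df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4?'] = 'True'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4?'] = 'False'

print(df)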

Here is the result that you’ll get in Python:

   set_of_numbers equal_or_lower_than_4?
0               1                   True
1               2                   True
2               3                   True
3               4                   True
4               5                  False
5               6                  False
6               7                  False
7               8                  False
8               9                  False
9              10                  False

(2) IF condition – set of numbers and lambda

You’ll now see how to get the same results as in case 1 by using lambda, where the conditions are:

  • If the number is equal or lower than 4, then assign the value of ‘True’
  • Otherwise, if the number is greater than 4, then assign the value of ‘False’

Here is the generic structure that you may apply in Python:

df['new column name'] = df['column name'].apply(lambda x: 'value if condition is met' if x condition else 'value if condition is not met')

For our example:

import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
df = pd.DataFrame(data)

df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')

print(df)

This is the result that you’ll get, which matches with case 1:

   set_of_numbers equal_or_lower_than_4?
0               1                   True
1               2                   True
2               3                   True
3               4                   True
4               5                  False
5               6                  False
6               7                  False
7               8                  False
8               9                  False
9              10                  False
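Cases 1 and 2 both store the text 'True' and 'False'. If real boolean values are acceptable, the comparison itself already yields them. The sketch below shows that variant and, as an assumption about what a reader might want, converts the result to text with Series.map.

```python
import pandas as pd

df = pd.DataFrame({"set_of_numbers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# The comparison returns a boolean Series; no loc or lambda is needed.
df["equal_or_lower_than_4?"] = df["set_of_numbers"] <= 4

# Optional: turn the booleans into the strings used in cases 1 and 2.
as_text = df["equal_or_lower_than_4?"].map({True: "True", False: "False"})

print(df)
```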

(3) IF condition – strings

Now, let’s create a DataFrame that contains only strings/text with 4 names: Jon, Bill, Maria and Emma.

  • If the name is equal to ‘Bill,’ then assign the value of ‘Match’
  • Otherwise, if the name is not ‘Bill,’ then assign the value of ‘Mismatch’

import pandas as pd

data = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(data)

df.loc[df['first_name'] == 'Bill', 'name_match'] = 'Match'
df.loc[df['first_name'] != 'Bill', 'name_match'] = 'Mismatch'

print(df)

Once you run the above Python code, you’ll see:

  first_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma   Mismatch

(4) IF condition – strings and lambda

You’ll get the same results as in case 3 by using lambda:

import pandas as pd

data = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(data)

df['name_match'] = df['first_name'].apply(lambda x: 'Match' if x == 'Bill' else 'Mismatch')

print(df)

And here is the output from Python:

  first_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma   Mismatch

(5) IF condition with OR

Now let’s apply these conditions:

  • If the name is ‘Bill’ or ‘Emma,’ then assign the value of ‘Match’
  • Otherwise, if the name is neither ‘Bill’ nor ‘Emma,’ then assign the value of ‘Mismatch’

import pandas as pd

data = {'first_name': ['Jon', 'Bill', 'Maria', 'Emma']}
df = pd.DataFrame(data)

df.loc[(df['first_name'] == 'Bill') | (df['first_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['first_name'] != 'Bill') & (df['first_name'] != 'Emma'), 'name_match'] = 'Mismatch'

print(df)

Run the Python code, and you’ll get the following result:

  first_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma      Match
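When the OR condition tests one column against several values, Series.isin can stand in for the chained == and | comparisons. This is a sketch of that alternative, combined with numpy.where to fill both outcomes in a single assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"first_name": ["Jon", "Bill", "Maria", "Emma"]})

# True where the name is any of the listed values.
is_match = df["first_name"].isin(["Bill", "Emma"])

df["name_match"] = np.where(is_match, "Match", "Mismatch")
print(df)
```

Adding a third or fourth name only means extending the list passed to isin.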

Applying an IF condition under an existing DataFrame column

So far you have seen how to apply an IF condition by creating a new column.

Alternatively, you may store the results under an existing DataFrame column.

For example, let’s say that you created a DataFrame that has 12 numbers, where the last two numbers are zeros:

'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]

You may then apply the following IF conditions, and then store the results under the existing ‘set_of_numbers’ column:

  • If the number is equal to 0, then change the value to 999
  • If the number is equal to 5, then change the value to 555

import pandas as pd

data = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]}
df = pd.DataFrame(data)
print(df)

df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)

Here are the before and after results, where the ‘5’ became ‘555’ and the 0’s became ‘999’ under the existing ‘set_of_numbers’ column:

    set_of_numbers
0                1
1                2
2                3
3                4
4                5
5                6
6                7
7                8
8                9
9               10
10               0
11               0

    set_of_numbers
0                1
1                2
2                3
3                4
4              555
5                6
6                7
7                8
8                9
9               10
10             999
11             999
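For exact-value substitutions like these, Series.replace with a dictionary is a compact alternative to one loc assignment per value. The sketch below assumes the same 12 numbers as the example.

```python
import pandas as pd

df = pd.DataFrame({"set_of_numbers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})

# Keys are the values to look for; values are their replacements.
df["set_of_numbers"] = df["set_of_numbers"].replace({0: 999, 5: 555})
print(df)
```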

On another instance, you may have a DataFrame that contains NaN values. You can then apply an IF condition to replace those values with zeros, as in the example below:

import pandas as pd
import numpy as np

data = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]}
df = pd.DataFrame(data)
print(df)

df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print(df)

Before you’ll see the NaN values, and after you’ll see the zero values:

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              6.0
6              7.0
7              8.0
8              9.0
9             10.0
10             NaN
11             NaN

    set_of_numbers
0              1.0
1              2.0
2              3.0
3              4.0
4              5.0
5              6.0
6              7.0
7              8.0
8              9.0
9             10.0
10             0.0
11             0.0
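Replacing missing values is common enough that pandas has a dedicated method for it. As a short sketch with a shortened, illustrative column, Series.fillna gives the same result as the isnull() condition:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"set_of_numbers": [1, 2, np.nan, np.nan]})

# fillna replaces every NaN in the column with the given value.
df["set_of_numbers"] = df["set_of_numbers"].fillna(0)
print(df)
```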

Conclusion

You just saw how to apply an IF condition in Pandas DataFrame. There are indeed multiple ways to apply such a condition in Python. You can achieve the same results by using either lambda, or just by sticking with Pandas.

At the end, it boils down to working with the method that is best suited to your needs.

Finally, you may want to check the pandas documentation for additional information about Pandas DataFrame.


pandas.Series.apply

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Parameters

func function

Python function or NumPy ufunc to apply.

convert_dtype bool, default True

Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

args tuple

Positional arguments passed to func after the series value.

**kwargs

Additional keyword arguments passed to func.

Returns

Series or DataFrame

If func returns a Series object the result will be a DataFrame.
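The DataFrame return case is easiest to see with a function that builds a small Series per value. The sketch below uses a hypothetical helper, to_both_scales, and assumes the temperatures are in degrees Celsius; the keys of the returned Series become the column names.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])


def to_both_scales(temp):
    """Hypothetical helper: return the value in two temperature scales."""
    return pd.Series({"celsius": temp, "fahrenheit": temp * 9 / 5 + 32})


# Each call returns a Series, so the overall result is a DataFrame.
expanded = s.apply(to_both_scales)
print(expanded)
```

The original index of s is kept as the row index of the resulting DataFrame.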

See also

Series.map : For element-wise operations.
Series.agg : Only perform aggregating type operations.
Series.transform : Only perform transforming type operations.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

Create a series with typical summer temperatures for each city.

>>> s = pd.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an argument to apply() .

>>> def square(x):
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Square the values by passing an anonymous function as an argument to apply() .

>>> s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value

>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments and pass these arguments to apply .

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x

>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the Numpy library.

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
