- Pandas DataFrame.apply() Function and Parameters
- Syntax
- Parameters
- Returns
- Example
- pandas.Series.apply
- pandas.DataFrame.apply
- Pandas DataFrame apply() Examples
- Pandas DataFrame apply() Examples
- 1. Applying a Function to DataFrame Elements
- 2. apply() with lambda
- 3. apply() along axis
- 4. DataFrame apply() with arguments
- 5. DataFrame apply() with positional and keyword arguments
- DataFrame applymap() function
Pandas DataFrame.apply() Function and Parameters
The Pandas DataFrame.apply() function lets you pass a function and apply it along an axis of a DataFrame, that is, to each column or to each row. It covers transformations and aggregations that have no ready-made Pandas method, which makes it useful in data science and machine learning work.
The objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
By default, result_type=None, and the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
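The two points above can be checked with a small sketch. The frame, its row labels `r0`/`r1`, and the variable names are illustrative assumptions, not part of the original tutorial; the snippet only shows which labels the function sees on each axis and how the return type follows from what the function returns.

```python
import pandas as pd

# Hypothetical sample frame with explicit row labels so the two axes are easy to tell apart
df = pd.DataFrame({"P": [2, 2], "Q": [7, 7]}, index=["r0", "r1"])

# axis=0: each column is passed as a Series indexed by the DataFrame's index
col_labels = df.apply(lambda s: ",".join(s.index), axis=0)

# axis=1: each row is passed as a Series indexed by the DataFrame's columns
row_labels = df.apply(lambda s: ",".join(s.index), axis=1)

# With result_type=None the return type is inferred from the function's result:
reduced = df.apply(lambda s: s.sum(), axis=0)   # scalar per column -> Series
expanded = df.apply(lambda s: s * 2, axis=0)    # Series per column -> DataFrame

print(col_labels)
print(row_labels)
print(type(reduced).__name__, type(expanded).__name__)
```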
Syntax
DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
This is the signature from older pandas releases. The broadcast and reduce arguments were deprecated in pandas 0.23 in favor of result_type and removed in pandas 1.0.
Parameters
- func: the function applied to each column or row.
- axis: {0 or 'index', 1 or 'columns'}, default 0. The axis along which the function is applied. It can take two values:
  - 0 or 'index': the function is applied to each column.
  - 1 or 'columns': the function is applied to each row.
Returns
The result of applying func along the given axis of the DataFrame.
Example
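import numpy as np
import pandas as pd

info = pd.DataFrame([[2, 7]] * 4, columns=['P', 'Q'])
info.apply(np.sqrt)           # square root of every value
info.apply(np.sum, axis=0)    # one sum per column
info.apply(np.sum, axis=1)    # one sum per row
info.apply(lambda x: [1, 2], axis=1)                           # Series of lists
info.apply(lambda x: [1, 2], axis=1, result_type='expand')     # lists become columns
info.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
info.apply(lambda x: [1, 2], axis=1, result_type='broadcast')  # keeps the original shape and labels
info                          # the original frame is unchanged

   P  Q
0  2  7
1  2  7
2  2  7
3  2  7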
pandas.Series.apply
Invoke function on values of Series. The function can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.
Parameters
- func: function. Python function or NumPy ufunc to apply.
- convert_dtype: bool, default True. Try to find a better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.
- args: tuple. Positional arguments passed to func after the series value.
- **kwargs: Additional keyword arguments passed to func.
Returns
Series or DataFrame. If func returns a Series object, the result will be a DataFrame.
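A minimal sketch of that last rule, assuming a hypothetical helper `describe` that returns a Series for each value. The city temperatures mirror the series used in the examples further down; the Fahrenheit conversion is only there to give the helper two fields to return.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=["London", "New York", "Helsinki"])

def describe(celsius):
    # Hypothetical helper: returns a Series, so each input value becomes one row
    return pd.Series({"celsius": celsius, "fahrenheit": celsius * 9 / 5 + 32})

result = s.apply(describe)

# The keys of the returned Series become the columns of the resulting DataFrame
print(type(result).__name__)
print(result)
```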
See also
- Series.map: For element-wise operations.
- Series.agg: Only perform aggregating type operations.
- Series.transform: Only perform transforming type operations.
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
Create a series with typical summer temperatures for each city.
>>> s = pd.Series([20, 21, 12],
...               index=['London', 'New York', 'Helsinki'])
>>> s
London      20
New York    21
Helsinki    12
dtype: int64

Square the values by defining a function and passing it as an argument to apply().

>>> def square(x):
...     return x ** 2
>>> s.apply(square)
London      400
New York    441
Helsinki    144
dtype: int64

Square the values by passing an anonymous function as an argument to apply().

>>> s.apply(lambda x: x ** 2)
London      400
New York    441
Helsinki    144
dtype: int64

Define a custom function that needs additional positional arguments and pass these additional arguments using the args keyword.

>>> def subtract_custom_value(x, custom_value):
...     return x - custom_value
>>> s.apply(subtract_custom_value, args=(5,))
London      15
New York    16
Helsinki     7
dtype: int64

Define a custom function that takes keyword arguments and pass these arguments to apply.

>>> def add_custom_values(x, **kwargs):
...     for month in kwargs:
...         x += kwargs[month]
...     return x
>>> s.apply(add_custom_values, june=30, july=20, august=25)
London      95
New York    96
Helsinki    87
dtype: int64

Use a function from the NumPy library.

>>> s.apply(np.log)
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
pandas.DataFrame.apply
Apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
Parameters
- func: function. Function to apply to each column or row.
- axis: {0 or 'index', 1 or 'columns'}, default 0. Axis along which the function is applied:
  - 0 or 'index': apply the function to each column.
  - 1 or 'columns': apply the function to each row.
- raw: bool, default False. Determines if the row or column is passed as a Series or ndarray object:
  - False: passes each row or column as a Series to the function.
  - True: the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function, this will achieve much better performance.
- result_type: {'expand', 'reduce', 'broadcast', None}, default None. These only act when axis=1 (columns):
  - 'expand': list-like results will be turned into columns.
  - 'reduce': returns a Series if possible rather than expanding list-like results. This is the opposite of 'expand'.
  - 'broadcast': results will be broadcast to the original shape of the DataFrame; the original index and columns will be retained.
  The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However, if the applied function returns a Series, these are expanded to columns.
- args: tuple. Positional arguments to pass to func in addition to the array/series.
- **kwargs: Additional keyword arguments to pass as keyword arguments to func.
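The effect of the raw flag is easy to miss from the description alone. The sketch below, using the same 3x2 frame as the examples further down, checks what type the function actually receives with raw=False and raw=True. The variable names are illustrative, and no timing claim is made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Default (raw=False): the function receives each column as a pandas Series
got_ndarray_default = df.apply(lambda x: isinstance(x, np.ndarray))

# raw=True: the function receives each column as a plain NumPy ndarray
got_ndarray_raw = df.apply(lambda x: isinstance(x, np.ndarray), raw=True)

# A NumPy reduction gives the same numbers either way
sums_default = df.apply(np.sum)
sums_raw = df.apply(np.sum, raw=True)

print(got_ndarray_default.tolist(), got_ndarray_raw.tolist())
print(sums_default.tolist(), sums_raw.tolist())
```

Because an ndarray carries no index or column labels, a function used with raw=True cannot look values up by label.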
Returns
Series or DataFrame. Result of applying func along the given axis of the DataFrame.
See also
- DataFrame.applymap: For elementwise operations.
- DataFrame.aggregate: Only perform aggregating type operations.
- DataFrame.transform: Only perform transforming type operations.
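As a rough illustration of the difference between aggregating and transforming operations, the following sketch uses DataFrame.agg and DataFrame.transform on the same small frame used in the examples below. The variable names are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# Aggregating: each column collapses to a single value, so the result is a Series
totals = df.agg("sum")

# Transforming: the result keeps the shape and labels of the input
centered = df.transform(lambda col: col - col.mean())

print(totals)
print(centered)
```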
Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
>>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a NumPy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis:

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64

Returning a list-like will result in a Series:

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a DataFrame:

>>> df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether a list-like or a scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2
Pandas DataFrame apply() Examples
The Pandas DataFrame apply() function applies a function along an axis of the DataFrame. The function signature is:
def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
- func: The function to apply to each row or column of the DataFrame.
- axis: The axis along which the function is applied. The possible values are {0 or 'index', 1 or 'columns'}, default 0.
- args: Additional positional arguments to pass to the function.
- kwargs: Additional keyword arguments to pass to the function.
Pandas DataFrame apply() Examples
Let’s look at some examples of using apply() function on a DataFrame object.
1. Applying a Function to DataFrame Elements
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

def square(x):
    # x is a whole column (a Series); the multiplication is element-wise
    return x * x

df1 = df.apply(square)
print(df)
print(df1)

   A   B
0  1  10
1  2  20
   A    B
0  1  100
1  4  400
The DataFrame on which apply() is called remains unchanged. apply() returns a new DataFrame object that holds the results.
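A short check of this behaviour, assuming the two-column sample frame shown in the output above. The names `before` and `squared` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
before = df.copy()

squared = df.apply(lambda x: x * x)

# apply() did not modify df in place
unchanged = df.equals(before)
print(unchanged)
print(squared)

# To keep the result under the original name, assign it explicitly
df = df.apply(lambda x: x * x)
```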
2. apply() with lambda
If you look at the above example, our square() function is very simple. We can easily convert it into a lambda function. We can create a lambda function while calling the apply() function.
df1 = df.apply(lambda x: x * x)
The output will remain the same as the last example.
3. apply() along axis
We can apply a function along either axis. In the last example the axis made no visible difference, because squaring works element by element, so the result is the same whether the function receives columns or rows. The axis matters when we call an aggregate function on the DataFrame rows or columns. Let’s say we want to get the sum of the elements in each column or in each row. The output will differ based on the value of the axis argument.
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

df1 = df.apply(np.sum, axis=0)  # one sum per column
print(df1)

df1 = df.apply(np.sum, axis=1)  # one sum per row
print(df1)

A     3
B    30
dtype: int64
0    11
1    22
dtype: int64
In the first call, the sum is calculated for each column. In the second call, the sum is calculated for each row.
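The axis argument also determines which labels are available inside the function. The sketch below assumes the same sample frame; the two computed quantities (a per-column range and a per-row ratio) are illustrative choices, not from the original article.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

# axis=0: the function sees one column at a time, so column-wise statistics fit here
col_range = df.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the function sees one row at a time and can address values by column name
row_ratio = df.apply(lambda row: row["B"] / row["A"], axis=1)

print(col_range)
print(row_ratio)
```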
4. DataFrame apply() with arguments
Let’s say we want to apply a function that accepts more than one parameter. In that case, we can pass the additional parameters using the ‘args’ argument.
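import pandas as pd

def add_values(x, y, z):
    # x is a column (a Series); y and z are supplied through args
    return x + y + z

# assumed: the same sample data as in the earlier examples
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

df1 = df.apply(add_values, args=(1, 2))
print(df1)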
5. DataFrame apply() with positional and keyword arguments
Let’s look at an example where we will use both ‘args’ and ‘kwargs’ parameters to pass positional and keyword arguments to the function.
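import pandas as pd

def add_and_scale(x, y, z, m):
    # x is a column (a Series); y and z come from args, m from the keyword argument
    return (x + y + z) * m

# assumed: the same sample data as in the earlier examples
df = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})

df1 = df.apply(add_and_scale, args=(1, 2), m=10)
print(df1)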
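One way to see how the extra arguments are forwarded is to compare apply() with calling the function on each column by hand. This is a sketch under assumptions: the sample data and the helper name `scaled_sum` are hypothetical, chosen to match the earlier examples.

```python
import pandas as pd

def scaled_sum(x, y, z, m=1):
    # x is a column (Series); y and z arrive positionally, m as a keyword
    return (x + y + z) * m

# Hypothetical sample data, matching the earlier examples
df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})

via_apply = df.apply(scaled_sum, args=(1, 2), m=10)

# Equivalent manual form: call the function once per column with the same arguments
manual = pd.DataFrame({name: scaled_sum(df[name], 1, 2, m=10) for name in df.columns})

same = via_apply.equals(manual)
print(via_apply)
print(same)
```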
DataFrame applymap() function
To apply a function element-wise, use the applymap() function. In the form used here it takes only the function to apply. The function is called on each element, and the returned values are used to build the result DataFrame object. Note that applymap() was deprecated in pandas 2.1 in favor of DataFrame.map(), which does the same job.
import pandas as pd
import math

df = pd.DataFrame({'A': [1, 4], 'B': [100, 400]})
df1 = df.applymap(math.sqrt)  # math.sqrt is called once per element
print(df)
print(df1)

   A    B
0  1  100
1  4  400
     A     B
0  1.0  10.0
1  2.0  20.0
Let’s look at another example where we use the applymap() function to convert all the element values to uppercase.
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Meghna'], 'Role': ['ceo', 'cto']})
df1 = df.applymap(str.upper)  # str.upper is called once per element
print(df)
print(df1)

     Name Role
0  Pankaj  ceo
1  Meghna  cto
     Name Role
0  PANKAJ  CEO
1  MEGHNA  CTO
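Since pandas 2.1, applymap() is deprecated and DataFrame.map() is the element-wise method. If the same code has to run on both older and newer pandas versions, one possible approach is to pick whichever method exists. The `elementwise` name below is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Pankaj", "Meghna"], "Role": ["ceo", "cto"]})

# DataFrame.map exists from pandas 2.1 on; fall back to applymap on older versions
elementwise = df.map if hasattr(df, "map") else df.applymap

upper = elementwise(str.upper)
print(upper)
```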