- Python | Pandas DataFrame.fillna() to replace Null values in dataframe
- pandas.DataFrame.fillna
- How to Replace NaN Values with Zero in Pandas
- Method 1: Replace NaN values with zero in one column
- Method 2: Replace NaN values with zero in several columns
- Method 3: Replace NaN values with zero in all columns
- Additional resources
- Replace NaN Values with Zeros in Pandas DataFrame
- 4 cases to replace NaN values with zeros in Pandas DataFrame
- Case 1: replace NaN values with zeros for a column using Pandas
- Case 2: replace NaN values with zeros for a column using NumPy
- Case 3: replace NaN values with zeros for an entire DataFrame using Pandas
- Case 4: replace NaN values with zeros for an entire DataFrame using NumPy
Python | Pandas DataFrame.fillna() to replace Null values in dataframe
Python is a great language for doing data analysis, primarily because of its ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier. Sometimes a CSV file has null values, which are later displayed as NaN in the DataFrame. Just as the pandas dropna() method removes null values from a DataFrame, fillna() lets the user replace NaN values with a value of their own.
Syntax:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
- value: A static (scalar) value, dictionary, Series, or DataFrame to fill in place of NaN.
- method: The fill method to use when no value is passed. ffill propagates the last valid value forward; bfill (or backfill) fills each gap with the next valid value.
- axis: The axis to fill along. Accepts 0 or 'index' for rows and 1 or 'columns' for columns.
- inplace: A boolean; if True, the changes are made in the DataFrame itself.
- limit: An integer specifying the maximum number of consecutive NaN values to forward/backward fill.
- downcast: A dict specifying which dtype to downcast to, for example float64 to int64.
- **kwargs: Any other keyword arguments.
Example #1: Replacing NaN values with a static value.
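The CSV file used by the original example is not reproduced here, so the following is only a minimal sketch. It assumes a small hypothetical DataFrame (the column names `name` and `team` and the fill value "No Team" are illustrative, not taken from the source) and shows a static-value fill on one column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; names and values are illustrative only.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "team": ["Red", np.nan, np.nan],
})

# Replace every NaN in the 'team' column with one static value.
df["team"] = df["team"].fillna("No Team")

print(df)
```

Assigning the result back to the column keeps the rest of the DataFrame untouched.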
pandas.DataFrame.fillna
value scalar, dict, Series, or DataFrame
Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.
method {'backfill', 'bfill', 'ffill', None}, default None
Method to use for filling holes in reindexed Series:
- ffill: propagate last valid observation forward to next valid.
- backfill / bfill: use next valid observation to fill gap.
axis {0 or 'index', 1 or 'columns'}
Axis along which to fill missing values. For Series this parameter is unused and defaults to 0.
inplace bool, default False
If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).
limit int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
downcast dict, default is None
A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).
Returns DataFrame or None
Object with missing values filled or None if inplace=True.
See also:
- interpolate: Fill NaN values using interpolation.
- reindex: Conform object to new index.
- asfreq: Convert TimeSeries to specified frequency.
>>> df = pd.DataFrame([[np.nan, 2, np.nan, 0],
...                    [3, 4, np.nan, 1],
...                    [np.nan, np.nan, np.nan, np.nan],
...                    [np.nan, 3, np.nan, 4]],
...                   columns=list("ABCD"))
>>> df
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  NaN
3  NaN  3.0 NaN  4.0
Replace all NaN elements with 0s.
>>> df.fillna(0)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  0.0
3  0.0  3.0  0.0  4.0
We can also propagate non-null values forward or backward.
>>> df.fillna(method="ffill")
     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  3.0  4.0 NaN  1.0
3  3.0  3.0 NaN  4.0
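In recent pandas releases (2.1 and later) the `method` argument of `fillna()` is deprecated in favour of the dedicated `ffill()` and `bfill()` methods. As a minimal sketch, the same forward fill, plus a backward fill, can be written as follows, with the example `df` rebuilt so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as the documentation example above.
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, np.nan],
                   [np.nan, 3, np.nan, 4]],
                  columns=list("ABCD"))

# Forward fill: each NaN takes the last valid value above it in the column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in the column.
backward = df.bfill()

print(forward)
print(backward)
```

Column C stays NaN in both results because it contains no valid value to propagate.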
Replace all NaN elements in column ‘A’, ‘B’, ‘C’, and ‘D’, with 0, 1, 2, and 3 respectively.
>>> values = {"A": 0, "B": 1, "C": 2, "D": 3}
>>> df.fillna(value=values)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  2.0  1.0
2  0.0  1.0  2.0  3.0
3  0.0  3.0  2.0  4.0
Only replace the first NaN element.
>>> df.fillna(value=values, limit=1)
     A    B    C    D
0  0.0  2.0  2.0  0.0
1  3.0  4.0  NaN  1.0
2  NaN  1.0  NaN  3.0
3  NaN  3.0  NaN  4.0
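The `limit` parameter counts differently depending on whether a value or a fill direction is used. The sketch below illustrates this on a small Series; the data is made up for illustration, and `ffill(limit=...)` stands in for the `method="ffill"` form:

```python
import numpy as np
import pandas as pd

# Illustrative Series with a gap of three consecutive NaN values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# With a fill value, limit caps the total number of NaNs filled along the axis.
by_value = s.fillna(0, limit=2)

# With a directional fill, limit caps how many consecutive NaNs are filled per gap.
by_ffill = s.ffill(limit=1)

print(by_value.tolist())
print(by_ffill.tolist())
```

In both cases the gap is only partially filled, which matches the description of `limit` in the parameter list.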
When filling using a DataFrame, replacement happens along the same column names and same indices.
>>> df2 = pd.DataFrame(np.zeros((4, 4)), columns=list("ABCE"))
>>> df.fillna(df2)
     A    B    C    D
0  0.0  2.0  0.0  0.0
1  3.0  4.0  0.0  1.0
2  0.0  0.0  0.0  NaN
3  0.0  3.0  0.0  4.0
Note that column D is not affected since it is not present in df2.
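Because `value` may also be a Series indexed by column name, one common pattern is to fill each column with a statistic computed from that column. The following is a sketch under that assumption, using made-up data and the column mean as the fill value:

```python
import numpy as np
import pandas as pd

# Illustrative data; each column has one missing value.
df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [np.nan, 4.0, 6.0]})

# df.mean() returns a Series indexed by column name (NaN values are skipped).
means = df.mean()

# Each column's NaN values are filled with that column's own mean.
filled = df.fillna(means)

print(filled)
```

A dict such as `{"A": 2.0, "B": 5.0}` would have the same effect here; the Series form just avoids typing the values by hand.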
How to Replace NaN Values with Zero in Pandas
You can use the following methods to replace NaN values with zeros in a pandas DataFrame:
Method 1: Replace NaN values with zero in one column
df['col1'] = df['col1'].fillna(0)
Method 2: Replace NaN values with zero in several columns
df[['col1', 'col2']] = df[['col1', 'col2']].fillna(0)
Method 3: Replace NaN values with zero in all columns
df = df.fillna(0)
The following examples show how to use each of these methods with the following pandas DataFrame:
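import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'points': [25, np.nan, 15, 14, 19, 23, 25, 29],
                   'assists': [5, np.nan, 7, np.nan, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, np.nan, 9, np.nan]})

#view DataFrame
print(df)

   points  assists  rebounds
0    25.0      5.0      11.0
1     NaN      NaN       8.0
2    15.0      7.0      10.0
3    14.0      NaN       6.0
4    19.0     12.0       6.0
5    23.0      9.0       NaN
6    25.0      9.0       9.0
7    29.0      4.0       NaN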
Method 1: Replace NaN values with zero in one column
The following code shows how to replace NaN values with zero in the 'assists' column only:
#replace NaN values with zero in 'assists' column
df['assists'] = df['assists'].fillna(0)

#view updated DataFrame
print(df)

   points  assists  rebounds
0    25.0      5.0      11.0
1     NaN      0.0       8.0
2    15.0      7.0      10.0
3    14.0      0.0       6.0
4    19.0     12.0       6.0
5    23.0      9.0       NaN
6    25.0      9.0       9.0
7    29.0      4.0       NaN
Note that the NaN values in the 'assists' column were replaced with zeros, while the NaN values in all other columns remained unchanged.
Method 2: Replace NaN values with zero in several columns
The following code shows how to replace NaN values with zero in the 'points' and 'assists' columns:
#replace NaN values with zero in 'points' and 'assists' columns
df[['points', 'assists']] = df[['points', 'assists']].fillna(0)

#view updated DataFrame
print(df)

   points  assists  rebounds
0    25.0      5.0      11.0
1     0.0      0.0       8.0
2    15.0      7.0      10.0
3    14.0      0.0       6.0
4    19.0     12.0       6.0
5    23.0      9.0       NaN
6    25.0      9.0       9.0
7    29.0      4.0       NaN
Method 3: Replace NaN values with zero in all columns
The following code shows how to replace NaN values with zero in every column of the DataFrame:
#replace NaN values with zero in all columns
df = df.fillna(0)

#view updated DataFrame
print(df)

   points  assists  rebounds
0    25.0      5.0      11.0
1     0.0      0.0       8.0
2    15.0      7.0      10.0
3    14.0      0.0       6.0
4    19.0     12.0       6.0
5    23.0      9.0       0.0
6    25.0      9.0       9.0
7    29.0      4.0       0.0
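A quick way to confirm what each method changed is to count the NaN values that remain in each column. The sketch below rebuilds the example DataFrame and applies the three methods to independent copies; the variable names `one_col`, `two_cols`, and `all_cols` are illustrative:

```python
import numpy as np
import pandas as pd

# Same data as the example DataFrame above.
df = pd.DataFrame({
    "points": [25, np.nan, 15, 14, 19, 23, 25, 29],
    "assists": [5, np.nan, 7, np.nan, 12, 9, 9, 4],
    "rebounds": [11, 8, 10, 6, 6, np.nan, 9, np.nan],
})

# Method 1: fill a single column on a copy.
one_col = df.copy()
one_col["assists"] = one_col["assists"].fillna(0)

# Method 2: fill several columns on a copy.
two_cols = df.copy()
two_cols[["points", "assists"]] = two_cols[["points", "assists"]].fillna(0)

# Method 3: fill every column (fillna returns a new DataFrame).
all_cols = df.fillna(0)

# Count the NaN values left in each column after each method.
for label, frame in [("original", df), ("one column", one_col),
                     ("two columns", two_cols), ("all columns", all_cols)]:
    print(label, frame.isna().sum().to_dict())
```

Working on copies keeps the original `df` intact, so the three results can be compared side by side.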
Additional resources
The following tutorials explain how to perform other common operations in pandas:
Replace NaN Values with Zeros in Pandas DataFrame
Depending on the scenario, you may use any of the four approaches below to replace NaN values with zeros in a Pandas DataFrame:
(1) For a single column using Pandas:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
(2) For a single column using NumPy:
df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
(3) For an entire DataFrame using Pandas:
df.fillna(0)
(4) For an entire DataFrame using NumPy:
df.replace(np.nan, 0)
Let’s now review how to apply each of the 4 cases using simple examples.
4 cases to replace NaN values with zeros in Pandas DataFrame
Case 1: replace NaN values with zeros for a column using Pandas
Suppose that you have a single column with the following data that contains NaN values:
values |
700 |
NaN |
500 |
NaN |
You can then create a DataFrame in Python to capture that data:
import pandas as pd
import numpy as np

df = pd.DataFrame({'values': [700, np.nan, 500, np.nan]})

print(df)
Run the code in Python, and you’ll get the following DataFrame with the NaN values:
   values
0   700.0
1     NaN
2   500.0
3     NaN
In order to replace the NaN values with zeros for a column using Pandas, you may use the first approach introduced at the top of this guide:
df['DataFrame Column'] = df['DataFrame Column'].fillna(0)
In the context of our example, here is the complete Python code to replace the NaN values with 0’s:
import pandas as pd
import numpy as np

df = pd.DataFrame({'values': [700, np.nan, 500, np.nan]})

# Replace NaN with 0 in the 'values' column
df['values'] = df['values'].fillna(0)

print(df)
Run the code, and you’ll see that the previous two NaN values became 0’s:
   values
0   700.0
1     0.0
2   500.0
3     0.0
Case 2: replace NaN values with zeros for a column using NumPy
You can accomplish the same task by passing NumPy's np.nan to the pandas replace() method:
df['DataFrame Column'] = df['DataFrame Column'].replace(np.nan, 0)
For our example, you can use the following code to perform the replacement:
import pandas as pd
import numpy as np

df = pd.DataFrame({'values': [700, np.nan, 500, np.nan]})

# Replace NaN with 0 in the 'values' column
df['values'] = df['values'].replace(np.nan, 0)

print(df)
As before, the two NaN values became 0’s:
   values
0   700.0
1     0.0
2   500.0
3     0.0
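To check that the two approaches agree on this data, the results can be compared directly. This is a small sketch; the names `via_fillna` and `via_replace` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values": [700, np.nan, 500, np.nan]})

# Case 1: fillna targets missing values directly.
via_fillna = df["values"].fillna(0)

# Case 2: replace swaps one specific value (here np.nan) for another.
via_replace = df["values"].replace(np.nan, 0)

# Series.equals compares both the values and the dtype.
print(via_fillna.equals(via_replace))
```

For a numeric column like this one the two calls give the same Series, so the choice between them is mostly a matter of readability.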
Case 3: replace NaN values with zeros for an entire DataFrame using Pandas
For the first two cases, you only had a single column in the dataset. But what if your DataFrame contains multiple columns?
For simplicity, let’s assume that you have the following dataset with 2 columns that contain NaN values:
values_1 | values_2 |
700 | NaN |
NaN | 150 |
500 | NaN |
NaN | 400 |
You can then create the DataFrame as follows:
import pandas as pd
import numpy as np

df = pd.DataFrame({'values_1': [700, np.nan, 500, np.nan],
                   'values_2': [np.nan, 150, np.nan, 400]})

print(df)
Run the code, and you’ll get the DataFrame with the two columns that include the NaNs:
   values_1  values_2
0     700.0       NaN
1       NaN     150.0
2     500.0       NaN
3       NaN     400.0
In order to replace the NaN values with zeros for the entire DataFrame using Pandas, you may use the third approach:
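import pandas as pd
import numpy as np

df = pd.DataFrame({'values_1': [700, np.nan, 500, np.nan],
                   'values_2': [np.nan, 150, np.nan, 400]})

# Replace NaN with 0 across the entire DataFrame
df = df.fillna(0)

print(df)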
You’ll now get 0’s, instead of all the NaNs, across the entire DataFrame:
   values_1  values_2
0     700.0       0.0
1       0.0     150.0
2     500.0       0.0
3       0.0     400.0
Case 4: replace NaN values with zeros for an entire DataFrame using NumPy
You can achieve the same goal for an entire DataFrame using NumPy:
df.replace(np.nan, 0)
For our example, you can apply the code below to replace the NaN values with zeros:
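import pandas as pd
import numpy as np

df = pd.DataFrame({'values_1': [700, np.nan, 500, np.nan],
                   'values_2': [np.nan, 150, np.nan, 400]})

# Replace NaN with 0 across the entire DataFrame
df = df.replace(np.nan, 0)

print(df)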
Run the code, and you’ll get the same results as in the previous case:
   values_1  values_2
0     700.0       0.0
1       0.0     150.0
2     500.0       0.0
3       0.0     400.0
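The filled columns are still floats (700.0 rather than 700) because the columns held NaN when they were created. If whole numbers are wanted, one option, which is not part of the original guide, is to cast to an integer dtype after filling. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"values_1": [700, np.nan, 500, np.nan],
                   "values_2": [np.nan, 150, np.nan, 400]})

# Fill first, then cast: casting to int while NaN is still present raises an error.
as_int = df.fillna(0).astype(int)

print(as_int)
print(as_int.dtypes)
```

The cast is only safe here because every remaining value is a whole number; fractional values would be truncated.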
You can find additional information about replacing values in Pandas DataFrame by visiting the Pandas documentation.
Alternatively, you may check this guide for the steps to drop rows with NaN values in Pandas DataFrame.