Python pandas new column

Contents

pandas.DataFrame.insert
pandas.DataFrame.assign
How to create new columns derived from existing columns
REMEMBER

pandas.DataFrame.insert

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.

Parameters

loc : int
Insertion index. Must verify 0 <= loc <= len(columns).

column : str, number, or hashable object
Label of the inserted column.

value : Scalar, Series, or array-like

allow_duplicates : bool, optional, default lib.no_default

>>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
>>> df
   col1  col2
0     1     3
1     2     4
>>> df.insert(1, "newcol", [99, 99])
>>> df
   col1  newcol  col2
0     1      99     3
1     2      99     4
>>> df.insert(0, "col1", [100, 100], allow_duplicates=True)
>>> df
   col1  col1  newcol  col2
0   100     1      99     3
1   100     2      99     4

Note that pandas aligns on the index when value is a Series:

>>> df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
>>> df
   col0  col1  col1  newcol  col2
0   NaN   100     1      99     3
1   5.0   100     2      99     4
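A small sketch of how insert behaves in practice may help here. The frame and values are illustrative; the points shown are that insert modifies the DataFrame in place and returns None, that an existing label is rejected with ValueError by default, and that a Series value is aligned on the index.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

# insert() works in place and returns None, so its result should not be reassigned.
result = df.insert(1, "newcol", [99, 99])
columns_after_insert = list(df.columns)

# Inserting a label that already exists raises ValueError by default.
try:
    df.insert(0, "col1", [100, 100])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# A Series is aligned on the index: row 0 has no matching label and becomes NaN.
df.insert(0, "col0", pd.Series([5, 6], index=[1, 2]))
print(df)
```

If positional assignment is wanted instead of alignment, passing a plain list or array rather than a Series avoids the index matching.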


pandas.DataFrame.assign

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters

**kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though pandas doesn't check it). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned.

Returns

DataFrame
A new DataFrame with the new columns in addition to all the existing columns.

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
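Because assign returns a new object, the original DataFrame is left untouched unless the result is bound to a name. The following sketch reuses the temperature frame from the examples; the variable names with_f and rounded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# assign() returns a new DataFrame; df itself keeps only its original column.
with_f = df.assign(temp_f=lambda x: x["temp_c"] * 9 / 5 + 32)

# Re-assigning an existing name overwrites that column in the returned object.
rounded = with_f.assign(temp_f=lambda x: x["temp_f"].round())

print(list(df.columns))
print(list(with_f.columns))
print(rounded["temp_f"].tolist())
```

This makes assign convenient in method chains, where each step hands a fresh DataFrame to the next.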


How to create new columns derived from existing columns

The calculation of the values is done element-wise. When a column is multiplied by a scalar (1.882 in this example), every value in that column is multiplied at once. You do not need a loop to iterate over the rows!
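As a minimal sketch of this scalar multiplication, assume a small hand-made frame with a station_london column. The values are illustrative and are not the tutorial's data; the loop is shown only to confirm that it produces the same numbers.

```python
import pandas as pd

# Illustrative values only; the tutorial's air_quality data is not reproduced here.
air_quality = pd.DataFrame({"station_london": [23.0, 19.0, 16.0]})

# One vectorized expression multiplies every row by the scalar.
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882

# The equivalent explicit loop, shown only for comparison.
looped = [value * 1.882 for value in air_quality["station_london"]]

print(air_quality)
```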

I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.

In [6]: air_quality["ratio_paris_antwerp"] = (
   ...:     air_quality["station_paris"] / air_quality["station_antwerp"]
   ...: )

In [7]: air_quality.head()
Out[7]:
                     station_antwerp  ...  ratio_paris_antwerp
datetime                              ...
2019-05-07 02:00:00              NaN  ...                  NaN
2019-05-07 03:00:00             50.5  ...             0.495050
2019-05-07 04:00:00             45.0  ...             0.615556
2019-05-07 05:00:00              NaN  ...                  NaN
2019-05-07 06:00:00              NaN  ...                  NaN

[5 rows x 5 columns]

Other mathematical operators (+, -, *, /, …) and logical operators (<, >, ==, …) also work element-wise. The latter were already used in the subset data tutorial to filter rows of a table using a conditional expression.
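To illustrate, here is a short sketch with made-up readings (the difference and paris_higher column names are hypothetical). A comparison between two columns yields a boolean Series, which can be stored as a new column or used as a row filter.

```python
import pandas as pd

# Made-up readings for illustration.
air_quality = pd.DataFrame(
    {"station_paris": [25.0, 27.7, 50.4], "station_antwerp": [50.5, 45.0, 40.0]}
)

# Arithmetic between two columns is computed row by row.
air_quality["difference"] = (
    air_quality["station_paris"] - air_quality["station_antwerp"]
)

# A comparison yields a boolean Series that can be stored as a column...
air_quality["paris_higher"] = (
    air_quality["station_paris"] > air_quality["station_antwerp"]
)

# ...or used directly to filter rows.
paris_higher_rows = air_quality[air_quality["paris_higher"]]
print(paris_higher_rows)
```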

If you need more advanced logic, you can use arbitrary Python code via apply().
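One possible use of apply() is row-wise branching logic that is awkward to express with operators alone. The sketch below uses made-up data and a hypothetical helper, higher_station, that is not part of pandas or the tutorial.

```python
import pandas as pd

# Made-up readings; None becomes NaN in a float column.
air_quality = pd.DataFrame(
    {"station_paris": [25.0, 50.4, None], "station_antwerp": [50.5, 45.0, 40.0]}
)


def higher_station(row):
    """Hypothetical helper: name the station with the larger reading in a row."""
    if pd.isna(row["station_paris"]) or pd.isna(row["station_antwerp"]):
        return "unknown"
    return "paris" if row["station_paris"] > row["station_antwerp"] else "antwerp"


# axis=1 passes each row to the function as a Series.
air_quality["higher_station"] = air_quality.apply(higher_station, axis=1)
print(air_quality)
```

Since apply() calls the Python function once per row, it is generally slower than the element-wise operators, so it is best kept for logic they cannot express.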

To rename columns, pass rename() a dictionary that maps the current names to the new names:

In [8]: air_quality_renamed = air_quality.rename(
   ...:     columns={
   ...:         "station_antwerp": "BETR801",
   ...:         "station_paris": "FR04014",
   ...:         "station_london": "London Westminster",
   ...:     }
   ...: )

In [9]: air_quality_renamed.head()
Out[9]:
                     BETR801  FR04014  ...  london_mg_per_cubic  ratio_paris_antwerp
datetime                               ...
2019-05-07 02:00:00      NaN      NaN  ...               43.286                  NaN
2019-05-07 03:00:00     50.5     25.0  ...               35.758             0.495050
2019-05-07 04:00:00     45.0     27.7  ...               35.758             0.615556
2019-05-07 05:00:00      NaN     50.4  ...               30.112                  NaN
2019-05-07 06:00:00      NaN     61.9  ...                  NaN                  NaN

[5 rows x 5 columns]

The mapping is not restricted to fixed names; it can also be a function. For example, the column names can be converted to lowercase by passing a function:

In [10]: air_quality_renamed = air_quality_renamed.rename(columns=str.lower)

In [11]: air_quality_renamed.head()
Out[11]:
                     betr801  fr04014  ...  london_mg_per_cubic  ratio_paris_antwerp
datetime                               ...
2019-05-07 02:00:00      NaN      NaN  ...               43.286                  NaN
2019-05-07 03:00:00     50.5     25.0  ...               35.758             0.495050
2019-05-07 04:00:00     45.0     27.7  ...               35.758             0.615556
2019-05-07 05:00:00      NaN     50.4  ...               30.112                  NaN
2019-05-07 06:00:00      NaN     61.9  ...                  NaN                  NaN

[5 rows x 5 columns]

Details about column or row label renaming are provided in the user guide section on renaming labels.
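The examples here rename columns only. As a hedged sketch of the row-label case, rename() also takes an index argument that accepts the same kinds of mapping; the frame and labels below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Station_Paris": [25.0, 27.7]}, index=["first", "second"])

# columns= takes a function applied to every column label;
# index= takes a dictionary here, and labels missing from it are left unchanged.
renamed = df.rename(columns=str.lower, index={"first": "row_1"})

print(renamed)
```

As with assign, rename returns a new DataFrame here, so df keeps its original labels.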

REMEMBER

Create a new column by assigning the output to the DataFrame with a new column name between the square brackets [].
Operations are element-wise, no need to loop over rows.
Use rename with a dictionary or function to rename row labels or column names.

The user guide contains a separate section on column addition and deletion.
