- Pandas loc vs. iloc: What's the Difference?
- Example 1: How to Use loc in Pandas
- Example 2: How to Use iloc in Pandas
- Additional Resources
- Pandas iloc and loc – quickly select rows and columns in DataFrames
- Selection Options
- Data Setup
- Selection and Indexing Methods for Pandas DataFrames
- 1. Pandas iloc data selection
- 2. Pandas loc data selection
- 2a. Label-based / Index-based indexing using .loc
- 3. Selecting pandas data using ix
- Setting values in DataFrames using .loc
Pandas loc vs. iloc: What's the Difference?
When it comes to selecting rows and columns of a pandas DataFrame, loc and iloc are two commonly used indexers.
Here is the subtle difference between the two:
- loc selects rows and columns with specific labels
- iloc selects rows and columns at specific integer positions
The following examples show how to use each one in practice.
Example 1: How to Use loc in Pandas
Suppose we have the following pandas DataFrame:
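    import pandas as pd

    # create DataFrame
    df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'points': [5, 7, 7, 9, 12, 9, 9, 4],
                       'assists': [11, 8, 10, 6, 6, 5, 9, 12]},
                      index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

    # view DataFrame
    df

      team  points  assists
    A    A       5       11
    B    A       7        8
    C    A       7       10
    D    A       9        6
    E    B      12        6
    F    B       9        5
    G    B       9        9
    H    B       4       12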
We can use loc to select specific rows of the DataFrame based on their index labels:
    # select rows with index labels 'E' and 'F'
    df.loc[['E', 'F']]

      team  points  assists
    E    B      12        6
    F    B       9        5
We can use loc to select specific rows and specific columns of the DataFrame based on their labels:
    # select 'E' and 'F' rows and 'team' and 'assists' columns
    df.loc[['E', 'F'], ['team', 'assists']]

      team  assists
    E    B        6
    F    B        5
We can use loc with the : argument to select ranges of rows and columns based on their labels:
    # select rows from 'E' onward and columns up to and including 'assists'
    df.loc['E':, :'assists']

      team  points  assists
    E    B      12        6
    F    B       9        5
    G    B       9        9
    H    B       4       12
Example 2: How to Use iloc in Pandas
Suppose we have the following pandas DataFrame:
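    import pandas as pd

    # create DataFrame
    df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'points': [5, 7, 7, 9, 12, 9, 9, 4],
                       'assists': [11, 8, 10, 6, 6, 5, 9, 12]},
                      index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

    # view DataFrame
    df

      team  points  assists
    A    A       5       11
    B    A       7        8
    C    A       7       10
    D    A       9        6
    E    B      12        6
    F    B       9        5
    G    B       9        9
    H    B       4       12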
We can use iloc to select specific rows of the DataFrame based on their integer position:
    # select rows in index positions 4 through 6 (not including 6)
    df.iloc[4:6]

      team  points  assists
    E    B      12        6
    F    B       9        5
We can use iloc to select specific rows and specific columns of the DataFrame based on their integer positions:
    # select rows in positions 4 through 6 (not including 6)
    # and columns in positions 0 through 2 (not including 2)
    df.iloc[4:6, 0:2]

      team  points
    E    B      12
    F    B       9
We can use iloc with the : argument to select ranges of rows and columns based on their integer positions:
    # select rows from position 4 through the end and columns up to the third column
    df.iloc[4:, :3]

      team  points  assists
    E    B      12        6
    F    B       9        5
    G    B       9        9
    H    B       4       12
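One detail that the two examples imply but do not state is how slice endpoints are treated: a loc slice includes both labels, while an iloc slice excludes the stop position, as with Python lists. The sketch below rebuilds the same DataFrame (reconstructed from the printed output above) and checks that the two slices pick out the same cells.

```python
import pandas as pd

# Same DataFrame as in the examples above (values taken from the printed output)
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4],
                   'assists': [11, 8, 10, 6, 6, 5, 9, 12]},
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'])

# loc slices by label and includes BOTH endpoints ('F' and 'points' are kept)
by_label = df.loc['E':'F', 'team':'points']

# iloc slices by position and excludes the stop position (6 and 2 are left out)
by_position = df.iloc[4:6, 0:2]

# Both selections contain rows E and F and columns team and points
print(by_label.equals(by_position))
```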
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
Pandas iloc and loc – quickly select rows and columns in DataFrames
There are multiple ways to select and index rows and columns from Pandas DataFrames. I find tutorials online focusing on advanced selections of row and column choices a little complex for my requirements, but mastering the Pandas iloc, loc, and ix selectors can actually be made quite simple.
Selection Options
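There are three main options for selecting and indexing data in Pandas, which can be confusing. The three selection cases and methods covered in this post are:
- Selecting data by row and column position (.iloc)
- Selecting data by label or by a conditional statement (.loc)
- Selecting in a hybrid approach (.ix), which is now deprecated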
Data Setup
This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial suits the general data science situation that I typically find myself in:
- Each row in your data frame represents a data sample.
- Each column is a variable, and is usually named. I rarely select columns without their names.
- I need to quickly and often select relevant rows from the data frame for modelling and visualisation activities.
For the uninitiated, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in “series” and in “data frames”. It’s brilliant at making your data processing easier and I’ve written before about grouping and summarising data with Pandas.
Selection and Indexing Methods for Pandas DataFrames
For these explorations we'll need some sample data. I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file from that site. Load the data as follows (the diagrams here come from a Jupyter notebook in the Anaconda Python install):
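The original loading code is not reproduced here, so the following is only a minimal sketch of loading a CSV with pd.read_csv. The rows and the column names (first_name, last_name, city, id) are made-up stand-ins for the uk-500 file, fed in from an in-memory string so the snippet runs without the download. With the real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the uk-500 CSV: invented rows and assumed column names
csv_text = """first_name,last_name,city,id
Ann,Andrade,Leeds,1
Ben,Bruch,York,2
Cara,Veness,Bath,3
Dan,Julio,Hull,4
Eve,Rampy,Leeds,5
"""

# read_csv accepts a file path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)
print(data.head())
```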
1. Pandas iloc data selection
The iloc indexer for Pandas DataFrames is used for integer-location based indexing / selection by position.
The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. "iloc" in pandas is used to select rows and columns by number, in the order that they appear in the data frame. You can imagine that each row has a row number from 0 to one less than the total number of rows (data.shape[0] - 1), and iloc[] allows selections based on these numbers. The same applies for columns (ranging from 0 to data.shape[1] - 1).
There are two “arguments” to iloc – a row selector, and a column selector. For example:
In practice, I rarely use the iloc indexer, unless I want the first ( .iloc[0] ) or the last ( .iloc[-1] ) row of the data frame.
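The row selector and column selector can be sketched as follows. The small DataFrame is a hypothetical stand-in for the sample data (its rows and column names are assumptions), not the uk-500 file itself.

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]})

# Row selector only
first_row = data.iloc[0]         # first row, returned as a Series
last_row = data.iloc[-1]         # last row, returned as a Series
first_two_rows = data.iloc[0:2]  # rows at positions 0 and 1 (stop is excluded)

# Row selector and column selector
first_column = data.iloc[:, 0]         # all rows, first column
subset = data.iloc[[0, 3], [1, 2]]     # rows 0 and 3, columns 1 and 2

print(subset)
```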
2. Pandas loc data selection
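The Pandas loc indexer can be used with DataFrames for two different use cases:
- a) Selecting rows by label / index
- b) Selecting rows with a boolean / conditional lookup
The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].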
2a. Label-based / Index-based indexing using .loc
Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame, using df.set_index(), the .loc method directly selects based on index values of any rows. For example, setting the index of our test data frame to the person's "last_name":
Now with the index set, we can directly select rows for different “last_name” values using .loc[] – either singly, or in multiples. For example:
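A minimal sketch of setting the index and then selecting by label, using a small hypothetical DataFrame in place of the sample data (the rows and column names are assumptions):

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]})

# Use last_name as the index; set_index returns a new DataFrame
data = data.set_index('last_name')

# A single label returns a Series (when the label is unique in the index)
one_person = data.loc['Andrade']

# A list of labels returns a DataFrame, in the order requested
several_people = data.loc[['Julio', 'Andrade', 'Bruch']]

print(several_people)
```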
Select columns with .loc using the names of the columns. In most of my data work, typically I have named columns, and use these named selections.
You can select ranges of index labels: the selection data.loc['Bruch':'Julio'] will return all rows in the data frame between the index entries for "Bruch" and "Julio", including both of those rows. The following examples should now make sense:
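As an illustration of label ranges, here is a sketch on a small hypothetical DataFrame indexed by last_name (rows and column names are assumptions). Note that the slice follows the order of rows in the frame, not alphabetical order.

```python
import pandas as pd

# Hypothetical stand-in data, indexed by last_name
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]}).set_index('last_name')

# Rows from 'Bruch' through 'Julio', both endpoints included
between = data.loc['Bruch':'Julio']

# The same row range, restricted to a range of column labels
between_cols = data.loc['Bruch':'Julio', 'first_name':'city']

print(between_cols)
```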
As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer and can be a single string, a list of columns, or a slice “:” operation.
Note that when selecting columns, if one column only is selected, the .loc operator returns a Series. For a single column DataFrame, use a one-element list to keep the DataFrame format, for example:
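The Series versus DataFrame distinction can be checked directly. This sketch uses a small hypothetical DataFrame (rows and column names are assumptions):

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara'],
                     'last_name': ['Andrade', 'Bruch', 'Veness'],
                     'city': ['Leeds', 'York', 'Bath']})

# A single column name returns a Series
as_series = data.loc[:, 'city']

# A one-element list keeps the result as a DataFrame
as_frame = data.loc[:, ['city']]

print(type(as_series).__name__, type(as_frame).__name__)
```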
Make sure you understand the following additional examples of .loc selections for clarity:
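The original examples are not reproduced here, so the selections below are stand-ins written for illustration. They run on a small hypothetical DataFrame (rows and column names are assumptions) and show conditional row selection combined with column selection.

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]})

# Rows where city is 'Leeds', all columns
leeds = data.loc[data['city'] == 'Leeds']

# Same rows, but only two named columns
leeds_names = data.loc[data['city'] == 'Leeds', ['first_name', 'last_name']]

# Rows where city is one of several values, with a column label slice
york_or_bath = data.loc[data['city'].isin(['York', 'Bath']), 'first_name':'last_name']

# Rows where last_name ends with 'y', using a string method to build the mask
ends_with_y = data.loc[data['last_name'].str.endswith('y')]

print(leeds_names)
```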
Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame and will give the same results: data.loc[data['id'] == 9] returns the same rows as data[data['id'] == 9].
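A quick check of that equivalence, on a small hypothetical DataFrame (rows and column names are assumptions, and the id value 4 is used because the stand-in data has only five rows):

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'id': [1, 2, 3, 4, 5]})

mask = data['id'] == 4          # boolean Series, one value per row

via_loc = data.loc[mask]        # boolean selection through .loc
via_brackets = data[mask]       # boolean selection through the generic [] indexer

print(via_loc.equals(via_brackets))
```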
3. Selecting pandas data using ix
Note: The ix indexer was deprecated in Pandas 0.20.0 and removed entirely in Pandas 1.0.0.
The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just as the .loc indexer. However, .ix also supports integer type selections (as in .iloc) when passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.
Because ix is slightly more complex, I prefer to explicitly use .iloc and .loc to avoid unexpected results.
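Where older code mixed labels and positions through ix, one possible way to express the same selection with the explicit indexers is to convert between the two with data.columns[...] or data.columns.get_loc(...). This is only a sketch on a small hypothetical DataFrame (rows and column names are assumptions), not a complete migration guide.

```python
import pandas as pd

# Hypothetical stand-in data, indexed by last_name
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara'],
                     'last_name': ['Andrade', 'Bruch', 'Veness'],
                     'city': ['Leeds', 'York', 'Bath']}).set_index('last_name')

# Row by label, column by position: look up the column label first, then use .loc
row_label_col_position = data.loc['Bruch', data.columns[1]]

# Row by position, column by label: look up the column position first, then use .iloc
row_position_col_label = data.iloc[0, data.columns.get_loc('city')]

print(row_label_col_position, row_position_col_label)
```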
Setting values in DataFrames using .loc
With a slight change of syntax, you can actually update your DataFrame in the same statement as you select and filter using .loc indexer. This particular pattern allows you to update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.
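A minimal sketch of the select-and-set pattern, on a small hypothetical DataFrame (rows, column names, and the replacement values are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]})

# Row condition on the left, target column on the right, new value assigned
data.loc[data['city'] == 'Leeds', 'city'] = 'LEEDS'

# A different condition can drive a different column
data.loc[data['id'] > 3, 'first_name'] = 'Anon'

# The original DataFrame has been edited in place
print(data)
```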
Those are the basics of indexing and selecting with Pandas. If you're looking for more, take a look at the .iat and .at operations in the Pandas documentation for faster access to single values, and at selecting by callable functions for more iloc and loc fun.
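For a first taste of those options, the sketch below reads single values with .at and .iat and passes a callable to .loc. The DataFrame is a small hypothetical stand-in (rows and column names are assumptions).

```python
import pandas as pd

# Hypothetical stand-in data; column names and values are assumptions
data = pd.DataFrame({'first_name': ['Ann', 'Ben', 'Cara', 'Dan', 'Eve'],
                     'last_name': ['Andrade', 'Bruch', 'Veness', 'Julio', 'Rampy'],
                     'city': ['Leeds', 'York', 'Bath', 'Hull', 'Leeds'],
                     'id': [1, 2, 3, 4, 5]})

# .at reads a single value by row label and column label
city_by_label = data.at[0, 'city']

# .iat reads a single value by row position and column position
city_by_position = data.iat[0, 2]

# .loc accepts a callable that receives the DataFrame and returns a valid selector
high_ids = data.loc[lambda d: d['id'] > 3]

print(city_by_label, city_by_position)
print(high_ids)
```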