- pandas: Get the number of rows, columns, elements (size) of DataFrame
- Get the number of rows, columns, and elements in pandas.DataFrame
- Display the number of rows, columns, etc.: df.info()
- Get the number of rows and columns: df.shape
- Get the number of rows: len(df)
- Get the number of columns: len(df.columns)
- Get the number of elements: df.size
- Notes when setting an index
- Get the number of elements in pandas.Series
- Get the number of elements: len(s), s.size
- Pandas: Get the Row Number from a Dataframe
- Loading a Sample Pandas Dataframe
- Get Row Numbers that Match a Condition in a Pandas Dataframe
- Get the First Row Number that Matches a Condition in a Pandas Dataframe
- Count the Number of Rows Matching a Condition
- Conclusion
pandas: Get the number of rows, columns, elements (size) of DataFrame
This article explains how to get the number of rows, columns, and total elements (size) of a pandas.DataFrame and a pandas.Series.
The examples use the Titanic survivor data, which can be downloaded from Kaggle.
import pandas as pd

print(pd.__version__)
# 2.0.0

df = pd.read_csv('data/src/titanic_train.csv')

print(df.head())
#    PassengerId  Survived  Pclass
# 0            1         0       3  \
# 1            2         1       1
# 2            3         1       3
# 3            4         1       1
# 4            5         0       3
#
#                                               Name     Sex   Age  SibSp
# 0                        Braund, Mr. Owen Harris    male  22.0      1  \
# 1  Cumings, Mrs. John Bradley (Florence Briggs Th.  female  38.0      1
# 2                         Heikkinen, Miss. Laina  female  26.0      0
# 3     Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
# 4                       Allen, Mr. William Henry    male  35.0      0
#
#    Parch            Ticket     Fare Cabin Embarked
# 0      0         A/5 21171   7.2500   NaN        S
# 1      0          PC 17599  71.2833   C85        C
# 2      0  STON/O2. 3101282   7.9250   NaN        S
# 3      0            113803  53.1000  C123        S
# 4      0            373450   8.0500   NaN        S
Get the number of rows, columns, and elements in pandas.DataFrame
Display the number of rows, columns, etc.: df.info()
The info() method of pandas.DataFrame displays information such as the number of rows and columns, total memory usage, the data type of each column, and the count of non-NaN elements.
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 891 entries, 0 to 890
# Data columns (total 12 columns):
#  #   Column       Non-Null Count  Dtype
# ---  ------       --------------  -----
#  0   PassengerId  891 non-null    int64
#  1   Survived     891 non-null    int64
#  2   Pclass       891 non-null    int64
#  3   Name         891 non-null    object
#  4   Sex          891 non-null    object
#  5   Age          714 non-null    float64
#  6   SibSp        891 non-null    int64
#  7   Parch        891 non-null    int64
#  8   Ticket       891 non-null    object
#  9   Fare         891 non-null    float64
#  10  Cabin        204 non-null    object
#  11  Embarked     889 non-null    object
# dtypes: float64(2), int64(5), object(5)
# memory usage: 83.7+ KB
The result is displayed as standard output; it cannot be directly obtained as a variable or used in calculations.
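If the text of the report is needed anyway, info() accepts a buf argument that redirects the output to a writable buffer. The following is a minimal sketch under that assumption; df_small and its columns are made up for illustration and are not part of the Titanic data.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [1.0, None, 3.0]})

# Send the info() report to an in-memory text buffer instead of stdout
buffer = io.StringIO()
df_small.info(buf=buffer)

# The report is now available as an ordinary string
info_text = buffer.getvalue()
print(type(info_text))
print(info_text)
```

The captured value is plain text, so for calculations the attributes described in the following sections, such as shape, are usually more convenient.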
Get the number of rows and columns: df.shape
The shape attribute of pandas.DataFrame stores the number of rows and columns as a tuple (number of rows, number of columns).
print(df.shape)
# (891, 12)

print(df.shape[0])
# 891

print(df.shape[1])
# 12
You can also unpack the tuple and store the values in separate variables.
row, col = df.shape
print(row)
# 891

print(col)
# 12
Get the number of rows: len(df)
The number of rows in pandas.DataFrame can be obtained with the Python built-in function len().
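As a minimal sketch of this call, the example below uses a small made-up DataFrame (df_small is an assumption for illustration, not the Titanic data) and checks that len() agrees with the first element of shape.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_small = pd.DataFrame({'name': ['A', 'B', 'C'], 'age': [20, 30, 40]})

# len() on a DataFrame returns the number of rows
n_rows = len(df_small)
print(n_rows)
# 3

# It matches the first element of the shape tuple
print(n_rows == df_small.shape[0])
# True
```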
Get the number of columns: len(df.columns)
The number of columns in pandas.DataFrame can be obtained by applying len() to the columns attribute.
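A short sketch of the same idea for columns, again with a made-up DataFrame (df_small is hypothetical):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_small = pd.DataFrame({'name': ['A', 'B', 'C'], 'age': [20, 30, 40]})

# The columns attribute is an Index of column labels; its length is the column count
n_cols = len(df_small.columns)
print(n_cols)
# 2

# It matches the second element of the shape tuple
print(n_cols == df_small.shape[1])
# True
```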
Get the number of elements: df.size
The total number of elements in pandas.DataFrame is stored in the size attribute. This is equal to row_count * column_count.
print(df.size)
# 10692

print(df.shape[0] * df.shape[1])
# 10692
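Note that size counts every cell, including missing values. If only non-missing cells are of interest, one possible approach is to sum the per-column results of count(). The sketch below assumes a small made-up DataFrame (df_small is hypothetical) with one missing value.

```python
import pandas as pd

# Hypothetical sample data with one missing value in column 'b'
df_small = pd.DataFrame({'a': [1, 2, 3], 'b': [1.0, None, 3.0]})

# size counts all cells: 3 rows * 2 columns
print(df_small.size)
# 6

# count() returns the number of non-missing values per column;
# summing them gives the number of non-missing cells overall
n_non_missing = int(df_small.count().sum())
print(n_non_missing)
# 5
```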
Notes when setting an index
When using the set_index() method to set columns of data as an index, these columns are removed from the main data body (the values attribute) and are no longer included in the total column count.
df_multiindex = df.set_index(['Sex', 'Pclass', 'Embarked', 'PassengerId'])

print(df_multiindex.shape)
# (891, 8)

print(len(df_multiindex))
# 891

print(len(df_multiindex.columns))
# 8

print(df_multiindex.size)
# 7128
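To count the index columns as well, one option is to move them back into the data with reset_index(), or to add the number of index levels to the column count. This is a sketch with a made-up DataFrame (df_small and its column names are hypothetical):

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df_small = pd.DataFrame({
    'id': [1, 2, 3],
    'group': ['x', 'y', 'x'],
    'value': [10, 20, 30],
})

# Moving 'id' into the index removes it from the column count
df_indexed = df_small.set_index('id')
print(df_indexed.shape)
# (3, 2)

# reset_index() turns the index back into a regular column
df_restored = df_indexed.reset_index()
print(df_restored.shape)
# (3, 3)

# Alternatively, add the number of index levels to the column count
n_cols_with_index = len(df_indexed.columns) + df_indexed.index.nlevels
print(n_cols_with_index)
# 3
```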
Get the number of elements in pandas.Series
For a pandas.Series example, select one column from a pandas.DataFrame.
s = df['PassengerId']
print(s.head())
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# Name: PassengerId, dtype: int64
Get the number of elements: len(s), s.size
Since pandas.Series is one-dimensional, you can get the total number of elements (size) using either len() or the size and shape attributes. Note that the shape attribute is a tuple with one element.
print(len(s))
# 891

print(s.size)
# 891

print(s.shape)
# (891,)

print(type(s.shape))
# <class 'tuple'>
The info() method was also added to pandas.Series in pandas 1.4.
s.info()
# <class 'pandas.core.series.Series'>
# RangeIndex: 891 entries, 0 to 890
# Series name: PassengerId
# Non-Null Count  Dtype
# --------------  -----
# 891 non-null    int64
# dtypes: int64(1)
# memory usage: 7.1 KB
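As with a DataFrame, len() and size on a Series include missing values, while the count() method returns only the number of non-missing elements. A minimal sketch with a made-up Series (s_small is hypothetical):

```python
import pandas as pd

# Hypothetical Series with one missing value
s_small = pd.Series([1.0, None, 3.0], name='b')

# len() and size count every element, including the missing one
print(len(s_small))
# 3
print(s_small.size)
# 3

# count() excludes missing values
print(s_small.count())
# 2
```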
Pandas: Get the Row Number from a Dataframe
In this tutorial, you’ll learn how to use Pandas to get the row number (or, really, the index number) of a particular row or rows in a dataframe. There may be many times when you want to know the row number of a particular value, and thankfully Pandas makes this quite easy, using the .index attribute.
Practically speaking, this returns the index labels of the rows, rather than a row number as you may be familiar with in Excel. Because index labels can be arbitrary values, and may no longer line up with positions after filtering or sorting, the index doesn’t always represent a row number. That being said, Pandas doesn’t store a separate row-number column, so with the default index the index is the closest match to this.
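The difference between index labels and positional row numbers can be sketched as follows. The example uses a made-up dataframe with string labels (df_small is hypothetical) and NumPy's flatnonzero() to recover zero-based positions; this is one possible approach, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a non-numeric index
df_small = pd.DataFrame(
    {'Name': ['Joan', 'Devi', 'Nik'], 'Age': [19, 43, 28]},
    index=['a', 'b', 'c'],
)

# Boolean mask marking the rows that match the condition
mask = df_small['Age'] > 20

# .index returns the labels of the matching rows
labels = df_small[mask].index.tolist()
print(labels)
# ['b', 'c']

# flatnonzero() returns the zero-based positions where the mask is True
positions = np.flatnonzero(mask.to_numpy()).tolist()
print(positions)
# [1, 2]
```

With the default index (0, 1, 2, ...) on an unfiltered dataframe, the labels and the positions are the same.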
By the end of this tutorial, you’ll have learned:
- How to get the row number(s) for rows matching a condition,
- How to get only a single row number, and
- How to count the number of rows matching a particular condition
The Quick Answer: Use .index to Get a Pandas Row Number
Loading a Sample Pandas Dataframe
To follow along with this tutorial, I have provided a sample Pandas Dataframe. If you want to follow along with the tutorial line by line, feel free to copy the code below. The dataframe is deliberately small so that it is easier to follow along with. Let’s get started!
# Loading a Sample Pandas Dataframe
import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Joan', 'Devi', 'Melissa', 'Dave', 'Nik', 'Kate', 'Evan'],
    'Age': [19, 43, 27, 32, 28, 29, 42],
    'Gender': ['Female', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male'],
    'Education': ['High School', 'College', 'PhD', 'High School', 'College', 'College', 'College'],
    'City': ['Atlanta', 'Toronto', 'New York City', 'Madrid', 'Montreal', 'Vancouver', 'Paris']
})

print(df)

# Returns:
#       Name  Age  Gender    Education           City
# 0     Joan   19  Female  High School        Atlanta
# 1     Devi   43  Female      College        Toronto
# 2  Melissa   27  Female          PhD  New York City
# 3     Dave   32    Male  High School         Madrid
# 4      Nik   28    Male      College       Montreal
# 5     Kate   29  Female      College      Vancouver
# 6     Evan   42    Male      College          Paris
When we print the dataframe, we can see that it has seven rows and five columns. Some columns, such as Name, hold values that are unique to each row, while others, such as Gender and Education, are more categorical.
In the next section, you’ll learn how to get the row numbers that match a condition in a Pandas Dataframe.
Get Row Numbers that Match a Condition in a Pandas Dataframe
In this section, you’ll learn how to use Pandas to get the row number of a row or rows that match a condition in a dataframe.
We can use conditional Pandas filtering to filter our dataframe and then select the index, or indices, of those rows. Let’s see how we can get the row numbers for all rows where the Gender column is 'Male'.
# Get the row numbers matching a condition in a Pandas dataframe
row_numbers = df[df['Gender'] == 'Male'].index
print(row_numbers)

# Returns:
# Int64Index([3, 4, 6], dtype='int64')
# (pandas 2.0 and later print this as: Index([3, 4, 6], dtype='int64'))
We can see here that this returns three items: the indices for the rows matching the condition.
Now, let’s see how we can return the row numbers for rows matching multiple conditions. To do this, we combine the conditions with the & operator, wrapping each condition in parentheses. Let’s select rows that are both Female and from Toronto:
# Get the row numbers matching multiple conditions in a Pandas dataframe
row_numbers = df[(df['Gender'] == 'Female') & (df['City'] == 'Toronto')].index
print(row_numbers)

# Returns:
# Int64Index([1], dtype='int64')
# (pandas 2.0 and later print this as: Index([1], dtype='int64'))
We can see here that we were able to return the row numbers of a Pandas Dataframe that matches two conditions.
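Conditions can also be combined with | (or) and negated with ~ (not). The sketch below uses a trimmed copy of the sample data (only three of the five columns) and stores each condition in a named mask; the variable names is_male and over_40 are introduced here for illustration.

```python
import pandas as pd

# Trimmed copy of the tutorial's sample data (three of the five columns)
df = pd.DataFrame({
    'Name': ['Joan', 'Devi', 'Melissa', 'Dave', 'Nik', 'Kate', 'Evan'],
    'Age': [19, 43, 27, 32, 28, 29, 42],
    'Gender': ['Female', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male'],
})

# Store each condition as a boolean mask
is_male = df['Gender'] == 'Male'
over_40 = df['Age'] > 40

# Rows matching either condition
either = df[is_male | over_40].index.tolist()
print(either)
# [1, 3, 4, 6]

# Rows matching neither condition
neither = df[~(is_male | over_40)].index.tolist()
print(neither)
# [0, 2, 5]
```

Calling tolist() converts the index object into a plain Python list, which avoids the version-dependent Int64Index / Index display.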
In the next section, you’ll learn how to use Pandas to get the first row number that matches a condition.
Get the First Row Number that Matches a Condition in a Pandas Dataframe
There may be times when you want to get only the first row number that matches a particular condition. This could be, for example, when you know that only a single row will match this condition.
We saw above that filtering and then accessing .index returns an index object (displayed as Int64Index in pandas versions before 2.0), which can itself be indexed by position. Because of this, we can easily access the first matching row number. Let’s see how:
# Get the row number of the first row that matches a condition
row_numbers = df[df['Name'] == 'Kate'].index[0]
print(row_numbers)

# Returns: 5
We can see here, that when we index the index object we return just a single row number. This allows us to access and use this index position in different operations. For example, we could then use the row number to modify content within that record or be able to extract it programmatically.
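One caveat is that indexing with [0] raises an IndexError when no row matches. A possible way to guard against this is sketched below; first_matching_label is a hypothetical helper written for this example, not a pandas function, and the data is a trimmed copy of the sample dataframe.

```python
import pandas as pd

# Trimmed copy of the tutorial's sample data (two of the five columns)
df = pd.DataFrame({
    'Name': ['Joan', 'Devi', 'Melissa', 'Dave', 'Nik', 'Kate', 'Evan'],
    'Age': [19, 43, 27, 32, 28, 29, 42],
})


def first_matching_label(frame, mask):
    """Return the index label of the first row where mask is True, or None."""
    matches = frame.index[mask.to_numpy()]
    return matches[0] if len(matches) > 0 else None


# A name that exists returns its index label
print(first_matching_label(df, df['Name'] == 'Kate'))
# 5

# A name that does not exist returns None instead of raising IndexError
print(first_matching_label(df, df['Name'] == 'Zoe'))
# None
```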
In the next section, you’ll learn how to count the number of rows that match a condition.
Count the Number of Rows Matching a Condition
You may also find yourself in a situation where you need to be able to identify how many rows match a certain condition. This could be a helpful first step, for example, in identifying uniqueness of a row, if you want to make sure only a single row matches a given condition.
When we used the .index attribute above, we saw that it returned a list-like object containing our row numbers. Because of this, we can pass this object into the len() function to count how many items it contains.
Let’s see how we can repeat an example above and count how many rows match that condition using Pandas:
# Count number of rows matching a condition
row_numbers = df[(df['Gender'] == 'Female') & (df['City'] == 'Toronto')].index
print(len(row_numbers))

# Returns: 1
We can see that by passing the index object into the len() function, we can confirm that only a single row matches our condition. This allows us to check for duplicates on what we assume to be a unique key, or to confirm whether enough rows match a given condition.
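An alternative sketch for counting matches is to sum the boolean mask directly, since True is counted as 1 and False as 0. The example uses a trimmed copy of the sample data; the names mask and n_matches are introduced here for illustration.

```python
import pandas as pd

# Trimmed copy of the tutorial's sample data (three of the five columns)
df = pd.DataFrame({
    'Name': ['Joan', 'Devi', 'Melissa', 'Dave', 'Nik', 'Kate', 'Evan'],
    'Gender': ['Female', 'Female', 'Female', 'Male', 'Male', 'Female', 'Male'],
    'City': ['Atlanta', 'Toronto', 'New York City', 'Madrid', 'Montreal', 'Vancouver', 'Paris'],
})

# Boolean mask for rows that are both Female and from Toronto
mask = (df['Gender'] == 'Female') & (df['City'] == 'Toronto')

# Summing the mask counts the True values
n_matches = int(mask.sum())
print(n_matches)
# 1

# This agrees with the len() approach on the filtered index
print(n_matches == len(df[mask].index))
# True
```

This avoids building the filtered dataframe when only the count is needed.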
Conclusion
In this tutorial, you learned how to use Pandas to get the row numbers of a Pandas Dataframe that match a given condition. You also learned how to get the row numbers of rows that match multiple conditions. Finally, you learned how to use Pandas to count the number of rows that match a given condition.
To learn more about the Pandas .index attribute, check out the official documentation for DataFrame.index.