- How to Convert Pandas DataFrame Columns to Strings
- Example 1: Convert a Single DataFrame Column to String
- Example 2: Convert Multiple DataFrame Columns to Strings
- Example 3: Convert an Entire DataFrame to Strings
- pandas.DataFrame.to_string
- Pandas: Convert Column Values to Strings
- Loading a Sample Dataframe
- What is the String Datatype in Pandas?
- Convert a Pandas Dataframe Column Values to String using astype
- Convert a Pandas Dataframe Column Values to String using map
- Convert a Pandas Dataframe Column Values to String using apply
- Convert a Pandas Dataframe Column Values to String using values.astype
- Convert All Pandas Dataframe Columns to String Using Applymap
- Conclusion
How to Convert Pandas DataFrame Columns to Strings
You will often want to convert one or more columns of a pandas DataFrame to strings. Fortunately, this is easy to do with the built-in pandas method astype(str).
This tutorial shows several examples of how to use this method.
Example 1: Convert a Single DataFrame Column to String
Suppose we have the following pandas DataFrame:
    import pandas as pd

    # create DataFrame
    df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                       'points': [25, 20, 14, 16, 27],
                       'assists': [5, 7, 7, 8, 11]})

    # view DataFrame
    df

      player  points  assists
    0      A      25        5
    1      B      20        7
    2      C      14        7
    3      D      16        8
    4      E      27       11
We can check the data type of each column with dtypes:
    df.dtypes

    player     object
    points      int64
    assists     int64
    dtype: object
We can see that the "player" column holds strings (stored with the object dtype), while the other two columns, "points" and "assists", hold integers.
We can convert the "points" column to strings simply by using astype(str) as follows:
    df['points'] = df['points'].astype(str)
We can confirm that this column now holds strings by using dtypes once more:
    df.dtypes

    player     object
    points     object
    assists     int64
    dtype: object
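Because dtypes reports object for any column of Python objects, it does not by itself prove that the values are strings. The following is a minimal sketch, reusing the example data from this section, that checks the element type directly and then uses the converted column in string concatenation. The 'label' column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

df['points'] = df['points'].astype(str)

# Check that every element of the converted column is a Python str
all_str = all(isinstance(value, str) for value in df['points'])
print(all_str)

# String columns can be concatenated; 'label' is a hypothetical column name
df['label'] = df['player'] + '-' + df['points']
print(df['label'].tolist())
```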
Example 2: Convert Multiple DataFrame Columns to Strings
We can convert both the "points" and "assists" columns to strings using the following syntax:
    df[['points', 'assists']] = df[['points', 'assists']].astype(str)
Once again, we can verify that they are strings by using dtypes:
    df.dtypes

    player     object
    points     object
    assists    object
    dtype: object
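As an alternative sketch, astype() also accepts a dictionary that maps column names to target types, which avoids selecting the columns twice. The data below repeats the example from this section; columns not named in the dictionary are left unchanged.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

# Map each column name to its target type; 'player' is not listed and is untouched
df = df.astype({'points': str, 'assists': str})

print(df.dtypes)
print(df.loc[0, 'points'], df.loc[0, 'assists'])
```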
Example 3: Convert an Entire DataFrame to Strings
Finally, we can convert every column in a DataFrame to strings using the following syntax:
    # convert every column to strings
    df = df.astype(str)

    # check data type of each column
    df.dtypes

    player     object
    points     object
    assists    object
    dtype: object
The complete documentation for the astype() method is available in the pandas API reference.
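When only the numeric columns need converting, the column list does not have to be typed by hand. The sketch below, which assumes the same example data, uses select_dtypes() to pick the numeric columns and then applies the same astype(str) pattern as Example 2.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

# Collect the names of all numeric columns
num_cols = df.select_dtypes(include='number').columns
print(list(num_cols))

# Convert just those columns to strings
df[num_cols] = df[num_cols].astype(str)
print(df.loc[0, 'points'])
```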
pandas.DataFrame.to_string
DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, min_rows=None, max_colwidth=None, encoding=None)
Render a DataFrame to a console-friendly tabular output.
Parameters
buf str, Path or StringIO-like, optional, default None
Buffer to write to. If None, the output is returned as a string.
columns sequence, optional, default None
The subset of columns to write. Writes all columns by default.
col_space int, list or dict of int, optional
The minimum width of each column. If a list of ints is given, each integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.
header bool or sequence of str, optional
Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.
index bool, optional, default True
Whether to print index (row) labels.
na_rep str, optional, default 'NaN'
String representation of NaN to use.
formatters list, tuple or dict of one-param. functions, optional
Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.
float_format one-parameter function, optional, default None
Formatter function to apply to columns' elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.
sparsify bool, optional, default True
Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.
index_names bool, optional, default True
Prints the names of the indexes.
justify str, default None
How to justify the column labels. If None, uses the option from the print configuration (controlled by set_option), 'right' out of the box. Valid values are:
- left
- right
- center
- justify
- justify-all
- start
- end
- inherit
- match-parent
- initial
- unset.
max_rows int, optional
Maximum number of rows to display in the console.
max_cols int, optional
Maximum number of columns to display in the console.
show_dimensions bool, default False
Display DataFrame dimensions (number of rows by number of columns).
decimal str, default '.'
Character recognized as decimal separator, e.g. ',' in Europe.
line_width int, optional
Width to wrap a line in characters.
min_rows int, optional
The number of rows to display in the console in a truncated repr (when the number of rows is above max_rows).
max_colwidth int, optional
Max width to truncate each column in characters. By default, no limit.
encoding str, default "utf-8"
Set character encoding.
Returns
str or None
If buf is None, returns the result as a string. Otherwise returns None.
See also
to_html
Convert DataFrame to HTML.
Examples
>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
   col1  col2
0     1     4
1     2     5
2     3     6
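To illustrate a few of the parameters listed above, here is a small sketch that combines index, na_rep, float_format, columns and buf. The sample data (including the missing value) is invented for illustration and is not part of the reference page.

```python
import io

import pandas as pd

# Hypothetical data with one missing value in a float column
df = pd.DataFrame({'col1': [1.5, None, 3.25], 'col2': [4, 5, 6]})

# Return the table as a string: no index, two decimals, '-' for missing values
text = df.to_string(index=False, na_rep='-', float_format='{:.2f}'.format)
print(text)

# Write a subset of columns into a buffer; the call itself then returns None
buf = io.StringIO()
result = df.to_string(buf=buf, columns=['col2'])
print(result)
print(buf.getvalue())
```

Passing a buffer is useful when the rendered table should go to a file or log stream rather than being held as a Python string.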
Pandas: Convert Column Values to Strings
In this tutorial, you’ll learn how to use Python’s Pandas library to convert a column’s values to a string data type. You will learn how to convert Pandas integers and floats into strings. You’ll also learn how strings have evolved in Pandas, and the advantages of using the Pandas string dtype. You’ll learn four different ways to convert a Pandas column to strings and how to convert every Pandas dataframe column to a string.
The Quick Answer: Use .astype('string'), for example df['column'].astype('string')
Loading a Sample Dataframe
To follow along with the tutorial, feel free to load the sample dataframe provided below. It contains three columns: one that loads as strings and two that load as integers.
We’ll first load the dataframe, then print its first five records using the .head() method.
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    print('df head:')
    print(df.head())
This returns the following information:
    # df head:
    #     Name  Age  Income
    # 0    Nik   30   70000
    # 1   Jane   31   72000
    # 2   Matt   29   83000
    # 3   Kate   33   90000
    # 4  Clark   43  870000
Let’s start the tutorial off by learning a little bit about how Pandas handles string data.
What is the String Datatype in Pandas?
To explore how Pandas handles string data, we can use the .info() method, which will print out information on the dataframe, including the datatypes for each column.
Let’s take a look at what the data types are:
    # .info() prints its report directly and returns None, so no print() is needed
    df.info()

    # Returns:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      int64
    #  2   Income  5 non-null      int64
    # dtypes: int64(2), object(1)
    # memory usage: 248.0+ bytes
We can see here that by default, Pandas will store strings using the object datatype. The object data type is used for strings and for mixed data types, but it’s not particularly explicit.
Beginning in version 1.0, Pandas has had a dedicated string datatype. While this datatype currently doesn’t offer any explicit memory or speed improvements, the development team behind Pandas has indicated that this will occur in the future.
Because of this, the tutorial uses the string datatype throughout. If you're using a version lower than 1.0, please replace 'string' with str in all instances.
Let’s get started by using the preferred method for using Pandas to convert a column to a string.
Convert a Pandas Dataframe Column Values to String using astype
Pandas comes with a column (series) method, .astype() , which allows us to re-cast a column into a different data type.
Many tutorials will only tell you to pass in 'str' as the argument. While this holds true for versions of Pandas lower than 1.0, if you're using 1.0 or later, pass in 'string' instead.
Doing this ensures that you are using the string datatype rather than the object datatype, so your code is positioned to benefit from the improvements the Pandas team has indicated will come in the future.
Let’s take a look at how we can convert a Pandas column to strings, using the .astype() method:
    df['Age'] = df['Age'].astype('string')
    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      string
    #  2   Income  5 non-null      int64
    # dtypes: int64(1), object(1), string(1)
    # memory usage: 248.0+ bytes
We can see that our Age column, which was previously stored as int64, is now stored as the string datatype.
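One practical difference of the string datatype is how it treats missing values. The sketch below is an illustration under assumptions: the ages, the missing entry and the nullable Int64 input dtype are invented and do not come from the sample dataframe. It shows that a missing value stays missing after the conversion and that the .str accessor works on the result.

```python
import pandas as pd

# Hypothetical ages with one missing value, stored in the nullable Int64 dtype
ages = pd.Series([30, None, 29], dtype='Int64')

ages_str = ages.astype('string')
print(ages_str.dtype)
print(ages_str.isna().tolist())

# .str methods work on the converted column and skip the missing entry
padded = ages_str.str.zfill(3)
print(padded.tolist())
```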
In the next section, you'll learn how to use the .map() method to convert a Pandas column's values to strings.
Convert a Pandas Dataframe Column Values to String using map
Similar to the .astype() Pandas series method, you can use the .map() method to convert a Pandas column to strings.
Let’s take a look at what this looks like:
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    # call str() on every value in the column
    df['Age'] = df['Age'].map(str)
    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      object
    #  2   Income  5 non-null      int64
    # dtypes: int64(1), object(2)
    # memory usage: 248.0+ bytes
We can see here that the .map() method does not produce the string datatype; the values are stored with the object datatype instead. For that reason, I would not recommend this approach if you're using version 1.0 or later.
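Where .map() can still be handy is when the strings need custom formatting rather than a plain str() call. As a sketch using the sample data, the example below passes a format function to .map() to add thousands separators to the Income column; the choice of format is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# '{:,}'.format is a one-argument function that adds thousands separators
df['Income'] = df['Income'].map('{:,}'.format)
print(df['Income'].tolist())
```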
In the next section, you’ll learn how to use the .apply() method to convert a Pandas column’s data to strings.
Convert a Pandas Dataframe Column Values to String using apply
Similar to the method above, we can also use the .apply() method to convert a Pandas column's values to strings. This comes with the same limitation: the result is stored with the object datatype rather than the string datatype.
Let’s see what this looks like:
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    # call str() on every value in the column
    df['Age'] = df['Age'].apply(str)
    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      object
    #  2   Income  5 non-null      int64
    # dtypes: int64(1), object(2)
    # memory usage: 248.0+ bytes
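If the conversion logic really needs .apply() (for instance, a custom formatting function) but the string datatype is still wanted, one option is to cast the result afterwards. A minimal sketch follows; describe_age is a hypothetical helper written only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})


def describe_age(age):
    """Hypothetical formatter used only for illustration."""
    return f"{age} years"


# Step 1: build the strings with .apply()
as_object = df['Age'].apply(describe_age)

# Step 2: cast the result to the dedicated string datatype
as_string = as_object.astype('string')
print(as_string.dtype)
print(as_string.tolist())
```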
In the next section, you'll learn how to use .values.astype() to convert a dataframe column's values to strings.
Convert a Pandas Dataframe Column Values to String using values.astype
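Finally, we can also use .values.astype() to convert a column's values into strings directly. Here, .values returns the column's underlying NumPy array, and .astype() is then the NumPy array method rather than the Pandas one.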
Let’s see what this looks like:
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    # convert the underlying NumPy array, then assign it back to the column
    df['Age'] = df['Age'].values.astype(str)
    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      object
    #  2   Income  5 non-null      int64
    # dtypes: int64(1), object(2)
    # memory usage: 248.0+ bytes
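To make the intermediate step visible, the sketch below inspects what .values.astype(str) produces before it is assigned back to the dataframe. It uses the same sample data; the variable name arr is just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# The intermediate result is a NumPy array of fixed-width unicode strings
arr = df['Age'].values.astype(str)
print(type(arr))
print(arr.dtype.kind)  # 'U' is NumPy's code for unicode strings

# Assigning the array back stores the strings in the dataframe column
df['Age'] = arr
print(df['Age'].tolist())
```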
In the next section, you’ll learn how to use .applymap() to convert all columns in a Pandas dataframe to strings.
Convert All Pandas Dataframe Columns to String Using Applymap
In this final section, you’ll learn how to use the .applymap() method to convert all Pandas dataframe columns to string.
Let’s take a look at what this looks like:
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    # call str() on every value in every column
    df = df.applymap(str)
    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      object
    #  1   Age     5 non-null      object
    #  2   Income  5 non-null      object
    # dtypes: object(3)
    # memory usage: 248.0+ bytes
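In Pandas 2.1 and later, DataFrame.applymap() is deprecated in favor of DataFrame.map(), so the .applymap() call may emit a warning on recent versions. A version-tolerant sketch, assuming the same sample data, picks whichever method is available; the name elementwise is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# DataFrame.map exists in Pandas 2.1+; fall back to applymap on older versions
elementwise = df.map if hasattr(df, 'map') else df.applymap
df_str = elementwise(str)

print(df_str.loc[0].tolist())
```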
If, instead, we wanted to convert the columns to the new string datatype, we could loop over each column, like this:
    import pandas as pd

    df = pd.DataFrame({
        'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
        'Age': [30, 31, 29, 33, 43],
        'Income': [70000, 72000, 83000, 90000, 870000]
    })

    # convert each column to the string datatype in turn
    for col in df.columns:
        df[col] = df[col].astype('string')

    df.info()
This returns the following:
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex: 5 entries, 0 to 4
    # Data columns (total 3 columns):
    #  #   Column  Non-Null Count  Dtype
    # ---  ------  --------------  -----
    #  0   Name    5 non-null      string
    #  1   Age     5 non-null      string
    #  2   Income  5 non-null      string
    # dtypes: string(3)
    # memory usage: 248.0 bytes
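A loop is not strictly required here: DataFrame.astype() also accepts a single datatype for the whole dataframe. The following sketch, using the same sample data, should give the same result in one call; df_all is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Cast every column to the string datatype at once
df_all = df.astype('string')

print(df_all.dtypes)
print(df_all.loc[0].tolist())
```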
Conclusion
In this tutorial, you learned how to use Python Pandas to convert a column’s values to strings. You learned the differences between the different ways in which Pandas stores strings. You also learned four different ways to convert the values to string types. Finally, you learned how to convert all dataframe columns to string types in one go.
To learn more about how Pandas intends to handle strings, check out the official Pandas API documentation on the string datatype.