Python convert column to string

How to Convert Pandas DataFrame Columns to Strings

Often you may want to convert one or more columns in a pandas DataFrame to strings. Fortunately, this is easy to do with the built-in pandas method astype(str).

This tutorial shows several examples of how to use this method.

Example 1: Convert a Single DataFrame Column to String

Suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

#view DataFrame
df

  player  points  assists
0      A      25        5
1      B      20        7
2      C      14        7
3      D      16        8
4      E      27       11

We can identify the data type of each column by using dtypes:

df.dtypes

player     object
points      int64
assists     int64
dtype: object

We can see that the player column is a string (stored as object), while the other two columns, points and assists, are integers.

We can convert the points column to a string by simply using astype(str) as follows:

df['points'] = df['points'].astype(str)

We can verify that this column is now a string by once again using dtypes:

df.dtypes

player     object
points     object
assists     int64
dtype: object
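The single-column conversion above can be checked end to end with a minimal, self-contained sketch. It uses the same sample data and inspects the Python type of the stored values rather than the dtype label, on the assumption that the label shown for string columns may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

# Convert only the 'points' column; 'assists' is left untouched
df['points'] = df['points'].astype(str)

# Each value in 'points' is now a Python str ('25' rather than 25)
points_are_str = all(isinstance(v, str) for v in df['points'])
print(points_are_str)
print(df['points'].tolist())
```

Checking the values directly is a small design choice: it confirms the conversion happened regardless of how the dtype is displayed.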

Example 2: Convert Multiple DataFrame Columns to Strings

We can convert both the points and assists columns to strings by using the following syntax:

df[['points', 'assists']] = df[['points', 'assists']].astype(str)

And once again we can verify that they are strings by using dtypes:

df.dtypes

player     object
points     object
assists    object
dtype: object
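As an alternative sketch for the multi-column case, astype() also accepts a dictionary that maps column names to target types. The variable name converted is only illustrative; the call returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

# Map each column to its target type; columns not listed keep their dtype
converted = df.astype({'points': str, 'assists': str})

print(converted['assists'].tolist())
print(df['points'].dtype)  # the original DataFrame is not modified
```

The dictionary form is handy when different columns need different target types in a single call.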

Example 3: Convert an Entire DataFrame to Strings

Lastly, we can convert every column in a DataFrame to strings by using the following syntax:

#convert every column to strings
df = df.astype(str)

#check data type of each column
df.dtypes

player     object
points     object
assists    object
dtype: object
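If only the numeric columns should become strings, one possible approach (a sketch, not part of the original examples) is to pick them with select_dtypes() first and then apply astype(str) to that subset. The name numeric_cols is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E'],
                   'points': [25, 20, 14, 16, 27],
                   'assists': [5, 7, 7, 8, 11]})

# Find the columns that currently hold numbers
numeric_cols = df.select_dtypes(include='number').columns

# Convert just those columns, leaving 'player' as it was
df[numeric_cols] = df[numeric_cols].astype(str)

print(list(numeric_cols))
print(df.iloc[0].tolist())
```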

The complete documentation for the astype() function is available in the official pandas documentation.


pandas.DataFrame.to_string

DataFrame.to_string(buf=None, columns=None, col_space=None, header=True, index=True, na_rep='NaN', formatters=None, float_format=None, sparsify=None, index_names=True, justify=None, max_rows=None, max_cols=None, show_dimensions=False, decimal='.', line_width=None, min_rows=None, max_colwidth=None, encoding=None)

Render a DataFrame to a console-friendly tabular output.

Parameters

buf str, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

columns sequence, optional, default None

The subset of columns to write. Writes all columns by default.

col_space int, list or dict of int, optional

The minimum width of each column. If a list of ints is given, each integer corresponds to one column. If a dict is given, the key references the column, while the value defines the space to use.

header bool or sequence of str, optional

Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

index bool, optional, default True

Whether to print index (row) labels.

na_rep str, optional, default ‘NaN’

String representation of NaN to use.

formatters list, tuple or dict of one-param. functions, optional

Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List/tuple must be of length equal to the number of columns.

float_format one-parameter function, optional, default None

Formatter function to apply to columns’ elements if they are floats. This function must return a unicode string and will be applied only to the non-NaN elements, with NaN being handled by na_rep.

sparsify bool, optional, default True

Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row.

index_names bool, optional, default True

Prints the names of the indexes.

justify str, default None

How to justify the column labels. If None uses the option from the print configuration (controlled by set_option), ‘right’ out of the box. Valid values are

  • left
  • right
  • center
  • justify
  • justify-all
  • start
  • end
  • inherit
  • match-parent
  • initial
  • unset.

max_rows int, optional

Maximum number of rows to display in the console.

max_cols int, optional

Maximum number of columns to display in the console.

show_dimensions bool, default False

Display DataFrame dimensions (number of rows by number of columns).

decimal str, default ‘.’

Character recognized as decimal separator, e.g. ‘,’ in Europe.

line_width int, optional

Width to wrap a line in characters.

min_rows int, optional

The number of rows to display in the console in a truncated repr (when number of rows is above max_rows).

max_colwidth int, optional

Max width to truncate each column in characters. By default, no limit.

encoding str, default “utf-8”

Set character encoding.

Returns str or None

If buf is None, returns the result as a string. Otherwise returns None.

See also: to_html

Convert DataFrame to HTML.

Examples

>>> d = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}
>>> df = pd.DataFrame(d)
>>> print(df.to_string())
   col1  col2
0     1     4
1     2     5
2     3     6
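To illustrate a few of the parameters described above, here is a short sketch that combines index, na_rep, and float_format. The sample values are made up for illustration, and the exact column spacing of the output is left for you to inspect rather than stated here.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4.5, None, 6.25]})

# Hide the row labels, show missing values as '-', and
# format floats with two decimal places
text = df.to_string(index=False, na_rep='-', float_format='{:.2f}'.format)

print(text)
print(type(text).__name__)  # buf was not given, so a str is returned
```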


Pandas: Convert Column Values to Strings


In this tutorial, you’ll learn how to use Python’s Pandas library to convert a column’s values to a string data type. You will learn how to convert Pandas integers and floats into strings. You’ll also learn how strings have evolved in Pandas, and the advantages of using the Pandas string dtype. You’ll learn four different ways to convert a Pandas column to strings and how to convert every Pandas dataframe column to a string.

The Quick Answer: Use .astype('string')

Loading a Sample Dataframe

To follow along with the tutorial, feel free to load the dataframe provided below. It contains three columns: one that loads as a string and two that load as integers.

We’ll first load the dataframe, then print its first five records using the .head() method.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

print('df head:')
print(df.head())

This returns the following information:

# df head:
#     Name  Age  Income
# 0    Nik   30   70000
# 1   Jane   31   72000
# 2   Matt   29   83000
# 3   Kate   33   90000
# 4  Clark   43  870000

Let’s start the tutorial off by learning a little bit about how Pandas handles string data.

What is the String Datatype in Pandas?

To explore how Pandas handles string data, we can use the .info() method, which will print out information on the dataframe, including the datatypes for each column.

Let’s take a look at what the data types are:

df.info()

# Returns:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      int64
#  2   Income  5 non-null      int64
# dtypes: int64(2), object(1)
# memory usage: 248.0+ bytes

We can see here that by default, Pandas will store strings using the object datatype. The object data type is used for strings and for mixed data types, but it’s not particularly explicit.

Beginning in version 1.0, Pandas has had a dedicated string datatype. While this datatype currently doesn’t offer any explicit memory or speed improvements, the development team behind Pandas has indicated that this will occur in the future.

Because of this, this tutorial uses the string datatype throughout. If you’re using a version lower than 1.0, replace 'string' with str in all instances.
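The difference between the two datatypes can be seen in a small sketch. It assumes pandas 1.0 or later, where pd.StringDtype is available; the variable names are illustrative.

```python
import pandas as pd

names = ['Nik', 'Jane', None]

# The same values stored with the generic object dtype
# and with the dedicated string dtype
as_object = pd.Series(names, dtype='object')
as_string = pd.Series(names, dtype='string')

print(as_object.dtype)
print(as_string.dtype)

# The dedicated dtype is an instance of pd.StringDtype,
# and the missing entry is still reported as missing
is_dedicated = isinstance(as_string.dtype, pd.StringDtype)
missing_flags = as_string.isna().tolist()
print(is_dedicated, missing_flags)
```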

Let’s get started by using the preferred method for using Pandas to convert a column to a string.

Convert Pandas Dataframe Column Values to Strings Using astype

Pandas comes with a column (series) method, .astype(), which allows us to re-cast a column into a different data type.

Many tutorials will only tell you to pass in 'str' as the argument. While this holds true for versions of Pandas lower than 1.0, if you’re using 1.0 or later, pass in 'string' instead.

Doing this ensures that you are using the string datatype rather than the object datatype, which positions your code to benefit from any future improvements to the string datatype.

Let’s take a look at how we can convert a Pandas column to strings, using the .astype() method:

df['Age'] = df['Age'].astype('string')
df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      string
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(1), string(1)
# memory usage: 248.0+ bytes

We can see that our Age column, which was previously stored as int64 is now stored as the string datatype.
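Once a column holds strings, the .str accessor becomes available on it. The following sketch (with illustrative variable names) converts the Age column and then zero-pads it, which would not make sense on an integer column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Re-cast the integers to the dedicated string dtype
age_text = df['Age'].astype('string')

# String methods now work: pad each value to three characters
padded = age_text.str.zfill(3)
lengths = age_text.str.len()

print(padded.tolist())
print(lengths.tolist())
```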

In the next section, you’ll learn how to use the .map() method to convert a Pandas column’s values to strings.

Convert Pandas Dataframe Column Values to Strings Using map

Similar to the .astype() Pandas series method, you can use the .map() method to convert a Pandas column to strings.

Let’s take a look at what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].map(str)
df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes

We can see here that the .map() method can’t target the string datatype directly, so the data are saved in the object datatype. For that reason, I would not recommend this approach if you’re using version 1.0 or higher.
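If you do use .map(str) and still want the dedicated string datatype, one workaround (a sketch, assuming pandas 1.0 or later) is to chain .astype('string') onto the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# map(str) converts each value; astype('string') then sets the dtype
df['Age'] = df['Age'].map(str).astype('string')

has_string_dtype = isinstance(df['Age'].dtype, pd.StringDtype)
print(has_string_dtype)
print(df['Age'].tolist())
```

In practice, calling .astype('string') on its own does the same job in one step, so the chained form is mostly useful when the mapping function does more than a plain str() call.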

In the next section, you’ll learn how to use the .apply() method to convert a Pandas column’s data to strings.

Convert Pandas Dataframe Column Values to Strings Using apply

Similar to the method above, we can also use the .apply() method to convert a Pandas column’s values to strings. This comes with the same limitation: the values are stored in the object datatype rather than the string datatype.

Let’s see what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].apply(str)
df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
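Where .apply() can be more useful than .astype() is when the conversion needs custom formatting. The sketch below is an illustration beyond the original example: it formats the Income column with thousands separators, using a hypothetical helper named format_income.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

def format_income(value):
    # Hypothetical helper: render an integer with comma separators
    return f"{int(value):,}"

income_text = df['Income'].apply(format_income)
print(income_text.tolist())
```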

In the next section, you’ll learn how to use the .values.astype() method to convert a dataframe column’s values to strings.

Convert Pandas Dataframe Column Values to Strings Using values.astype

Finally, we can also use .values.astype() to convert a column’s underlying values directly into strings.

Let’s see what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

df['Age'] = df['Age'].values.astype(str)
df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      int64
# dtypes: int64(1), object(2)
# memory usage: 248.0+ bytes
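It may help to see what .values.astype(str) produces before it is assigned back to the dataframe. In this sketch (variable names are illustrative), the intermediate result is a NumPy array of unicode strings rather than a Pandas series.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# .values exposes the underlying NumPy array; astype(str) is NumPy's method here
raw = df['Age'].values.astype(str)

print(type(raw).__name__)
print(raw.dtype.kind)  # 'U' marks a NumPy unicode string array

# Assigning the array back stores the string values in the column
df['Age'] = raw
print(df['Age'].tolist())
```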

In the next section, you’ll learn how to use .applymap() to convert all columns in a Pandas dataframe to strings.

Convert All Pandas Dataframe Columns to Strings Using applymap

In this final section, you’ll learn how to use the .applymap() method to convert all Pandas dataframe columns to strings.

Let’s take a look at what this looks like:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

df = df.applymap(str)
df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      object
#  1   Age     5 non-null      object
#  2   Income  5 non-null      object
# dtypes: object(3)
# memory usage: 248.0+ bytes
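One caveat, stated as my understanding rather than something from the tutorial: in recent Pandas releases (2.1 and later, as far as I know), DataFrame.applymap() is deprecated in favor of DataFrame.map(). A version-tolerant sketch could pick whichever method is available; the name elementwise is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Prefer DataFrame.map when it exists; otherwise fall back to applymap
elementwise = df.map if hasattr(df, 'map') else df.applymap
df = elementwise(str)

print(df.iloc[0].tolist())
```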

If we instead wanted to convert the columns to the new string datatype, we could loop over each column:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

for col in df.columns:
    df[col] = df[col].astype('string')

df.info()

This returns the following:

# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   Name    5 non-null      string
#  1   Age     5 non-null      string
#  2   Income  5 non-null      string
# dtypes: string(3)
# memory usage: 248.0 bytes
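The loop is not strictly required. As a shorter sketch, DataFrame.astype() accepts a single dtype and applies it to every column at once (assuming pandas 1.0 or later for the 'string' dtype).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Nik', 'Jane', 'Matt', 'Kate', 'Clark'],
    'Age': [30, 31, 29, 33, 43],
    'Income': [70000, 72000, 83000, 90000, 870000]
})

# Cast every column to the dedicated string dtype in one call
df = df.astype('string')

all_string = all(isinstance(dtype, pd.StringDtype) for dtype in df.dtypes)
print(all_string)
print(df.iloc[4].tolist())
```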

Conclusion

In this tutorial, you learned how to use Python Pandas to convert a column’s values to strings. You learned the differences between the different ways in which Pandas stores strings. You also learned four different ways to convert the values to string types. Finally, you learned how to convert all dataframe columns to string types in one go.

To learn more about how Pandas intends to handle strings, check out the official Pandas API documentation.
