Python import data to array

How to import a csv-file into a data array in Python?

Python provides several ways to import a CSV file into a data array. Here are two common ways to do so:

Method 1: Using the built-in csv module

import csv

with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

The open() function takes two arguments: the name of the file and the mode in which to open it; here we open the file in ‘r’ (read) mode. Passing the file object to csv.reader() creates a CSV reader object that can be used to read the contents of the file. Iterating over the reader prints each row of the CSV file, where each row is a list of strings representing the values in that row.
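Because csv.reader yields each row as a list of strings, a small extra step is needed if you want a numeric data array. A minimal sketch (the file contents here are invented and held in memory via io.StringIO so the snippet is self-contained; with a real file you would pass the open file object instead):

```python
import csv
import io

# Hypothetical stand-in for the contents of 'file.csv'.
csv_text = "a,b\n1,2\n3,4\n"

rows = []
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # consume the header row
for row in reader:
    # Convert each string value in the row to a float.
    rows.append([float(value) for value in row])

print(rows)  # [[1.0, 2.0], [3.0, 4.0]]
```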

Method 2: Using the pandas library

import pandas as pd

data = pd.read_csv('file.csv')

This returns a DataFrame, which is a 2-dimensional, size-mutable, tabular data structure with rows and columns.

print(data['column_name'])

This will print the values of the column with the name ‘column_name’.

print(data.loc[index])

This will print the values of the row with index ‘index’.

Both of these methods allow you to import a CSV file into a data array, with the first method using the built-in csv module and the second method using the pandas library. The pandas library provides additional functionality for working with data, such as filtering and manipulating the data.


When working with large datasets, the pandas library is recommended, as it is more efficient and flexible than the built-in csv module.
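If what you ultimately need is a plain array rather than a DataFrame, the pandas route still works: read_csv parses the file and to_numpy() hands back the values. A sketch, using an in-memory file in place of a real 'file.csv' (the column names are invented):

```python
import io
import pandas as pd

# Hypothetical file contents standing in for 'file.csv'.
csv_text = "x,y\n1,10\n2,20\n3,30\n"

data = pd.read_csv(io.StringIO(csv_text))
arr = data.to_numpy()  # 2-D numpy array of the table's values

print(arr.shape)  # (3, 2)
```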

When working with CSV files in Python, it’s important to be aware of the different options available for controlling how the data is read and written. Here are a few examples of how to customize the import process:

  1. Specifying the delimiter: By default, the csv.reader() function assumes that the delimiter is a comma ( , ). If your CSV file uses a different delimiter, you can specify it using the delimiter option. For example, if your file uses a tab as a delimiter, you can use the following code:
with open('file.csv', 'r') as file:
    reader = csv.reader(file, delimiter='\t')
  2. Handling missing values: By default, the pandas.read_csv() function will treat any blank cells in the CSV file as missing values. You can use the na_values option to specify a different value to be treated as missing. For example, if your CSV file uses the value N/A to indicate missing values, you can use the following code:
data = pd.read_csv('file.csv', na_values='N/A')
  3. Skipping rows: If your CSV file has a header row that you want to skip, you can use the skiprows option to specify how many rows to skip. For example, if your CSV file has a header row at the top, you can use the following code:
data = pd.read_csv('file.csv', skiprows=1)

This will skip the first row of the CSV file and start reading data from the second row.

  4. Selecting specific columns: If you only need to import a specific subset of columns from the CSV file, you can use the usecols option to specify which columns to include. For example, if you only need to import the first and third columns of the CSV file, you can use the following code:
data = pd.read_csv('file.csv', usecols=[0, 2])
  5. Setting the index: By default, the pandas.read_csv() function will use the default integer index. If you want to use a specific column from the CSV file as the index, you can use the index_col option. For example, if you want to use the first column of the CSV file as the index, you can use the following code:
data = pd.read_csv('file.csv', index_col=0)
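These options can also be combined in a single read_csv() call. A sketch (the file contents, delimiter, and column names here are made up purely for illustration):

```python
import io
import pandas as pd

# Hypothetical ';'-delimited file with an 'N/A' marker for missing data.
csv_text = "id;name;score\n1;alice;N/A\n2;bob;7.5\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    sep=';',                   # non-default delimiter
    na_values='N/A',           # treat 'N/A' as a missing value
    usecols=['id', 'score'],   # keep only these columns
    index_col='id',            # use the 'id' column as the index
)
print(data)
```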

These are just a few examples of the options available for importing CSV files in Python, and there are many more options available for controlling how the data is read and written. It’s important to carefully read the documentation for the csv module and the pandas library to understand all the options available and how to use them.
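One more option worth knowing about from that longer list is dtype, which controls the type each column is parsed as. A sketch (the file contents are invented; the point is that zero-padded codes would otherwise be read as integers and lose their leading zeros):

```python
import io
import pandas as pd

# Hypothetical file with zero-padded identifier codes.
csv_text = "code,qty\n007,3\n042,5\n"

# Force the 'code' column to be read as strings.
data = pd.read_csv(io.StringIO(csv_text), dtype={'code': str})

print(data['code'].tolist())  # ['007', '042'] - leading zeros preserved
```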

Conclusion

In conclusion, importing a CSV file into a data array in Python is a relatively simple process that can be done with either the built-in csv module or the pandas library. The csv module provides a basic way to read and write CSV files, while pandas adds functionality for filtering and manipulating the data, and is the more efficient and flexible choice for large datasets. It is also worth knowing the options that control how the data is read and written: the delimiter, the handling of missing values, skipping rows, selecting specific columns, and setting the index. Read the documentation for the csv module and the pandas library carefully to understand all the options available and how to use them.


Import .dat file as an array

However, when I do the following it gives an odd result, probably because that first element is not a float.

>>> np.fromfile('mydat.dat', dtype=float)
array([3.45301146e-086, 3.45300781e-086, 3.25195588e-086, ...,
       8.04331780e-096, 8.04331780e-096, 1.31544776e-259])

Any suggestions on this? These were the two main ways to import .dat files into Python as an array and they don’t seem to provide the desired result.

Are the lines always in that form: an ID, then some values, with a newline separating the lines of data? Do you want a 2D array, and would lists work instead of an array?

There is no new list; I fixed it. I need a 2D array. The end goal is to use the data in Keras, so I do need it as an array.

2 Answers

Here is one way, where we read each line of the ‘mydat.dat’ file, convert each value to a string or float, and then load the result into a numpy array.

import numpy as np

def is_float(string):
    """True if the given string can be parsed as a float, else False."""
    try:
        float(string)
        return True
    except ValueError:
        return False

data = []
with open('mydat.dat', 'r') as f:
    for line in f:
        values = line.rstrip().split(",")
        data.append([float(v) if is_float(v) else v for v in values])

data = np.array(data, dtype='O')
>>> data
array([['ID_1', 5.0, 5.0, 5.0],
       ['ID_2', 5.0, 5.0, 5.0]], dtype=object)

Also, if you can use pandas to read and manipulate the data, I would do so. pandas works much more efficiently, especially for larger data, and is easy to manipulate.

# read data as csv into a dataframe
>>> df = pd.read_csv('mydat.dat', sep=",", header=None)
>>> df
      0    1    2    3
0  ID_1  5.0  5.0  5.0
1  ID_2  5.0  5.0  5.0

# transposed data with ID numbers as headers
>>> df.T
      0     1
0  ID_1  ID_2
1     5     5
2     5     5
3     5     5
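Since the end goal here is Keras, which expects purely numeric arrays, one way to finish the job (a sketch, not from the original answer; the file layout with an ID in the first column is assumed from the example above, and the data is held in memory so the snippet is self-contained) is to drop the ID column and convert the remaining columns:

```python
import io
import pandas as pd

# Stand-in for the contents of mydat.dat.
dat_text = "ID_1,5.0,5.0,5.0\nID_2,5.0,5.0,5.0\n"

df = pd.read_csv(io.StringIO(dat_text), header=None)

# Keep only the numeric columns (everything after the ID) as a float array.
features = df.iloc[:, 1:].to_numpy(dtype=float)

print(features.shape)  # (2, 3)
```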


How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created. The text file is formatted as follows:

In place of the ellipsis, the actual text file has hundreds or thousands more items. I’m using the following code to try to read the file into a list:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()

Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?

Just as a note: it looks like this question should be rephrased as how to read a CSV file into a list in Python. But I defer to the OP’s original intentions of over 4 years ago, which I don’t know.

7 Answers

You will have to split your string into a list of values using split()

lines = text_file.read().split(',') 

EDIT: I didn’t realise there would be so much traction to this. Here’s a more idiomatic approach.

import csv

with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        pass  # do something with row

I think this answer could be improved. If you consider a multiline .csv file (as mentioned by the OP), e.g., a file containing the alphabetic characters 3 per row (a,b,c, then d,e,f, etc.) and apply the procedure described above, what you get is a list like this: ['a', 'b', 'c\nd', 'e', ...] (note the item 'c\nd'). I’d like to add that, the above problem notwithstanding, this procedure collapses data from individual rows into a single mega-list, usually not what I want when processing a record-oriented data file.

You can also use numpy’s loadtxt, like:

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

I needed this too. I noticed on a Raspberry Pi that numpy works really slowly. For this application I reverted to opening the file and reading it line by line.

This is useful for specifying the format too, via the dtype (data-type) parameter: docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html. Pandas read_csv is very easy to use, but I did not see a way to specify the format for it: it was reading floats from my file, whereas I needed strings. Thanks @Thiru for showing loadtxt.

If the txt file contains strings, then dtype should be specified, so it should be like lines = loadtxt("filename.dat", dtype=str, comments="#", delimiter=",", unpack=False)

So you want to create a list of lists. We need to start with an empty list:

list_of_lists = []

Next, we read the file content, line by line:

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

A common use case is that of columnar data, but our units of storage are the rows of the file, that we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom

by_cols = zip(*list_of_lists) 
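For a concrete picture of what this idiom does (a toy example added for illustration, not from the original answer; list() is used so the result is subscriptable in Python 3, a point the answer returns to below):

```python
# Two "rows" of three columns each.
list_of_lists = [[1, 2, 3],
                 [4, 5, 6]]

# zip(*...) pairs up the i-th element of every row, i.e. transposes.
by_cols = list(zip(*list_of_lists))

print(by_cols)  # [(1, 4), (2, 5), (3, 6)]
```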

Another common use is to give a name to each column

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

so that you can operate on homogeneous data items

mean_apple_prices = [money / fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]

Most of what I’ve written can be sped up using the csv module from the standard library. Another option is the third-party module pandas, which lets you automate most aspects of a typical data analysis (but has a number of dependencies).

Update While in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.

If you need indexed access you can use

by_cols = list(zip(*list_of_lists)) 

that gives you a list of lists in both versions of Python.

On the other hand, if you don’t need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine.

file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column


Pythonic way to import data from multiple files into an array

I’m relatively new to Python and wondering how best to import data from multiple files into a single array. I have quite a few text files, each containing 50 rows of two columns of comma-delimited data, such as:

Length=10.txt:
1, 10
2, 30
3, 50
#etc
END OF FILE
Length=20.txt:
1, 50.7
2, 90.9
3, 10.3
#etc
END OF FILE

Let’s say I have 10 text files to import into a variable called data. I’d like to create a single 3D array containing all the data. That way, I can easily plot and manipulate the data by referring to it as data[:, :, n], where n refers to the index of the text file. I think the way I’d do this is to have an array of shape (50, 2, 10), but I don’t know how best to use Python to create it. I’ve thought about using a loop to import each text file as a 2D array and then stacking them to create a 3D array, although I couldn’t find the appropriate commands to do this (I looked at vstack and column_stack in numpy, but these don’t seem to add an extra dimension). So far, I’ve written the import code:

file_list = glob.glob(source_dir + '/*.TXT')  # list the text files in the folder
for file_path in file_list:
    data = np.genfromtxt(file_path, delimiter=',', skip_header=3, skip_footer=18)

But the problem with this code is that I can only process the data inside the for loop. What I really want is an array of all the data imported from the text files. Any help would be greatly appreciated, thanks!
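One common approach (a sketch added here, not from the original thread; the arrays below are stand-ins for what genfromtxt would return for each file) is to append each 2D array to a list inside the loop and then call numpy.stack once afterwards, which adds the extra dimension that vstack and column_stack do not:

```python
import numpy as np

# Stand-ins for the (50, 2) arrays genfromtxt would return per file:
# the array for "file n" is filled with the value n for illustration.
arrays = [np.full((50, 2), n, dtype=float) for n in range(10)]

# Stack along a new last axis -> shape (50, 2, 10),
# so data[:, :, n] selects the n-th file's data.
data = np.stack(arrays, axis=-1)

print(data.shape)  # (50, 2, 10)
```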

