How to Read CSV to Array in Python
- Use numpy.loadtxt() to Read a CSV File Into an Array in Python
- Use the list() Method to Read a CSV File Into an Array in Python
- Use the pd.read_csv Method to Read a CSV File Into an Array in Python
- Use the np.genfromtxt Method to Read a CSV File Into an Array in Python
- Conclusion
CSV files are widely used in data analysis and data science in Python. CSV stands for Comma-Separated Values. These files store tabular data as plain text, with one record per line.
Within each record, the column values are separated by commas. A common task when working with CSV files is importing them in the form of data arrays.
This tutorial will introduce different methods to import CSV files in the form of data arrays.
Use numpy.loadtxt() to Read a CSV File Into an Array in Python
NumPy's loadtxt() function loads data from a text file into an array. It accepts the file name directly, so the file does not have to be opened with open() before reading.
For a CSV file, two arguments matter most: the file name (or the variable in which the file name is stored), which is required, and delimiter, which denotes the string used for separating the values.
The default delimiter is whitespace, so delimiter="," must be passed for comma-separated data.
Example:
with open("example.csv", "w") as file:
    file.write("1,2,3\n4,5,6\n7,8,9")

import numpy as np

# Reading the CSV file into an array
data_array = np.loadtxt("example.csv", delimiter=",")

# Displaying the result
print(data_array)
In this example, we begin by creating a CSV file named example.csv with three rows and three columns of numbers, using a plain file write.
Next, we import NumPy and call np.loadtxt() to read the contents of example.csv. We specify the delimiter as , since our data is comma-separated.
The function parses the data into a 2D NumPy array. Because the default dtype is float, the integers in the file come back as floating-point values. We then use the print() function to display the contents of the array.
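Real CSV files often start with a header row, which np.loadtxt() cannot parse as numbers. The sketch below is one way to handle that, using the skiprows, usecols, and dtype parameters. The file name measurements.csv and its columns are made up for illustration, and a temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: a header row followed by numeric rows.
csv_text = "id,height,weight\n1,170.5,65\n2,182.0,80\n3,165.2,54\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "measurements.csv")
    with open(path, "w") as file:
        file.write(csv_text)

    # skiprows=1 skips the header line; usecols keeps only columns 1 and 2.
    measurements = np.loadtxt(path, delimiter=",", skiprows=1, usecols=(1, 2))

    # dtype=int asks loadtxt for integers instead of the default floats.
    ids = np.loadtxt(path, delimiter=",", skiprows=1, usecols=0, dtype=int)

print(measurements.shape)  # (3, 2)
print(ids.tolist())        # [1, 2, 3]
```

Reading the identifier column separately keeps it as integers, while the measurement columns stay as floats.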
Use the list() Method to Read a CSV File Into an Array in Python
Here, we use Python's built-in csv module, which parses a CSV file row by row. More precisely, the module's reader() function returns an iterator that yields each row of the file as a list of strings.
Finally, the built-in list() function consumes that iterator and collects all the rows into a list.
Example:
with open("example.csv", "w") as file:
    file.write("Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor")

import csv

# Reading the CSV file into an array
with open("example.csv", "r") as file:
    csv_reader = csv.reader(file)
    data_array = list(csv_reader)

# Displaying the result
print(data_array)
We start by creating a CSV file named example.csv with a header and two data rows.
We then read this file using csv.reader. When we open the file with open('example.csv', 'r'), we are creating a file object that csv.reader can iterate over.
csv.reader reads each line in the file and returns a list of strings representing the fields in that row.
We then convert this reader object into a list using list(csv_reader). This operation loads all rows from the CSV file, including the header row, into a list of lists, where each inner list is a row in the CSV. Every field, including the ages, is stored as a string.
Finally, we use print() to display the contents of the resulting list.
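Since csv.reader returns every field as a string, numeric columns usually need an explicit conversion before use. The following is a minimal sketch of separating the header from the data rows and casting one column. It reuses the sample data from above, and io.StringIO stands in for an open file so that nothing has to exist on disk.

```python
import csv
import io

# Sample data with the same layout as the example above.
csv_text = "Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor\n"

# io.StringIO behaves like an open text file for csv.reader.
with io.StringIO(csv_text) as file:
    rows = list(csv.reader(file))

# The first row holds the column names; the rest are data records.
header, records = rows[0], rows[1:]

# Look up the column position by name, then cast each value to int.
age_index = header.index("Age")
ages = [int(record[age_index]) for record in records]

print(header)  # ['Name', 'Age', 'Occupation']
print(ages)    # [28, 34]
```

Looking up the column by name rather than by a fixed position keeps the conversion correct if the column order changes.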
Use the pd.read_csv Method to Read a CSV File Into an Array in Python
Pandas offers extensive functionality for reading, processing, and writing data in various formats, including CSV. The pandas.read_csv() function reads a CSV file into a Pandas DataFrame, which can then be converted to an array.
Example:
with open("example.csv", "w") as file:
    file.write("Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor")

import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("example.csv")

# Converting the DataFrame to a numpy array
data_array = df.values

# Displaying the result
print(data_array)
In this code, we first create a simple CSV file named example.csv using standard file I/O operations in Python.
We then use Pandas to read this CSV file.
The pd.read_csv('example.csv') function reads the CSV file into a DataFrame, treating the first line as column names by default. This DataFrame df is a 2D labeled data structure whose columns can have different types.
To convert this DataFrame into a NumPy array, we use the .values attribute. The pandas documentation recommends the DataFrame.to_numpy() method instead for new code. The resulting array can then be used for further processing.
Finally, we use the print() function to display the contents of the NumPy array.
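One detail worth checking after the conversion is the dtype of the array. The sketch below, which reuses the sample data and reads it from an in-memory io.StringIO buffer instead of a file, illustrates that a DataFrame mixing text and numbers converts to a generic object array, while a single numeric column keeps a numeric dtype.

```python
import io

import pandas as pd

# Sample data with the same layout as the example above.
csv_text = "Name,Age,Occupation\nJohn,28,Engineer\nJane,34,Doctor\n"

# read_csv accepts file-like objects as well as file names.
df = pd.read_csv(io.StringIO(csv_text))

# Mixed text and numeric columns produce an array of Python objects.
full_array = df.to_numpy()

# Selecting the numeric column alone keeps an integer dtype.
age_array = df["Age"].to_numpy()

print(full_array.dtype)    # object
print(age_array.tolist())  # [28, 34]
```

For numerical work, selecting the numeric columns before converting avoids carrying an object array into later computations.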
Use the np.genfromtxt Method to Read a CSV File Into an Array in Python
NumPy, the fundamental package for numerical computing in Python, also provides the np.genfromtxt function. It is designed to handle CSV and other delimited text files, especially when they contain missing or heterogeneous data.
Example:
with open("example.csv", "w") as file:
    file.write("1,2,3\n4,,6\n7,8,")

import numpy as np

# Reading the CSV file into an array, handling missing values
data_array = np.genfromtxt("example.csv", delimiter=",", filling_values=np.nan)

# Displaying the result
print(data_array)
In this example, we create a CSV file named example.csv, intentionally including missing values represented by empty fields.
We then import NumPy and call np.genfromtxt to read the file. We use the delimiter=',' parameter to indicate that the fields are comma-separated.
To manage missing values, we specify filling_values=np.nan, which substitutes missing entries with NaN (Not a Number).
The function reads the data and returns a 2D NumPy array of floats in which the two empty fields appear as nan.
Finally, we use the print() function to display the array.
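Once the missing entries are stored as NaN, they can be located and excluded from calculations. The following is a small sketch of that follow-up step, assuming the same sample data as above. It reads from an in-memory io.StringIO buffer instead of a file, then uses np.isnan and np.nanmean.

```python
import io

import numpy as np

# Same sample data as above: two fields are left empty.
csv_text = "1,2,3\n4,,6\n7,8,\n"

# genfromtxt accepts file-like objects as well as file names.
data_array = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=np.nan)

# Boolean mask that is True wherever a value was missing.
missing_mask = np.isnan(data_array)
print(int(missing_mask.sum()))  # 2

# nanmean ignores NaN entries when averaging each column.
column_means = np.nanmean(data_array, axis=0)
print(column_means.tolist())  # [4.0, 5.0, 4.5]
```

A plain np.mean over the same columns would return NaN for any column containing a missing value, which is why the NaN-aware variant is used here.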
Conclusion
This article demonstrated several methods for importing CSV files into data arrays in Python, a fundamental skill in data analysis and data science. We covered NumPy's loadtxt() function, which suits clean numerical data, and genfromtxt(), which can also handle missing values, as well as Python's built-in csv module for more general purposes.
We also showed how Pandas' read_csv() function reads CSV files into DataFrames, which can be converted into NumPy arrays for further analysis. Each method has its own advantages and suits different scenarios, so the right choice depends on the structure and contents of the CSV data.
Lakshay Kapoor is a final year B.Tech Computer Science student at Amity University Noida. He is familiar with programming languages and their real-world applications (Python/R/C++). Deeply interested in the area of Data Sciences and Machine Learning.