How to Convert NumPy Array to Pandas DataFrame
This tutorial explains how to convert a NumPy array to a Pandas DataFrame using the pandas.DataFrame() constructor.
We pass the NumPy array to pandas.DataFrame() to generate a DataFrame from it. We can also specify column names and row indices for the DataFrame.
Convert NumPy Array to Pandas DataFrame Using the pandas.DataFrame() Constructor
In the simplest case, we pass the NumPy array to pandas.DataFrame() as its only argument.
from numpy import random
import pandas as pd
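random.seed(5)  # fix the seed so the output is reproducible
random.randint(100, size=(3, 5))  # result is discarded; the call only advances the random state
data_array = random.randint(100, size=(4, 3))  # 4x3 array of integers in [0, 100)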
print("NumPy Data Array is:")
print(data_array)
print("")
data_df = pd.DataFrame(data_array)
print("The DataFrame generated from the NumPy array is:")
print(data_df)
Output:
NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
    0   1   2
0  27  44  77
1  75  65  47
2  30  84  86
3  18   9  41
The example first creates a random integer array of shape (4, 3), that is, 4 rows and 3 columns. We then pass the array as an argument to pandas.DataFrame(), which generates a DataFrame named data_df from the array. Because we give no labels, pandas.DataFrame() assigns default integer column names and row indices, starting at 0.
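To make the defaults concrete, here is a minimal sketch that inspects the labels pandas assigns. The array values and the names sample_array and sample_df are made up for illustration and are not part of the example above.

```python
import numpy as np
import pandas as pd

# Illustrative values only; any 2-D array behaves the same way.
sample_array = np.array([[1, 2, 3], [4, 5, 6]])
sample_df = pd.DataFrame(sample_array)

# With no index or columns given, both axes are labelled 0, 1, 2, ...
print(list(sample_df.index))    # [0, 1]
print(list(sample_df.columns))  # [0, 1, 2]
print(sample_df.shape)          # (2, 3)
```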
We can also set the row indices and column names using the index and columns parameters of pandas.DataFrame().
from numpy import random
import pandas as pd
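random.seed(5)  # fix the seed so the output is reproducible
random.randint(100, size=(3, 5))  # result is discarded; the call only advances the random state
data_array = random.randint(100, size=(4, 3))  # 4x3 array of integers in [0, 100)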
row_indices = ["Row_1", "Row_2", "Row_3", "Row_4"]
column_names = ["Column_1", "Column_2", "Column_3"]
print("NumPy Data Array is:")
print(data_array)
print("")
data_df = pd.DataFrame(data_array, index=row_indices, columns=column_names)
print("The DataFrame generated from the NumPy array is:")
print(data_df)
Output:
NumPy Data Array is:
[[27 44 77]
 [75 65 47]
 [30 84 86]
 [18  9 41]]

The DataFrame generated from the NumPy array is:
       Column_1  Column_2  Column_3
Row_1        27        44        77
Row_2        75        65        47
Row_3        30        84        86
Row_4        18         9        41
Here, we set index to row_indices, a list containing the label for each row. Similarly, we assign column names by setting columns to the list column_names, which contains each column’s name.
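Once labels are set, they can be used for lookups, and the label lists have to match the array's shape. The sketch below illustrates both points under our own assumptions: the scores array and the variable names are hypothetical, and the DataFrame.loc lookup is our addition rather than something the example above uses.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 data, chosen only for illustration.
scores = np.array([[10, 20], [30, 40], [50, 60]])
labelled_df = pd.DataFrame(
    scores,
    index=["Row_1", "Row_2", "Row_3"],
    columns=["Column_1", "Column_2"],
)

# Look up a single value by its row and column labels.
print(labelled_df.loc["Row_2", "Column_1"])  # 30

# Two row labels for three rows: pandas rejects the mismatch with ValueError.
try:
    pd.DataFrame(scores, index=["Row_1", "Row_2"], columns=["Column_1", "Column_2"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```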
In some cases, the NumPy array itself may contain row indices and column names. Then we use array slicing to extract the data, row indices, and column names from the array.
import numpy as np
import pandas as pd
marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)
print("NumPy Data Array is:")
print(marks_array)
print("")
row_indices = marks_array[1:, 0]  # first column, skipping the header row
column_names = marks_array[0, 1:]  # first row, skipping the empty corner cell
data_df = pd.DataFrame(
    data=np.int_(marks_array[1:, 1:]), index=row_indices, columns=column_names
)
print("The DataFrame generated from the NumPy array is:")
print(data_df)
Output:
NumPy Data Array is:
[['' 'Mathematics' 'Economics']
 ['Sunny' '25' '23']
 ['Alice' '23' '24']]

The DataFrame generated from the NumPy array is:
       Mathematics  Economics
Sunny           25         23
Alice           23         24
Here, the row indices and column names are stored in the NumPy array itself. We select all the values after the first row and first column and pass them as the data argument to pandas.DataFrame(). We select the first-column values from the second row onward and pass them as the index argument. Similarly, we select the first-row values from the second column onward and pass them as the columns argument to set the column names.
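The three slices can be easier to follow on a tiny array. This is a sketch with a made-up table; the names table, row_labels, col_labels, and body are hypothetical and used only here.

```python
import numpy as np

# Made-up table: the first row holds column names, the first column holds row labels.
table = np.array([["", "A", "B"], ["r1", "1", "2"], ["r2", "3", "4"]])

row_labels = table[1:, 0]  # rows 1 onward, column 0
col_labels = table[0, 1:]  # row 0, columns 1 onward
body = table[1:, 1:]       # everything except the first row and first column

print(row_labels.tolist())  # ['r1', 'r2']
print(col_labels.tolist())  # ['A', 'B']
print(body.tolist())        # [['1', '2'], ['3', '4']]
```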
A NumPy array holds a single data type, so numpy.array() converts the integer values to strings when it builds an array that also contains strings. We use numpy.int_() to convert the data values back to the integer type.
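The conversion to strings can be checked through the array's dtype. The sketch below also swaps in the array method astype(int) as one possible alternative to numpy.int_(); that substitution is ours, not the call used in the example above, and the names marks and marks_df are hypothetical.

```python
import numpy as np
import pandas as pd

marks_array = np.array(
    [["", "Mathematics", "Economics"], ["Sunny", 25, 23], ["Alice", 23, 24]]
)

# Mixing strings and integers yields a Unicode string array (dtype kind "U").
print(marks_array.dtype.kind)  # U

# astype(int) parses the numeric strings back into integers.
marks = marks_array[1:, 1:].astype(int)
print(marks.dtype.kind)  # i

marks_df = pd.DataFrame(marks, index=marks_array[1:, 0], columns=marks_array[0, 1:])

# Arithmetic works because the column now holds integers, not strings.
print(marks_df["Mathematics"].sum())  # 48
```

Without the conversion, the DataFrame columns would hold strings, and numeric operations such as sum() would not add the marks as numbers.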
Suraj Joshi is a backend software engineer at Matrice.ai.