How to Convert Pandas DataFrame to NumPy Array
- to_numpy Method to Convert Pandas DataFrame to NumPy Array
- values Attribute to Convert Pandas DataFrame to NumPy Array
- to_records() Method to Convert DataFrame to NumPy Record Array
We will learn how to use the to_numpy() method to convert a pandas.DataFrame to a NumPy array. It was introduced in pandas v0.24.0 as the recommended replacement for the .values attribute. to_numpy is defined on Index, Series, and DataFrame objects.
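As a minimal sketch of that last point, the snippet below calls to_numpy() on an Index, a Series, and a DataFrame. The data and the column names "a" and "b" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column names "a" and "b" are arbitrary.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

index_arr = df.index.to_numpy()   # Index -> 1D array of row labels
series_arr = df["a"].to_numpy()   # Series -> 1D array of values
frame_arr = df.to_numpy()         # DataFrame -> 2D array (rows x columns)

print(index_arr.shape, series_arr.shape, frame_arr.shape)
# (3,) (3,) (3, 2)
```

In all three cases the result is a plain numpy.ndarray.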
The older DataFrame.values attribute has inconsistent behavior, so the pandas API documentation recommends using to_numpy() instead. However, we will look at an example of it in case you are using an older pandas version.
We will also introduce another approach that uses the DataFrame.to_records() method to convert the given DataFrame to a NumPy record array.
to_numpy Method to Convert Pandas DataFrame to NumPy Array
pandas.DataFrame is a 2D tabular data structure with rows and columns. It can be converted into a NumPy array by using the to_numpy method:
# python 3.x
import pandas as pd
import numpy as np
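# Build a 6x4 DataFrame of random integers in the range [0, 10)
df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
# Convert the DataFrame to a 2D NumPy array
nmp = df.to_numpy()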
print(nmp)
print(type(nmp))
Output:
[[5 5 1 3]
[1 6 6 0]
[9 1 2 0]
[9 3 5 3]
[7 9 4 9]
[8 1 8 9]]
<class 'numpy.ndarray'>
The pandas DataFrame.to_numpy() method converts the DataFrame to a NumPy array, as shown above.
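The example above uses only integer columns. As a hedged sketch of what happens with mixed column types, the snippet below shows how the result dtype is chosen and how the dtype parameter of to_numpy() can request a specific one. The frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an integer, a float, and a string column.
mixed = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5], "s": ["u", "v"]})

# Integer and float columns are upcast to a common float dtype.
numeric = mixed[["n", "x"]].to_numpy()
print(numeric.dtype)            # float64

# Including the string column forces the generic object dtype.
everything = mixed.to_numpy()
print(everything.dtype)         # object

# The dtype parameter requests a specific dtype for the result.
as_float32 = mixed[["n", "x"]].to_numpy(dtype="float32")
print(as_float32.dtype)         # float32
```

Because a NumPy array holds a single dtype, selecting only the columns you need before converting usually gives a more useful array than converting the whole frame.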
values Attribute to Convert Pandas DataFrame to NumPy Array
We could also use the DataFrame.values attribute as follows.
# python 3.x
import pandas as pd
import numpy as np
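# Build a 6x4 DataFrame of random integers in the range [0, 10)
df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
# values is an attribute, so it is accessed without parentheses
nmp = df.values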
print(nmp)
print(type(nmp))
Output:
[[8 8 5 0]
[1 7 7 5]
[0 2 4 2]
[6 8 0 7]
[6 4 5 1]
[1 8 4 7]]
<class 'numpy.ndarray'>
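One plausible illustration of the inconsistent behavior mentioned in the introduction is shown below. It uses a Series rather than a DataFrame, and the categorical data is an assumption chosen for the demonstration: for such extension types, .values can return a pandas array object, while to_numpy() returns a NumPy array.

```python
import numpy as np
import pandas as pd

# A Series backed by a pandas extension type (categorical data).
cat = pd.Series(pd.Categorical(["a", "b", "a"]))

from_values = cat.values        # a pandas Categorical, not an ndarray
from_to_numpy = cat.to_numpy()  # a numpy.ndarray

print(type(from_values))
print(type(from_to_numpy))
```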
If we want to include the index as a column in the converted NumPy array, we need to call reset_index() before accessing DataFrame.values.
# python 3.x
import pandas as pd
import numpy as np
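# Build a 6x4 DataFrame of random integers in the range [0, 10)
df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
# reset_index() moves the index into a regular column, so it becomes the first array column
nmp = df.reset_index().values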
print(nmp)
print(type(nmp))
Output:
[[0 1 0 3 7]
[1 8 2 5 1]
[2 2 2 7 3]
[3 3 4 3 7]
[4 5 4 4 3]
[5 2 9 7 6]]
<class 'numpy.ndarray'>
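The same idea carries over to to_numpy(). The sketch below uses a small frame with a non-default index (the labels 10 and 20 are arbitrary) so that the index column is easy to spot in the result.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with custom index labels 10 and 20.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[10, 20])

# reset_index() turns the index into the first column before conversion.
with_index = df.reset_index().to_numpy()
without_index = df.to_numpy()

print(with_index.tolist())     # [[10, 1, 3], [20, 2, 4]]
print(without_index.tolist())  # [[1, 3], [2, 4]]
```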
to_records() Method to Convert DataFrame to NumPy Record Array
If you need to keep each column's dtype, to_records() is the best option, because a record array stores a separate dtype for every field. Performance-wise, to_numpy() and to_records() are almost the same:
# python 3.x
import pandas as pd
import numpy as np
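# Build a 6x4 DataFrame of random integers in the range [0, 10)
df = pd.DataFrame(data=np.random.randint(0, 10, (6, 4)), columns=["a", "b", "c", "d"])
# Convert to a record array; the index is included as the first field by default
nmp = df.to_records()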
print(nmp)
print(type(nmp))
Output:
[(0, 0, 4, 6, 1)
(1, 3, 1, 7, 1)
(2, 9, 1, 6, 4)
(3, 1, 4, 6, 9)
(4, 9, 1, 3, 9)
(5, 2, 5, 7, 9)]
<class 'numpy.recarray'>
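To make the point about dtypes concrete, here is a minimal sketch with one integer and one float column (hypothetical data). It passes index=False to leave the index out and then reads fields back by name.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with columns of different dtypes.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# index=False omits the index field from the record array.
rec = df.to_records(index=False)

print(rec.dtype.names)    # ('a', 'b')
print(rec["a"].tolist())  # [1, 2, 3]
print(rec.b.tolist())     # [0.5, 1.5, 2.5]  (fields are also attributes)
print(rec.dtype["b"])     # float64
```

Unlike the arrays from to_numpy(), the integer column is not upcast to float here, because each field keeps its own dtype.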