One-Hot Encoding on NumPy Array in Python
- Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python
- Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python
- Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python
- Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python
Python offers a large ecosystem of machine learning libraries that make it easy to train and test models. However, many algorithms cannot work directly with categorical labels and require numeric values instead.
One-hot encoding is a widely used technique for converting such data before passing it to an algorithm. Each category becomes a binary vector with a single 1 at the position of that category and 0s everywhere else.
In this tutorial, we will learn how to perform one-hot encoding on NumPy arrays.
Use the NumPy Module to Perform One-Hot Encoding on a NumPy Array in Python
In this method, we will generate a new array that contains the encoded data. We will use the numpy.zeros() function to create an array of 0s of the required size, with one row per element and one column per class. We will then use the numpy.arange() function to generate the row indices and set the entry in each row to 1 at the column given by that element's value.
For example,
import numpy as np
a = np.array([1, 0, 3])
b = np.zeros((a.size, a.max() + 1))
b[np.arange(a.size), a] = 1
print(b)
Output:
[[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 0. 0. 1.]]
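The same steps can be wrapped in a small helper. The sketch below is only an illustration: the function name one_hot and its num_classes parameter are not part of NumPy, and it assumes the input is a 1-D array of non-negative integers.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """One-hot encode a 1-D array of non-negative integer labels.

    Hypothetical helper for illustration; not a NumPy function.
    """
    labels = np.asarray(labels)
    if labels.ndim != 1 or not np.issubdtype(labels.dtype, np.integer):
        raise ValueError("labels must be a 1-D array of integers")
    if labels.size and labels.min() < 0:
        raise ValueError("labels must be non-negative")
    if num_classes is None:
        # Default to the smallest width that fits the largest label
        num_classes = int(labels.max()) + 1

    encoded = np.zeros((labels.size, num_classes), dtype=int)
    # Row i receives a 1 in column labels[i]
    encoded[np.arange(labels.size), labels] = 1
    return encoded


encoded = one_hot([1, 0, 3], num_classes=5)
print(encoded)
```

Passing num_classes explicitly is useful when the array does not happen to contain the largest class, since a.max() + 1 would otherwise produce too few columns.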
We can also use the numpy.eye() function to perform one-hot encoding on arrays. By default, it returns a 2-dimensional array with 1s on the main diagonal and 0s elsewhere. Indexing this identity matrix with our values selects, for each value, the row whose 1 sits in the matching column, as shown below.
import numpy as np
values = [1, 0, 3]
n_values = np.max(values) + 1
print(np.eye(n_values)[values])
Output:
[[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 0. 0. 1.]]
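Both NumPy approaches above assume the values are non-negative integers that can serve as column indices. For other categorical data, such as strings, one option is to map each label to an integer code first. The sketch below uses numpy.unique() with return_inverse=True for that step; the color labels are invented for illustration.

```python
import numpy as np

# Hypothetical string labels, used only for illustration
colors = np.array(["red", "green", "blue", "green"])

# classes: sorted unique labels; codes: index of each label within classes
classes, codes = np.unique(colors, return_inverse=True)

# Select one row of the identity matrix per label
encoded = np.eye(classes.size, dtype=int)[codes]

print(classes)
print(encoded)
```

Column j of the result corresponds to classes[j], so the classes array should be kept alongside the encoded data to interpret it later.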
Use the sklearn Module to Perform One-Hot Encoding on a NumPy Array in Python
sklearn.preprocessing.LabelBinarizer is a class from the scikit-learn library that can perform this encoding. It binarizes labels in a one-vs-all fashion, producing one indicator column per class. We first call fit() to tell the object which classes exist and then use the transform() function to convert the data. Note that when only two classes are present, it returns a single column rather than two.
The following code demonstrates this.
import sklearn.preprocessing
import numpy as np
a = np.array([1, 0, 3])
label_binarizer = sklearn.preprocessing.LabelBinarizer()
label_binarizer.fit(range(max(a) + 1))
b = label_binarizer.transform(a)
print(b)
Output:
[[0 1 0 0]
[1 0 0 0]
[0 0 0 1]]
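LabelBinarizer is not limited to integers. As a minimal sketch, assuming string labels (the animal names below are made up for illustration), fit_transform() learns the classes and encodes the data in one step, the classes_ attribute lists the column order, and inverse_transform() maps the encoded rows back to labels.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Hypothetical string labels, used only for illustration
labels = np.array(["cat", "dog", "bird", "dog"])

label_binarizer = LabelBinarizer()
encoded = label_binarizer.fit_transform(labels)

# classes_ holds the sorted class names; column j corresponds to classes_[j]
print(label_binarizer.classes_)
print(encoded)

# inverse_transform() recovers the original labels from the encoded rows
decoded = label_binarizer.inverse_transform(encoded)
print(decoded)
```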
Use the pandas Module to Perform One-Hot Encoding on a NumPy Array in Python
Datasets for machine learning algorithms are often stored in a pandas DataFrame, so the pandas module is well equipped to perform data encoding. The get_dummies() function converts categorical data into numerical indicator columns, thereby performing one-hot encoding. The final result is a DataFrame.
For example,
import pandas as pd
import numpy as np
a = np.array([1, 0, 3])
b = pd.get_dummies(a)
print(b)
Output:
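   0  1  3
0  0  1  0
1  1  0  0
2  0  0  1
The first row of the output holds the column labels, and the first column holds the row index. In pandas 2.0 and later, get_dummies() returns boolean columns by default, so the values are printed as True and False unless dtype=int is passed.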
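get_dummies() only creates columns for values that actually occur in the data, which is why the output above has no column for 2. If a column for every class from 0 to the maximum is needed, one possible approach (a sketch, not the only way) is to reindex the columns afterwards. Passing dtype=int keeps the indicators as 0/1 integers.

```python
import numpy as np
import pandas as pd

a = np.array([1, 0, 3])

# dtype=int requests integer 0/1 indicators
b = pd.get_dummies(a, dtype=int)

# Add a column of 0s for any class in 0..max that is missing from the data
b = b.reindex(columns=range(a.max() + 1), fill_value=0)
print(b)
```

With this change the DataFrame has four columns, matching the shape produced by the NumPy and sklearn methods.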
Use the keras Module to Perform One-Hot Encoding on a NumPy Array in Python
The keras module is widely used for machine learning in Python. The to_categorical() function from this module can perform one-hot encoding on data.
The code snippet below shows how.
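# Recent Keras releases expose to_categorical directly under keras.utils
from keras.utils import to_categorical
import numpy as np
a = np.array([1, 0, 3])
# num_classes must be the largest label + 1, not the length of the array
b = to_categorical(a, num_classes=a.max() + 1)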
print(b)
Output:
[[0. 1. 0. 0.]
[1. 0. 0. 0.]
[0. 0. 0. 1.]]
Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.