SciPy stats.zscore Function

Lakshay Kapoor Jan 30, 2023

The scipy.stats.zscore Function
Calculating the z-score for a One-Dimensional Array in Python
Calculating the z-score for a Multi-Dimensional Array in Python
Calculating the z-score for a Pandas DataFrame in Python

The z-score is a statistical measure of how many standard deviations a particular value lies from the mean. It is calculated with the following formula:

z = (X - μ) / σ

where:

X is a particular value from the data
μ is the mean of the data
σ is the standard deviation of the data
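As a quick sanity check of the formula, the sketch below applies it directly with NumPy and compares the result with scipy.stats.zscore. It assumes the population standard deviation (NumPy's default ddof=0), which is also the default used by zscore; the variable names are illustrative only.

```python
import numpy as np
import scipy.stats as stats

# Same sample data as the one-dimensional example in this tutorial
x = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])

mu = x.mean()                 # μ: arithmetic mean of the data
sigma = x.std()               # σ: population standard deviation (NumPy default ddof=0)
z_manual = (x - mu) / sigma   # z = (X - μ) / σ, applied element-wise

# scipy.stats.zscore uses the same population standard deviation by default
z_scipy = stats.zscore(x)

print(np.allclose(z_manual, z_scipy))  # True
```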

This tutorial shows how to calculate z-scores in Python with the SciPy library.

The `scipy.stats.zscore` Function

The scipy.stats.zscore function computes the z-score of each value in the input data relative to the mean and standard deviation of that data. Its signature is scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate').

The scipy.stats.zscore function takes the following parameters.


`a (array)`	An array-like object of the raw input data.
`axis (int or None)`	The axis along which the function computes the z-scores. The default value is `0`, which computes them along the first axis (column by column for a 2-D array). Pass `None` to compute them over the whole array.
`ddof (int)`	The degrees-of-freedom correction used when computing the standard deviation: the divisor is N - ddof, where N is the number of values. The default value is `0`, i.e. the population standard deviation.
`nan_policy`	How to handle NaN values in the input data. It accepts three string values: `'propagate'` (the default) returns NaN, `'raise'` raises an error, and `'omit'` ignores the NaN values and continues the computation. With `'omit'`, NaN entries remain NaN in the output, but they do not affect the z-scores computed for the other values.
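The three nan_policy options can be compared on a small array containing one NaN. This is a minimal sketch: the array data_with_nan is made up for illustration, and because the exact error message for 'raise' may vary between SciPy versions, only the exception type is caught.

```python
import numpy as np
import scipy.stats as stats

# Illustrative data with a single NaN in the middle
data_with_nan = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# 'propagate' (the default): the NaN makes the mean and standard deviation NaN,
# so every output value is NaN
z_propagate = stats.zscore(data_with_nan, nan_policy="propagate")

# 'omit': the NaN stays NaN in the output, while the other z-scores are
# computed from the non-NaN values only (here 1, 2, 4, 5)
z_omit = stats.zscore(data_with_nan, nan_policy="omit")

# 'raise': a NaN in the input makes zscore raise a ValueError
raised = False
try:
    stats.zscore(data_with_nan, nan_policy="raise")
except ValueError:
    raised = True

print(z_propagate)
print(z_omit)
print(raised)
```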

All parameters except `a` are optional, so you only need to pass them when you want to override their default values.
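For example, the sketch below shows that passing the default values explicitly changes nothing, while ddof=1 switches to the sample standard deviation. Comparing against NumPy's x.std(ddof=1) is just one convenient way to confirm what ddof does.

```python
import numpy as np
import scipy.stats as stats

x = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])

# Default call: axis=0, ddof=0 (population standard deviation), nan_policy='propagate'
z_default = stats.zscore(x)

# Passing the defaults explicitly gives the same result
z_explicit = stats.zscore(x, axis=0, ddof=0, nan_policy="propagate")

# ddof=1 divides by the sample standard deviation (N - 1 in the variance),
# which is slightly larger, so every |z| becomes slightly smaller
z_sample = stats.zscore(x, ddof=1)

print(np.allclose(z_default, z_explicit))                     # True
print(np.allclose(z_sample, (x - x.mean()) / x.std(ddof=1)))  # True
```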

Now let us apply the scipy.stats.zscore function to a one-dimensional array, a multi-dimensional array, and a Pandas DataFrame.

Calculating the z-score for a One-Dimensional Array in Python

import numpy as np
import scipy.stats as stats

input_data = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])

stats.zscore(input_data)

Output:

array([-1.3916106 , -1.09379511, -0.49816411,  0.39528239, -0.20034861,
       -0.37903791, -0.55772721, -0.55772721,  1.28872889,  0.99091339,
        2.00348608])

Each z-score tells you how many standard deviations the corresponding value lies from the mean. A negative z-score means the value is below the mean, a positive z-score means it is above the mean, and a z-score of 0 means the value is equal to the mean.
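These properties can be checked numerically. The sketch below reuses the input values from the example above, confirms the sign interpretation, shows that the z-scores have mean 0 and standard deviation 1, and recovers the original values by inverting the formula (X = μ + z·σ).

```python
import numpy as np
import scipy.stats as stats

x = np.array([5, 10, 20, 35, 25, 22, 19, 19, 50, 45, 62])
z = stats.zscore(x)

# Values below the mean get negative z-scores, values above it get positive ones
print(np.array_equal(np.sign(z), np.sign(x - x.mean())))  # True

# Standardized data has mean 0 and (population) standard deviation 1,
# up to floating-point rounding
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))  # True True

# Inverting the formula recovers each original value: X = μ + z·σ
print(np.allclose(x.mean() + z * x.std(), x))  # True
```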

Calculating the z-score for a Multi-Dimensional Array in Python

import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35], [25, 22, 19, 19], [50, 45, 62, 28], [24, 45, 15, 30]])

stats.zscore(data)

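Output (values rounded to four decimal places):

[[-1.3138, -1.3569, -0.4701,  1.2094],
 [-0.0626, -0.5626, -0.5224, -1.5550],
 [ 1.5015,  0.9598,  1.7238,  0.0000],
 [-0.1251,  0.9598, -0.7313,  0.3455]]

Because axis defaults to 0, each column is standardized with its own mean and standard deviation. For example, the first column [5, 25, 50, 24] has a mean of 26 and a standard deviation of about 15.9844, so its first entry is (5 - 26) / 15.9844 ≈ -1.3138.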
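The axis parameter controls which values share a mean and standard deviation. As a sketch using the same data array, the block below compares the default column-wise result with axis=1 (row by row) and axis=None (the whole array treated as a single sample).

```python
import numpy as np
import scipy.stats as stats

data = np.array([[5, 10, 20, 35], [25, 22, 19, 19], [50, 45, 62, 28], [24, 45, 15, 30]])

z_cols = stats.zscore(data)            # axis=0 (default): standardize each column
z_rows = stats.zscore(data, axis=1)    # axis=1: standardize each row
z_all = stats.zscore(data, axis=None)  # axis=None: one mean/std for the whole array

# Each column of z_cols has mean 0 and standard deviation 1
print(np.allclose(z_cols.mean(axis=0), 0), np.allclose(z_cols.std(axis=0), 1))  # True True

# Each row of z_rows has mean 0 and standard deviation 1
print(np.allclose(z_rows.mean(axis=1), 0), np.allclose(z_rows.std(axis=1), 1))  # True True

# axis=None matches applying the formula with the mean/std of all 16 values
print(np.allclose(z_all, (data - data.mean()) / data.std()))  # True
```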

Calculating the z-score for a Pandas DataFrame in Python

This example uses NumPy's random.randint() function, which generates random integers (here from 0 up to, but not including, 30) and returns them as a NumPy array. That array is then wrapped in a Pandas DataFrame with the columns W, X, Y, and Z. Because the values are random, your output will differ from the one shown below.

import pandas as pd
import numpy as np
import scipy.stats as stats

input_data = pd.DataFrame(
    np.random.randint(0, 30, size=(4, 4)), columns=["W", "X", "Y", "Z"]
)
print(input_data)

    W   X   Y   Z
0   7   9   2  15
1  11  23  15  28
2  28  11  25   2
3  11  19  14  15

input_data.apply(stats.zscore)

Output:

          W         X         Y         Z
0 -0.894534 -1.135815 -1.471534  0.000000
1 -0.400998  1.310556  0.122628  1.414214
2  1.696529 -0.786334  1.348907 -1.414214
3 -0.400998  0.611593  0.000000  0.000000

Here, the apply() method of the Pandas DataFrame calls stats.zscore once for each column (its default is axis=0), so every value is standardized using the mean and standard deviation of its own column.
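To see that apply() standardizes column by column, the sketch below rebuilds the sample DataFrame from the printout above as fixed values (an assumption made only so the result is reproducible) and compares df.apply(stats.zscore) with the same formula written with pandas methods. Note the ddof=0: pandas' DataFrame.std() defaults to ddof=1, whereas stats.zscore defaults to ddof=0.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Fixed copy of the random sample shown above, so the result is reproducible
df = pd.DataFrame(
    [[7, 9, 2, 15], [11, 23, 15, 28], [28, 11, 25, 2], [11, 19, 14, 15]],
    columns=["W", "X", "Y", "Z"],
)

# apply() passes each column (a Series) to stats.zscore
z_apply = df.apply(stats.zscore)

# Same computation with pandas methods; ddof=0 matches the stats.zscore default
z_pandas = (df - df.mean()) / df.std(ddof=0)

print(np.allclose(z_apply, z_pandas))  # True
```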


Author: Lakshay Kapoor

Lakshay Kapoor is a final year B.Tech Computer Science student at Amity University Noida. He is familiar with programming languages and their real-world applications (Python/R/C++), and is deeply interested in data science and machine learning.
