SciPy scipy.stats.multivariate_normal
The scipy.stats.multivariate_normal object in Python's SciPy library represents the multivariate normal distribution. Its methods evaluate quantities such as the probability density function (pdf) and the cumulative distribution function (cdf), and draw random samples from the distribution.
Syntax to Generate Probability Density Function Using scipy.stats.multivariate_normal Object
scipy.stats.multivariate_normal.pdf(x, mean=None, cov=1, allow_singular=False)
Parameters:
x | Points at which the pdf is evaluated. The last axis of x indexes the components of each point.
mean | Array-like mean of the distribution, with one value per component. The default is a zero mean.
cov | Covariance matrix of the distribution. It must be symmetric and positive semidefinite. The default value is 1.
allow_singular | If set to True, a singular cov is allowed. The default value is False.
Return:
An array (or a scalar for a single point) containing the probability density evaluated at each point in x. These are density values, not probabilities, so they can exceed 1.
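The defaults and the allow_singular flag can be checked with a small sketch. The singular matrix and the test point below are made up for illustration and do not come from the examples in this article. The exact exception raised for a singular matrix may differ between SciPy versions, so the sketch catches both LinAlgError and ValueError.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# With the defaults (zero mean, covariance 1) the distribution is the
# one-dimensional standard normal, so the density should match norm.pdf.
default_density = multivariate_normal.pdf(0.0)
print(default_density, norm.pdf(0.0))

# Hypothetical singular covariance: the two components are perfectly correlated.
singular_cov = np.array([[1.0, 1.0], [1.0, 1.0]])
point = np.array([0.5, 0.5])  # lies on the line where the distribution lives

# Without allow_singular=True the singular matrix is expected to be rejected.
try:
    multivariate_normal.pdf(point, mean=[0.0, 0.0], cov=singular_cov)
    rejected = False
except (np.linalg.LinAlgError, ValueError):
    rejected = True
print("Rejected without allow_singular:", rejected)

# With allow_singular=True the call goes through.
density = multivariate_normal.pdf(
    point, mean=[0.0, 0.0], cov=singular_cov, allow_singular=True
)
print("Density with allow_singular=True:", density)
```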
Example : Generate Probability Density Function Using scipy.stats.multivariate_normal.pdf Method
import numpy as np
from scipy.stats import multivariate_normal

# Parameters of a two-component normal distribution
mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# Five random points, one per row, with two components each
x = np.random.uniform(size=(5, 2))
y = multivariate_normal.pdf(x, mean=mean, cov=cov)

print("The data and corresponding pdfs are:")
print("Data-------PDF value")
for i in range(len(x)):
    print(x[i], end=" ")
    print("------->", end=" ")
    print(y[i], end="\n")
Output:
The data and corresponding pdfs are:
Data-------PDF value
[0.60156002 0.53917659] -------> 0.030687330659191728
[0.60307471 0.25205368] -------> 0.0016016741361277501
[0.27254519 0.06817383] -------> 0.7968146411119688
[0.33630808 0.21039553] -------> 0.7048988855032084
[0.0009666 0.52414497] -------> 0.010307396714783708
In the above example, x is the array of points at which the pdf is evaluated. Each row of x is one point, and each column is one component of that point. Because x is drawn at random, the values change from run to run.
Here, each point has two components, so it is a vector of length 2. The mean is a vector whose length equals the number of components. Similarly, if d is the number of components, cov is a symmetric square matrix of size d×d.
The scipy.stats.multivariate_normal.pdf method takes the input x, the mean, and the covariance matrix cov, and returns a vector whose length equals the number of rows in x. Each entry of the output is the pdf value for the corresponding row of x.
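To make the relationship between x, mean, and cov concrete, the density can also be computed by hand from the standard multivariate normal formula, exp(-0.5 * (x - mean)^T cov^-1 (x - mean)) / sqrt((2*pi)^d * det(cov)), and compared with the SciPy result. This is only a sketch: the three points are illustrative values chosen here, and the variable names are not part of SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
# Three illustrative points, one per row
x = np.array([[0.6, 0.5], [0.3, 0.1], [0.0, 0.5]])

d = mean.size
diff = x - mean
# Squared Mahalanobis distance of each row: diff_i^T cov^-1 diff_i
maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
# Normalizing constant sqrt((2*pi)^d * det(cov))
norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
manual_pdf = np.exp(-0.5 * maha) / norm_const

scipy_pdf = multivariate_normal.pdf(x, mean=mean, cov=cov)
print(scipy_pdf.shape)                      # one value per row of x
print(np.allclose(manual_pdf, scipy_pdf))   # the two computations should agree
```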
Example : Draw Random Samples From a Multivariate Normal Distribution Using scipy.stats.multivariate_normal.rvs Method
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])
x = multivariate_normal.rvs(mean, cov, 100)
plt.scatter(x[:, 0], x[:, 1])
plt.show()
Output:

The above plot is a scatter plot of 100 random samples drawn from a multivariate normal distribution with two features. The distribution has a mean of [0.4, 0.8], where 0.4 is the mean of the first feature and 0.8 is the mean of the second feature. The scatter plot shows the first feature along the X-axis and the second feature along the Y-axis.
From the plot, it is clear that most of the sample points cluster around [0.4, 0.8], the mean of the distribution.
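The same observation can be checked numerically instead of visually. The sketch below draws a larger sample and compares its empirical mean and covariance with the parameters. The sample size of 20000 and the seed passed to random_state are arbitrary choices made here so that the run is repeatable.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# random_state=0 is an arbitrary seed used only for reproducibility
samples = multivariate_normal.rvs(mean=mean, cov=cov, size=20000, random_state=0)

# Empirical estimates; rows are samples, columns are features
sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)

print(samples.shape)
print(sample_mean)
print(sample_cov)
```

With this many samples, the estimates should land close to mean and cov, though not exactly on them.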
Example : Get Cumulative Distribution Function Using scipy.stats.multivariate_normal.cdf Method
The cumulative distribution function (CDF) is the integral of the pdf. Its value at a point x is the probability that every component of a random vector drawn from the distribution is less than or equal to the corresponding component of x. We can calculate the cdf at points of a multivariate normal distribution using the scipy.stats.multivariate_normal.cdf method.
import numpy as np
from scipy.stats import multivariate_normal

# Parameters of a two-component normal distribution
mean = np.array([0.4, 0.8])
cov = np.array([[0.1, 0.3], [0.3, 1.0]])

# Five random points, one per row, with two components each
x = np.random.uniform(size=(5, 2))
y = multivariate_normal.cdf(x, mean=mean, cov=cov)

print("The data and corresponding cdfs are:")
print("Data-------CDF value")
for i in range(len(x)):
    print(x[i], end=" ")
    print("------->", end=" ")
    print(y[i], end="\n")
Output:
The data and corresponding cdfs are:
Data-------CDF value
[0.89027577 0.06036432] -------> 0.22976054289355996
[0.78164237 0.09611703] -------> 0.24075282906929418
[0.53051197 0.63041372] -------> 0.4309184323329717
[0.15571201 0.97173575] -------> 0.21985053519541042
[0.72988545 0.22477096] -------> 0.28256819625802715
In the above example, x is the array of points at which the cdf is evaluated. Each row of x is one point, and each column is one component of that point.
Here, each point has two components, so it is a vector of length 2. The mean is a vector whose length equals the number of components. Similarly, if d is the number of components, cov is a symmetric square matrix of size d×d.
The scipy.stats.multivariate_normal.cdf method takes the input x, the mean, and the covariance matrix cov, and returns a vector whose length equals the number of rows in x. Each entry of the output is the cdf value for the corresponding row of x.
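One way to sanity-check the cdf is to use a distribution with independent components, where the joint cdf should equal the product of the one-dimensional cdfs. The sketch below assumes a diagonal covariance matrix and three illustrative points, neither of which appears in the example above. Because the multivariate cdf is computed by numerical integration, the two results are expected to agree only up to a small tolerance.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Hypothetical distribution with independent components (diagonal covariance)
mean = np.array([0.4, 0.8])
variances = np.array([0.1, 1.0])
cov = np.diag(variances)

# Three illustrative points; the first one is the mean itself
points = np.array([[0.4, 0.8], [0.6, 0.5], [1.0, 2.0]])

joint_cdf = multivariate_normal.cdf(points, mean=mean, cov=cov)

# For independent components, the joint cdf is the product of the marginal cdfs
marginal_cdfs = norm.cdf(points, loc=mean, scale=np.sqrt(variances))
product_cdf = np.prod(marginal_cdfs, axis=1)

print(joint_cdf)
print(product_cdf)
```

At the mean, each marginal cdf is 0.5, so the joint value for the first point should be close to 0.25.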
Suraj Joshi is a backend software engineer at Matrice.ai.