How to Plot a CDF Using Matplotlib in Python
This tutorial explains how to generate a CDF plot using Matplotlib in Python. The CDF (cumulative distribution function) is the function whose y-values represent the probability that a random variable takes a value less than or equal to the corresponding x-value.
Plot CDF Using Matplotlib in Python
The CDF is defined for both continuous and discrete probability distributions. In a continuous probability distribution, the random variable can take any value within a specified range, whereas in a discrete probability distribution it can take only values from a specified set.
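The definition can also be written in symbols. The notation below is standard textbook notation rather than something taken from this tutorial: $F$ denotes the CDF, $p$ the PMF of a discrete random variable $X$, and $f$ the PDF of a continuous one.

```latex
F(x) = P(X \le x), \qquad
F(x) = \sum_{x_i \le x} p(x_i) \quad \text{(discrete)}, \qquad
F(x) = \int_{-\infty}^{x} f(t)\,dt \quad \text{(continuous)}
```

The sum in the discrete case is what a cumulative sum over the probabilities computes, and the integral in the continuous case is what a cumulative sum over small steps approximates.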
Plot CDF for Discrete Distribution Using Matplotlib in Python
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(1, 7)
y = [0.2, 0.1, 0.1, 0.2, 0.1, 0.3]
cdf = np.cumsum(y)
plt.plot(x, y, marker="o", label="PMF")
plt.plot(x, cdf, marker="o", label="CDF")
plt.xlim(0, 7)
plt.ylim(0, 1.5)
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for discrete distribution")
plt.legend()
plt.show()
Output:
It plots the PMF and CDF for the given distribution. To calculate the y-values for the CDF, we use the numpy.cumsum() function, which returns the cumulative sum of an array.
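Because a discrete CDF jumps at each possible value and stays flat in between, a step plot may represent it more faithfully than the straight segments that plt.plot() draws between markers. The following is a minimal sketch, not part of the original example, that reuses the same probabilities and draws the CDF with plt.step(). The choice of where="post" is an assumption that each jump should occur at the x-value itself.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same probabilities as in the example above (assumed to sum to 1).
x = np.arange(1, 7)
pmf = np.array([0.2, 0.1, 0.1, 0.2, 0.1, 0.3])
cdf = np.cumsum(pmf)

# Sanity check: a valid CDF ends at 1 (up to floating-point error).
assert np.isclose(cdf[-1], 1.0)

# where="post" holds each CDF value constant until the next x-value,
# so every jump is drawn at the x-value where it occurs.
plt.step(x, cdf, where="post", marker="o", label="CDF (step)")
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for discrete distribution as a step plot")
plt.legend()
plt.show()
```

If the line plot is preferred for visual comparison with the PMF, plt.plot() still works; the step version only changes how the points are connected.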
If we are given frequency counts instead of probabilities, we must first normalize them so that they represent the PMF.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(1, 7)
frequency = np.array([3, 8, 4, 5, 3, 6])
pdf = frequency / np.sum(frequency)
cdf = np.cumsum(pdf)
plt.plot(x, pdf, marker="o", label="PMF")
plt.plot(x, cdf, marker="o", label="CDF")
plt.xlim(0, 7)
plt.ylim(0, 1.5)
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for discrete distribution")
plt.legend()
plt.show()
Output:
Here, we are given the frequency for each X value. We convert the frequencies into probabilities (stored in the pdf array, which for discrete data is strictly a PMF) by dividing each element of the frequency array by the sum of the frequencies. We then take the cumulative sum of pdf to obtain the CDF values and plot the CDF of the given data.
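The normalization step can be wrapped in a small function and checked numerically. The sketch below is illustrative only: counts_to_cdf is a hypothetical helper name, not something defined by NumPy or Matplotlib, and the check that the counts have a positive total is an added assumption.

```python
import numpy as np


def counts_to_cdf(counts):
    """Hypothetical helper: turn frequency counts into (pmf, cdf) arrays."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total <= 0:
        raise ValueError("counts must have a positive total")
    pmf = counts / total   # normalize so the probabilities sum to 1
    cdf = np.cumsum(pmf)   # running total of the probabilities
    return pmf, cdf


frequency = [3, 8, 4, 5, 3, 6]  # same counts as in the example above
pmf, cdf = counts_to_cdf(frequency)

print(pmf.sum())  # expected to be 1 up to floating-point rounding
print(cdf[-1])    # last CDF value, also expected to be 1
```

Checking that the last CDF value is 1 is a quick way to catch a missed normalization before plotting.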
We can also use histogram plots to view the CDF and PDF, which can be more intuitive for discrete data.
import numpy as np
import matplotlib.pyplot as plt
data = [3, 4, 2, 3, 4, 5, 4, 7, 8, 5, 4, 6, 2, 1, 0, 9, 7, 6, 6, 5, 4]
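# Normalized histogram of the data
plt.hist(data, bins=9, density=True, label="PDF")
# Cumulative, normalized histogram drawn as an outline
plt.hist(data, bins=9, density=True, cumulative=True, label="CDF", histtype="step")
plt.xlabel("X")
plt.ylabel("Probability")
plt.xticks(np.arange(0, 10))
plt.title("CDF using Histogram Plot")
# Show the labels given to the two histograms
plt.legend()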
plt.show()
Output:
It plots the CDF and PDF of the given data using the hist() method. To plot the CDF, we set cumulative=True. We also set density=True so that the histogram is normalized to a total area of 1 and the cumulative histogram ends at 1.
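The shape of a histogram-based CDF depends on the chosen bins. As an alternative sketch, not taken from the original example, the empirical CDF can be computed directly from the data: for each distinct value, it is the fraction of observations less than or equal to that value. The variable names values, counts, and ecdf are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

data = [3, 4, 2, 3, 4, 5, 4, 7, 8, 5, 4, 6, 2, 1, 0, 9, 7, 6, 6, 5, 4]

# Sorted distinct values and the number of times each one occurs.
values, counts = np.unique(data, return_counts=True)

# Fraction of observations less than or equal to each distinct value.
ecdf = np.cumsum(counts) / len(data)

plt.step(values, ecdf, where="post", label="Empirical CDF")
plt.xlabel("X")
plt.ylabel("Probability")
plt.xticks(np.arange(0, 10))
plt.title("Empirical CDF without binning")
plt.legend()
plt.show()
```

This version needs no bins argument, so the plotted jumps fall exactly on the observed values.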
Plot CDF for Continuous Distribution Using Matplotlib in Python
import numpy as np
import matplotlib.pyplot as plt
dx = 0.005
x = np.arange(-10, 10, dx)
y = 0.25 * np.exp((-(x ** 2)) / 8)
y = y / (np.sum(dx * y))
cdf = np.cumsum(y * dx)
plt.plot(x, y, label="pdf")
plt.plot(x, cdf, label="cdf")
plt.xlabel("X")
plt.ylabel("Probability Values")
plt.title("CDF for continuous distribution")
plt.legend()
plt.show()
Output:
It plots the PDF and CDF for the given continuous distribution. To calculate the y-values for the CDF, we multiply each PDF value by the step size dx and use the numpy.cumsum() function to accumulate the results, which approximates the integral of the PDF.
We divide y by the sum of the array y multiplied by dx to normalize the values, so that the area under the PDF is 1 and the CDF values range from 0 to 1.
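One way to check the numerical CDF is to compare it with a closed-form expression. The curve used above has the exponent -x**2 / 8, which matches a normal density with mean 0 and standard deviation 2, so the sketch below compares the cumulative sum against the normal CDF written with math.erf. This comparison is an added illustration and the names sigma, exact, and max_error are not part of the original example.

```python
import math
import numpy as np

dx = 0.005
x = np.arange(-10, 10, dx)

# Same curve as in the example above, normalized so its area is 1.
y = 0.25 * np.exp(-(x ** 2) / 8)
y = y / np.sum(y * dx)
cdf = np.cumsum(y * dx)  # numerical CDF (a Riemann sum)

# Closed-form normal CDF: 0.5 * (1 + erf(x / (sigma * sqrt(2))))
sigma = 2.0  # assumed from the exponent -x**2 / (2 * sigma**2) = -x**2 / 8
exact = np.array([0.5 * (1 + math.erf(v / (sigma * math.sqrt(2)))) for v in x])

max_error = np.max(np.abs(cdf - exact))
print(max_error)  # small, and it shrinks as dx is reduced
```

The remaining difference comes mainly from approximating the integral with rectangles of width dx, so a smaller dx gives a closer match at the cost of a longer array.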
Suraj Joshi is a backend software engineer at Matrice.ai.