How to Perform Chi-Square Test in Python
The chi-square test of independence is used to determine whether two categorical variables are related. In this tutorial, we will perform this test in Python using the SciPy module.
We will use the chi2_contingency() function from the SciPy module to perform the test. Let us start by importing it from the SciPy module.
Perform Chi-Square Test in Python
Import SciPy:
from scipy.stats import chi2_contingency
The chi2_contingency() function takes a contingency table in 2D format as input. A contingency table is used in statistics to summarize the relationship between categorical variables: each cell holds the number of observations for one combination of categories.
So let us create this contingency table.
data = [[207, 282, 241], [234, 242, 232]]
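If you start from raw observations rather than ready-made counts, the table has to be tallied first. The following is a minimal sketch using only the standard library; the observations, group names, and answer labels are hypothetical and serve only to show the row and column layout.

```python
from collections import Counter

# Hypothetical raw observations: one (group, answer) pair per respondent.
observations = [
    ("A", "yes"), ("A", "no"), ("A", "yes"), ("A", "maybe"),
    ("B", "no"), ("B", "no"), ("B", "yes"), ("B", "maybe"),
]

# Fix the row and column order so the table layout is predictable.
rows = ["A", "B"]
cols = ["yes", "no", "maybe"]

# Count each (group, answer) combination, then arrange the counts in 2D.
counts = Counter(observations)
table = [[counts[(r, c)] for c in cols] for r in rows]
print(table)
```

Each inner list is one row of the contingency table, which is the same shape as the data list used in this tutorial.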
Let us pass this array to the function.
stat, p, dof1, expected = chi2_contingency(data)
The chi2_contingency() function returns four values: the test statistic, the p-value, the degrees of freedom, and the table of expected frequencies. We will compare the obtained p-value with an alpha value (significance level) of 0.05.
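To see where these values come from, the sketch below recomputes them by hand for the same table using only the standard library. It assumes the plain Pearson statistic without a continuity correction (by default, SciPy applies Yates' correction only when there is one degree of freedom, which is not the case here). The p-value line relies on a shortcut that holds only for two degrees of freedom.

```python
import math

data = [[207, 282, 241], [234, 242, 232]]

row_totals = [sum(row) for row in data]
col_totals = [sum(col) for col in zip(*data)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
stat = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(data, expected)
    for obs, exp in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(data) - 1) * (len(data[0]) - 1)

# Shortcut valid only when dof == 2: the chi-square survival
# function reduces to exp(-x / 2).
p = math.exp(-stat / 2)

print(stat, dof, p)
```

For tables of any other shape, the p-value needs the general chi-square distribution, which is what chi2_contingency() handles for you.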
Let's now interpret the p-value using the code below.
alpha = 0.05
print("p val is " + str(p))
if p <= alpha:
    print("Dependent")
else:
    print("Independent")
The output for the above code would be:
p val is 0.1031971404730939
Independent
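If the p-value is greater than the alpha value (0.05 here), we fail to reject the null hypothesis of independence: the data show no statistically significant relationship between the two variables. If the p-value is less than or equal to alpha, we reject the null hypothesis and treat the variables as dependent.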
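The decision depends on the chosen alpha as much as on the p-value. As a small illustration, the sketch below wraps the comparison in a helper; the function name interpret and the alternative alpha of 0.15 are hypothetical choices made for this example only.

```python
def interpret(p_value, alpha=0.05):
    """Return the decision for a chi-square test of independence."""
    if p_value <= alpha:
        return "Dependent (reject the null hypothesis of independence)"
    return "Independent (fail to reject the null hypothesis)"


p = 0.1031971404730939  # p-value printed in the output above

print(interpret(p))              # default alpha of 0.05
print(interpret(p, alpha=0.15))  # a looser, hypothetical alpha
```

The alpha value should be fixed before running the test, not adjusted afterwards to obtain a preferred result.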
In our case, the p-value (about 0.103) is greater than alpha, so we find no significant evidence of a relationship and treat the two variables as independent. This is how we can perform the chi-square test in Python.