Python Apriori Algorithm
This tutorial will discuss the implementation of the apriori algorithm in Python.
Explanation of the Apriori Algorithm
The Apriori Algorithm is widely used for market basket analysis, i.e., to analyze which items are frequently sold together with which other items. It is useful for shop owners who want to increase their sales by placing items that are bought together close to each other or by offering discounts on them.
This algorithm states that if an itemset is frequent, all non-empty subsets must also be frequent. Let’s look at a small example to help illustrate this notion.
Let’s say that in our store, milk, butter, and bread are frequently sold together. This implies that the pairs {milk, butter}, {milk, bread}, and {butter, bread} are also frequently sold together.
The Apriori Algorithm also states that the frequency of an itemset can never exceed the frequency of its non-empty subsets. We can further illustrate this by expanding a little more on our previous example.
In our store, milk, butter, and bread are sold together 3 times. This implies that every non-empty subset of this itemset, such as {milk, butter}, {milk, bread}, and {butter, bread}, is sold together at least 3 times.
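The property can be checked on a small, made-up set of transactions: count how many transactions contain an itemset and compare that count with the counts of each of its non-empty subsets. The five transactions below are hypothetical and only illustrate the idea.

```python
from itertools import combinations

# Hypothetical transactions, each stored as a set of items.
transactions = [
    {"milk", "butter", "bread"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


full = ("milk", "butter", "bread")
print(f"{full}: {count(full, transactions)}")  # the full itemset appears in 3 transactions

# Every non-empty proper subset appears in at least as many transactions.
for size in range(1, len(full)):
    for subset in combinations(full, size):
        print(f"{subset}: {count(subset, transactions)}")
        assert count(subset, transactions) >= count(full, transactions)
```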
Apriori Algorithm in Python
Before implementing this algorithm, we need to understand how the apriori algorithm works.
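At the start of the algorithm, we specify a minimum support threshold. The support of an item (or itemset) is the fraction of transactions that contain it, i.e., the probability that it occurs in a randomly chosen transaction. Itemsets whose support falls below the threshold are discarded.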
$$
\text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}
$$
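The formula translates directly into a short function. As a sketch, using five hypothetical baskets:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical baskets.
transactions = [
    ["milk", "bread"],
    ["milk", "butter", "bread"],
    ["butter"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

print(support({"milk"}, transactions))           # 3 of 5 baskets -> 0.6
print(support({"milk", "bread"}, transactions))  # 2 of 5 baskets -> 0.4
```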
Apart from support, there are other measures, such as confidence and lift. We don’t need them to understand the algorithm itself, although we will pass thresholds for both to `apyori` later.
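For reference, the standard definitions are confidence(A -> B) = support(A ∪ B) / support(A) and lift(A -> B) = confidence(A -> B) / support(B). The sketch below computes both for one rule on the same kind of hypothetical baskets; the helper names are illustrative, not part of any library.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of the transactions containing lhs that also contain rhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence relative to rhs's own support; > 1 suggests a positive association."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Hypothetical baskets.
transactions = [
    ["milk", "bread"],
    ["milk", "butter", "bread"],
    ["butter"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

conf_mb = confidence({"milk"}, {"bread"}, transactions)
lift_mb = lift({"milk"}, {"bread"}, transactions)
print(f"confidence(milk -> bread) = {conf_mb:.3f}")  # 0.4 / 0.6 -> 0.667
print(f"lift(milk -> bread) = {lift_mb:.3f}")        # 0.667 / 0.6 -> 1.111
```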
The steps we need to follow to implement the apriori algorithm are listed below.
- The algorithm starts with 1-itemsets, where 1 is the number of items in each itemset.
- It removes all itemsets that do not meet the minimum support requirement.
- It then increases the number of items (k) in each itemset, building the new candidates from the itemsets that survived the previous step, and repeats the first two steps until the specified k is reached or no itemset meets the minimum support requirement.
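The three steps can be sketched in plain Python without `apyori`. This is a minimal, unoptimized version for illustration only (the function name `apriori_frequent_itemsets` and the sample baskets are hypothetical). It also applies the usual Apriori pruning step, discarding any candidate that has an infrequent subset, which follows directly from the property described earlier.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support, max_k=None):
    """Return {frozenset: support} for every itemset whose support >= min_support.

    Minimal, unoptimized sketch of the level-wise Apriori search.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    frequent = {}
    # Step 1: start with every 1-itemset as a candidate.
    candidates = {frozenset([item]) for b in baskets for item in b}
    k = 1
    while candidates and (max_k is None or k <= max_k):
        # Step 2: keep only candidates that meet the minimum support.
        supports = {c: support(c) for c in candidates}
        level = {c: s for c, s in supports.items() if s >= min_support}
        if not level:
            break
        frequent.update(level)
        # Step 3: grow to k + 1 items by joining pairs of frequent k-itemsets ...
        k += 1
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # ... and drop candidates with an infrequent (k - 1)-subset (Apriori property).
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical baskets.
baskets = [
    ["milk", "butter", "bread"],
    ["milk", "butter", "bread", "eggs"],
    ["milk", "butter", "bread"],
    ["milk", "bread"],
    ["butter", "eggs"],
]
result = apriori_frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With `min_support=0.5`, "eggs" (2 of 5 baskets) is dropped in the first pass, so no candidate containing it is ever generated.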
Implement the Apriori Algorithm in Python
To implement the Apriori Algorithm, we will use the `apyori` module. It is an external module, so we need to install it separately.
The `pip` command to install the `apyori` module is below.
```shell
pip install apyori
```
We’ll be using the Market Basket Optimization dataset from Kaggle.
```python
# pandas reads the CSV file; apyori provides the apriori() function.
import pandas as pd
from apyori import apriori
```
We have imported all the libraries required for our operations in the code given above. Now, we need to read the dataset using `pandas`, as shown in the following code snippet.
```python
# The file has no header row, so header=None keeps the first transaction as data.
market_data = pd.read_csv("Market_Basket_Optimisation.csv", header=None)
```
Now, let’s check the total number of transactions in our dataset.
```python
len(market_data)
```
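The `apriori()` function expects a list of transactions, where each transaction is a list of items. We therefore convert each row of `market_data` into a list and store these transactions in a list called `transacts`. The CSV file has 20 columns, so `pandas` pads every transaction with fewer than 20 items with null (`NaN`) values. `apyori` does not filter these out for us; if they are passed along (for example, converted to the string `"nan"`), they are treated as an ordinary item, so they should be dropped while building each transaction.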
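A minimal way to build `transacts` is sketched below, assuming the header-less layout read above: each row is turned into a list of item names, and its `NaN` padding is dropped so it never reaches `apriori()`. The helper name `to_transactions` and the three sample rows are hypothetical stand-ins for the Kaggle file.

```python
import io

import pandas as pd


def to_transactions(df):
    """Hypothetical helper: turn each row of a header-less basket CSV into a list of items.

    pandas pads rows shorter than the first row with NaN; dropna() removes that
    padding so it is not treated as an item.
    """
    return [row.dropna().astype(str).tolist() for _, row in df.iterrows()]


# Tiny in-memory stand-in for Market_Basket_Optimisation.csv (hypothetical rows).
sample_csv = io.StringIO(
    "milk,butter,bread\n"
    "eggs,bread\n"
    "milk\n"
)
sample = pd.read_csv(sample_csv, header=None)
print(to_transactions(sample))
```

With the real data, this would be `transacts = to_transactions(market_data)`.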
We now generate association rules from our data with the `apriori()` function, as shown in the following code block.
```python
rules = apriori(
    transactions=transacts,
    min_support=0.003,
    min_confidence=0.2,
    min_lift=3,
    min_length=2,
    max_length=2,
)
```
We passed the minimum support, confidence, and lift thresholds to `apriori()`. We also set both the minimum and the maximum number of items in an itemset to 2, i.e., we only want pairs of items that were frequently sold together.
`apriori()` returns its association rules as a generator, which we store in `rules`. Each element is a record holding the itemset, its support, and its ordered statistics (the rule's left-hand side, right-hand side, confidence, and lift). To view the rules as a table, the following code defines a function `inspect()` that takes the list of records obtained from `rules` and extracts those fields into one tuple per rule; we then pass the result to `pd.DataFrame`.
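```python
def inspect(output):
    # Each apyori record is (items, support, ordered_statistics), and each
    # ordered statistic is (items_base, items_add, confidence, lift).
    # We read the first ordered statistic of each record; for 2-item rules,
    # items_base and items_add each hold a single item.
    Left_Hand_Side = [tuple(result[2][0][0])[0] for result in output]
    support = [result[1] for result in output]
    confidence = [result[2][0][2] for result in output]
    lift = [result[2][0][3] for result in output]
    Right_Hand_Side = [tuple(result[2][0][1])[0] for result in output]
    return list(zip(Left_Hand_Side, support, confidence, lift, Right_Hand_Side))


# Materialize the generator once so inspect() can iterate over it several times.
output = list(rules)
output_data = pd.DataFrame(
    inspect(output),
    columns=["Left_Hand_Side", "Support", "Confidence", "Lift", "Right_Hand_Side"],
)
print(output_data)
```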
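To see the strongest associations first, `output_data` can be sorted by its `Lift` column with `DataFrame.sort_values()`. So that the sketch runs without `apyori` or the Kaggle file, it builds two hypothetical records with namedtuples whose field names mirror those of `apyori`'s `RelationRecord` and `OrderedStatistic`; the items and numbers are invented.

```python
from collections import namedtuple

import pandas as pd

# Stand-ins mirroring the field names of apyori's records (values are invented).
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])

output = [
    RelationRecord(frozenset({"milk", "bread"}), 0.004,
                   [OrderedStatistic(frozenset({"milk"}), frozenset({"bread"}), 0.25, 3.2)]),
    RelationRecord(frozenset({"butter", "eggs"}), 0.005,
                   [OrderedStatistic(frozenset({"butter"}), frozenset({"eggs"}), 0.30, 4.8)]),
]

rows = []
for record in output:
    stat = record.ordered_statistics[0]  # first rule of each record
    rows.append({
        "Left_Hand_Side": ", ".join(sorted(stat.items_base)),
        "Support": record.support,
        "Confidence": stat.confidence,
        "Lift": stat.lift,
        "Right_Hand_Side": ", ".join(sorted(stat.items_add)),
    })

output_data = pd.DataFrame(rows)
# Highest lift (strongest association) first.
ranked = output_data.sort_values("Lift", ascending=False)
print(ranked)
```

With the real results, `output_data.sort_values("Lift", ascending=False)` is the only line needed.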