* Apriori Algorithm *
The Apriori algorithm is a classic example of association rule mining. It is used to identify frequent patterns and correlations between items in a dataset.
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.
To improve the efficiency of this level-wise generation of frequent itemsets, an important property called the Apriori property is used, which reduces the search space.
Apriori Property –
All non-empty subsets of a frequent itemset must be frequent. The key concept behind the Apriori algorithm is the anti-monotonicity of the support measure. Apriori assumes that:
- All subsets of a frequent itemset must be frequent (the Apriori property).
- If an itemset is infrequent, all its supersets will be infrequent (see the pruning sketch after this list).
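To make the property concrete, here is a minimal sketch of the prune step it enables; `has_infrequent_subset` and `frequent_prev` are illustrative names, not part of the original article:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Prune step: a k-itemset can only be frequent if every one of its
    (k-1)-subsets is already among the frequent (k-1)-itemsets."""
    return any(
        frozenset(subset) not in frequent_prev
        for subset in combinations(candidate, len(candidate) - 1)
    )
```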
Consider the following dataset; we will find the frequent itemsets and generate association rules for it.
The minimum support count is 2 and the minimum confidence is 50%.
Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset. This table is called C1 (the candidate set).
(II) Compare each candidate item's support count with the minimum support count (here min_support = 2); if the support count of a candidate item is less than min_support, remove that item. This gives us itemset L1. (A sketch of both steps follows.)
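Here is a minimal sketch of these two steps in Python; the `transactions` list is hypothetical, since the article's dataset is not reproduced above:

```python
from collections import Counter

# Hypothetical transactions; the article's actual dataset is not shown here.
transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'C'}]
min_support = 2

# (I) C1: support count of every individual item in the dataset.
c1 = Counter(item for t in transactions for item in t)

# (II) L1: keep only the items whose support count meets min_support.
l1 = {frozenset([item]) for item, count in c1.items() if count >= min_support}
```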
Step-2: K=2
- Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the itemsets have (K-2) elements in common.
- Check whether all subsets of each candidate itemset are frequent, and if not, remove that itemset. (For example, the subsets of {A, C} are {A} and {C}; both are frequent, so {A, C} is kept. Check each itemset this way.)
- Now find the support count of these itemsets by searching the dataset.
- Compare each C2 candidate's support count with the minimum support count (here min_support = 2); if the support count of a candidate itemset is less than min_support, remove it. This gives us itemset L2.
- We stop when no further frequent itemsets are found. (A sketch of this level follows the list.)
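Continuing the hypothetical sketch above, a minimal version of the K=2 level; `c2` and `l2` are illustrative names:

```python
# Join step: L1 itemsets share (K-2) = 0 elements, so C2 is every
# 2-item union of distinct L1 itemsets.
c2 = {a | b for a in l1 for b in l1 if len(a | b) == 2}

# The prune step is trivial here (every 1-subset is in L1), so count
# support by scanning the transactions and keep the frequent candidates.
l2 = {cand for cand in c2
      if sum(cand <= t for t in transactions) >= min_support}
```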
Step-3:
- Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes into the picture. For that, we need to calculate the confidence of each rule, where confidence(A --> C) = support_count(A ∪ C) / support_count(A); rules meeting the 50% minimum confidence are strong.
- Association rule generation:
Association rule   Support   Confidence
A --> C            2         2/3 ≈ 0.66 = 66%
C --> A            2         2/2 = 1.00 = 100%
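A quick worked check of the two confidences above; the support counts are read off the example rules, and the variable names are illustrative:

```python
# Confidence of X --> Y is support_count(X ∪ Y) / support_count(X).
support_ac = 2  # support count of {A, C}
support_a = 3   # support count of {A}
support_c = 2   # support count of {C}

print(support_ac / support_a)  # 2/3 ≈ 0.66 -> 66%, rule A --> C
print(support_ac / support_c)  # 2/2 = 1.0  -> 100%, rule C --> A
```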
To implement this in Python, first install the apyori package (for example, with `pip install apyori`). The implementation then proceeds in the following steps:
Step 1: Import the libraries
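A minimal sketch of the imports the following steps assume (pandas for the data, apriori from the apyori package for the model):

```python
import pandas as pd
from apyori import apriori
```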
Step 2: Load the dataset
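A sketch of loading a transactions file; the file name store_data.csv is hypothetical, and header=None assumes the file has no column names:

```python
# One transaction per row, one item per column, no header row.
dataset = pd.read_csv('store_data.csv', header=None)
```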
Step 3: Have a glance at the records
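For example:

```python
# Show the first five transactions.
print(dataset.head())
```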
Step 4: Look at the shape
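For example:

```python
# (number_of_transactions, max_items_per_transaction)
print(dataset.shape)
```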
Step 5: Convert Pandas DataFrame into a list of lists
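apyori expects an iterable of transactions, each a list of item strings, so the DataFrame is converted row by row; skipping NaN cells is an assumption about how shorter transactions are encoded:

```python
# Build a list of lists, dropping empty (NaN) cells.
transactions = []
for _, row in dataset.iterrows():
    transactions.append([str(item) for item in row if pd.notna(item)])
```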
Step 6: Build the Apriori model
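A hedged sketch of fitting the model; the min_support, min_lift, and max_length values are illustrative choices, while min_confidence=0.5 matches the 50% threshold used above:

```python
rules = apriori(transactions,
                min_support=0.003,   # illustrative threshold
                min_confidence=0.5,  # 50%, as in the worked example
                min_lift=3,          # illustrative threshold
                max_length=2)        # rules over item pairs only
results = list(rules)  # apriori() returns a generator; materialise it
```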
Step 7: Print out the number of rules
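For example:

```python
print(len(results))  # number of rules found
```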
Step 8: Have a glance at the rules
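Each result returned by apyori is a RelationRecord carrying the itemset, its support, and the derived rules with their confidence and lift; a sketch of printing the first few:

```python
for record in results[:3]:
    for stat in record.ordered_statistics:
        print(list(stat.items_base), '->', list(stat.items_add),
              '| conf =', round(stat.confidence, 2),
              '| lift =', round(stat.lift, 2))
```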
Limitations of Apriori Algorithm
Despite being a simple algorithm, Apriori has some limitations, including:
- It wastes time handling the large number of candidate itemsets it generates.
- Its efficiency goes down when a large number of transactions must be processed within limited memory.
- It requires high computational power and needs to scan the entire database repeatedly.