
Apriori Algorithm





The Apriori algorithm is a classic example of association rule mining. It is used to identify frequent patterns and correlations between items in a dataset.




The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.

To improve the efficiency of this level-wise generation of frequent itemsets, an important property called the Apriori property is used, which reduces the search space.


Apriori Property –
All non-empty subsets of a frequent itemset must be frequent. The key concept behind the Apriori algorithm is the anti-monotonicity of the support measure. Apriori assumes that:

All subsets of a frequent itemset must be frequent (Apriori property).
If an itemset is infrequent, all its supersets will be infrequent.
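This pruning rule is straightforward to express in code. Below is a minimal sketch (the function and itemset names are illustrative, not from the original post) that rejects a candidate k-itemset as soon as any of its (k-1)-subsets is missing from the frequent set:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Return True if any (k-1)-subset of the candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# Suppose {A}, {B}, {C} are frequent 1-itemsets but {D} is not.
L1 = {frozenset({'A'}), frozenset({'B'}), frozenset({'C'})}

print(has_infrequent_subset(('A', 'C'), L1))  # False: both subsets are frequent
print(has_infrequent_subset(('A', 'D'), L1))  # True: {D} is infrequent
```

Any candidate containing an infrequent subset can be discarded without ever scanning the dataset, which is exactly how the property shrinks the search space.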


Consider the following dataset; we will find the frequent itemsets and generate association rules from it.


                




minimum support count = 2
minimum confidence = 50%



Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset. This is called C1 (the candidate set).


                                






(II) Compare each candidate item's support count with the minimum support count (here min_support = 2); if an item's support count is less than min_support, remove it. This gives us the itemset L1.
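Steps (I) and (II) can be sketched in a few lines of Python. The transactions below are a small illustrative set assumed for this sketch, not necessarily the exact table from the post:

```python
from collections import Counter

transactions = [
    {'A', 'B', 'C'},
    {'A', 'C'},
    {'A', 'D'},
    {'B', 'E', 'F'},
]
min_support_count = 2

# C1: support count of each individual item
C1 = Counter(item for t in transactions for item in t)

# L1: prune items whose count falls below the minimum support count
L1 = {item: count for item, count in C1.items() if count >= min_support_count}
print(sorted(L1.items()))  # [('A', 3), ('B', 2), ('C', 2)]
```

With these transactions, D, E, and F each appear only once and are dropped, leaving A, B, and C as the frequent 1-itemsets.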





Step-2: K=2

  • Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the itemsets must have (k-2) elements in common.
  • Check whether all subsets of each itemset are frequent; if not, remove that itemset. (For example, the subsets of {A, C} are {A} and {C}, which are frequent. Check this for each itemset.)
  • Now find the support count of these itemsets by searching the dataset.
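The join and support-counting steps above can be sketched as follows (the itemsets and transactions are illustrative; for k = 2, every pair of frequent 1-itemsets qualifies for the join):

```python
from itertools import combinations

L1 = [frozenset({'A'}), frozenset({'B'}), frozenset({'C'})]
transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'E', 'F'}]

# Join step: union pairs of 1-itemsets (for k=2 any two of them can join)
C2 = {a | b for a, b in combinations(L1, 2)}

# Count the support of each candidate by scanning the dataset
support = {c: sum(c <= t for t in transactions) for c in C2}
for c in sorted(support, key=sorted):
    print(sorted(c), support[c])
```

Here only {A, C} reaches the minimum support count of 2, so it alone survives into L2.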






    (II) Compare the candidate (C2) support count with the minimum support count (here min_support = 2); if a candidate item's support count is less than min_support, remove it. This gives us the itemset L2.





    Step-3:
    • We stop here because no further frequent itemsets are found.

      Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes into the picture. For that, we need to calculate the confidence of each rule.
      Association Rule Generation:

      Association rule    Support    Confidence
      A --> C             2          2/3 = 0.66 = 66%
      C --> A             2          2/2 = 1.00 = 100%

    Since the minimum confidence is 50%, both rules can be considered strong association rules.
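The confidence calculation is just the support of the whole itemset divided by the support of the rule's left-hand side. A minimal sketch using the counts from the worked example:

```python
# Support counts from the worked example: A appears 3 times, C twice,
# and {A, C} together twice.
support = {
    frozenset({'A'}): 3,
    frozenset({'C'}): 2,
    frozenset({'A', 'C'}): 2,
}

def confidence(antecedent, consequent):
    """confidence(X -> Y) = support(X union Y) / support(X)"""
    return support[antecedent | consequent] / support[antecedent]

print(confidence(frozenset({'A'}), frozenset({'C'})))  # 0.666... -> 66%
print(confidence(frozenset({'C'}), frozenset({'A'})))  # 1.0 -> 100%
```

Both values clear the 50% minimum confidence threshold, which is why both rules are kept as strong rules.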



 Implementing the Apriori Algorithm in Python


To install apyori:

!pip install apyori


Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5955 sha256=102782e277a00b6395d05a4560398f764b2f33729a087e64a88b6cea5b9b5060
  Stored in directory: /root/.cache/pip/wheels/c4/1a/79/20f55c470a50bb3702a8cb7c94d8ada15573538c7f4baebe2d
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2



Step 1: Import the libraries


import numpy as np
import pandas as pd
from apyori import apriori


Step 2: Load the dataset


store_data = pd.read_csv('ap.csv', header=None)


Step 3: Have a glance at the records


print(store_data)

    

     0         1    2    3    4    5    6
0  TID  Itemsets  NaN  NaN  NaN  NaN  NaN
1   T1         A    B    C  NaN  NaN  NaN
2   T4         A  NaN    C  NaN  NaN  NaN
3   T3         A  NaN  NaN    D  NaN  NaN
4   T4       NaN    B  NaN  NaN    E    F
5  NaN       NaN  NaN  NaN  NaN  NaN  NaN


Step 4: Look at the shape


store_data.shape


(9, 15)

Step 5: Convert Pandas DataFrame into a list of lists


# converting the pandas dataframe into a list of lists
records = []
for i in range(0, 9):
    records.append([str(store_data.values[i, j]) for j in range(0, 15)])


Step 6: Build the Apriori model


# Building the apriori model
association_rules = apriori(records, min_support=0.22, min_confidence=0.66,
                            min_lift=3.0, min_length=2)
association_results = list(association_rules)


Step 7: Print out the number of rules


# getting number of rules

print(len(association_results))


2

Step 8: Have a glance at the rules


print(association_results)


[RelationRecord(items=frozenset({'A', 'C'}), support=0.2222222222222222, ordered_statistics=[OrderedStatistic(items_base=frozenset({'A'}),
items_add=frozenset({'C'}), confidence=0.6666666666666666, lift=3.0), OrderedStatistic(items_base=frozenset({'C'}), items_add=frozenset({'A'}), confidence=1.0, lift=3.0)])
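The raw RelationRecord output is hard to read, and a small loop can format each rule. To keep this sketch self-contained (so it runs without apyori installed), the namedtuples below stand in for apyori's result objects, mirroring the structure printed above:

```python
from collections import namedtuple

# Stand-ins for apyori's result types (illustrative, not the real classes)
RelationRecord = namedtuple('RelationRecord', 'items support ordered_statistics')
OrderedStatistic = namedtuple('OrderedStatistic',
                              'items_base items_add confidence lift')

association_results = [RelationRecord(
    items=frozenset({'A', 'C'}), support=2 / 9,
    ordered_statistics=[
        OrderedStatistic(frozenset({'A'}), frozenset({'C'}), 2 / 3, 3.0),
        OrderedStatistic(frozenset({'C'}), frozenset({'A'}), 1.0, 3.0),
    ],
)]

lines = []
for record in association_results:
    for stat in record.ordered_statistics:
        lines.append(f"{', '.join(sorted(stat.items_base))} -> "
                     f"{', '.join(sorted(stat.items_add))}  "
                     f"support={record.support:.2f}  "
                     f"confidence={stat.confidence:.2f}  lift={stat.lift:.1f}")
print('\n'.join(lines))
```

With real apyori results, the same loop works unchanged, since `items_base`, `items_add`, `confidence`, and `lift` are the attribute names apyori uses.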



Limitations of the Apriori Algorithm

Despite being simple, the Apriori algorithm has some limitations, including:

  • It wastes time handling a large number of candidate frequent itemsets.
  • Its efficiency drops when a large number of transactions must be processed within limited memory.
  • It requires high computational power and must scan the entire database repeatedly.













