Prologue

Earlier in my career I worked for a loyalty card scheme run by one of the largest supermarkets in the UK. During my time there I was involved in building a product recommendation system for selected members of the loyalty programme. The subject of this post is understanding recommendation systems and seeing how a worked example runs in R. When I was building my first recommendation system I scraped by on various R tutorials and, with help from engineers, we put a model into production. However, I didn’t have a clear understanding of these methods until I came across Brett Lantz’s Machine Learning with R book. The technical parts of this post are based on a chapter in his book, and I have expanded on parts which I think need further explanation.

Recommendation systems are popular nowadays, and every company worth its salt has some sort of recommendation system to encourage customers to buy products deemed complementary to those they have already purchased. The most famous example is Amazon, where you are shown a whole suite of products you may be interested in as you browse the website. Although Amazon’s recommender system is far more advanced than what will be covered here, the foundational ideas are the same. In Lantz’s book this kind of recommendation system is called Market Basket Analysis, and that is the term I will use in this post.

Introduction

Using machine learning to learn purchasing patterns from transactional data is called Market Basket Analysis. In any given transaction, a customer may purchase one or more products or items. A group of one or more items is called an itemset. Transactions are specified in terms of itemsets; in a grocery store, an itemset could be

$$\{ \text{bread, peanut butter, jam} \}$$

Market Basket Analysis outputs a collection of association rules which specify patterns found in the relationships among items in the itemsets. An association rule is composed of subsets of an itemset and relates one itemset on the left hand side (LHS) of the rule to another itemset on the right hand side (RHS). An example of an association rule is

(1) \{ \text{peanut butter, jam} \} \rightarrow \{ \text{bread} \}

The LHS of association rule (1) is the condition which needs to be satisfied in order to trigger the rule, i.e. to obtain the RHS. Rule (1) states that if peanut butter and jam are purchased together, then bread is also likely to be purchased.

These types of association rule learners are unsupervised: the data do not require class labels and the rule learner does not need to be trained on a subset of the data; it can be run directly on the full dataset.

The Apriori Algorithm

Computational Cost

When using an algorithm to generate association rules from transactional data, we must consider the high computational cost involved with interrogating large datasets. Transactional datasets are typically very large, both in terms of the number of transactions and in the number of items purchased. An example of a transactional dataset is

Transaction_id | Itemset (Product Purchased)
1 | $$\{A\}$$
2 | $$\{A,B,C\}$$
3 | $$\{A,E,F\}$$
... | ...
N | $$\{A,F,B\}$$

The main problem with transactional data is that the number of potential itemsets grows exponentially with the number of distinct items available. Given k items, each of which may or may not appear in an itemset, there are 2^k – 1 possible itemsets that could become association rules. For example, suppose we have k = 3 items \{A\}, \{B\}, \{C\}; then the 2^3 – 1 = 7 itemset possibilities are:

$$\{A\}, \{B\}, \{C\}, \{A,B\}, \{A,C\}, \{B,C\}, \{A,B,C\}$$

If k is very high, such as k > 100, then it will be a very computationally expensive task to evaluate all the potential itemsets that could be made. 
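To get a feel for this growth, the short base R snippet below (illustrative only) enumerates the seven itemsets for k = 3 and shows how quickly the count 2^k – 1 explodes for larger k.

# Enumerate all 2^3 - 1 = 7 itemsets for the k = 3 items A, B and C
items <- c("A", "B", "C")
unlist(lapply(1:3, function(n) combn(items, n, paste, collapse = ",")))
# "A" "B" "C" "A,B" "A,C" "B,C" "A,B,C"

# The number of possible itemsets, 2^k - 1, explodes as k grows
k <- c(3, 10, 50, 100)
2^k - 1
# roughly 7, 1023, 1.1e15 and 1.3e30 respectively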

The Apriori Property – Efficiently Evaluating Itemsets

Instead of evaluating each potential itemset successively, a smarter association rule learning approach recognises the fact that many potential itemsets are rarely found in practice. By ignoring rare or less important combinations, it is possible to limit the scope of the search for association rules to a more manageable size.

The Apriori algorithm is one approach to reduce the number of itemsets to evaluate. The algorithm utilises a prior belief about the properties of frequent itemsets – hence the name Apriori. 

The prior belief used in the Apriori algorithm is called the Apriori Property and its function is to reduce the association rule search space. The Apriori Property states that:

(2) \text{Apriori Property: all subsets of a frequent itemset must also be frequent}

As an example, the itemset 

$$\{Crisps, Pepsi\}$$

can only be considered a frequent itemset if both subsets \{Crisps\} and \{Pepsi\} are themselves frequent. Conversely, if either \{Crisps\} or \{Pepsi\} is not frequent, then any itemset containing that item can be excluded from further consideration.

Understanding Rule Frequency

To classify a prospective association rule as frequent we require two statistical measures:

Support – The support of an itemset is a measure of how frequently the itemset occurs in the data. Formally, the support for an itemset X is defined as

(3) Support(X) = \frac{Count(X)}{N}

where N is the total number of transactions in the database, and Count(X) is the number of transactions containing itemset X.

Confidence – Confidence is a measurement of accuracy. It is defined as

(4) Confidence(X \rightarrow Y) = \frac{Support(X,Y)}{Support(X)}

Confidence tells us the proportion of transactions where the presence of itemset X results in the presence of itemset Y. Note: Confidence(X \rightarrow Y) is not necessarily the same as Confidence(Y \rightarrow X).

There is an additional measure which is useful to compute when running the Apriori algorithm. 

Lift – The lift of a rule measures how much more likely an itemset is to be purchased, relative to its typical rate of purchase, given that another itemset has been purchased. It is defined as

(5) Lift(X \rightarrow Y) = \frac{Confidence(X \rightarrow Y)}{Support(Y)}

Note: Lift(X \rightarrow Y) = Lift(Y \rightarrow X) because

Lift(X \rightarrow Y) = \frac{Confidence(X \rightarrow Y)}{Support(Y)} = \frac{Support(X,Y)}{Support(X)} \times \frac{1}{Support(Y)} = \frac{Support(X,Y)}{Support(X)Support(Y)}

Lift(Y \rightarrow X) = \frac{Confidence(Y \rightarrow X)}{Support(X)} = \frac{Support(Y,X)}{Support(Y)} \times \frac{1}{Support(X)} = \frac{Support(Y,X)}{Support(Y)Support(X)} = \frac{Support(X,Y)}{Support(X)Support(Y)}

If Lift(X \rightarrow Y) > 1, it implies that the 2 itemsets (X,Y) are found together more often than one would expect by chance. A large Lift value is a strong indicator that an association rule is important and reflects a true connection between the items. 

Example 1: Consider transactional data from a hospital gift shop

Transaction Number | Purchased Items
1 | Flowers, Card, Soda
2 | Toy, Flowers, Balloons, Candy Bar
3 | Card, Candy Bar, Flowers
4 | Toy, Balloons, Soda
5 | Flowers, Card, Soda

1) What is the support of the itemset \{Card, Flowers\}?

2) Show that Confidence(Flowers \rightarrow Card) \neq Confidence(Card \rightarrow Flowers)

Solution:

1) Support(\{Card, Flowers\}) = \frac{Count(\{Card, Flowers\})}{5} = \frac{3}{5}

2) Confidence(Flowers \rightarrow Card) = \frac{Support(\{Flowers, Card\})}{Support(\{Flowers\})} = \frac{0.6}{0.8} = \frac{3}{4}

 Confidence(Card \rightarrow Flowers) = \frac{Support(\{Card, Flowers\})}{Support(\{Card\})} = \frac{0.6}{0.6} = 1

The above implies that a purchase involving flowers is accompanied by a card 75% of the time, whereas a purchase involving a card is accompanied by flowers 100% of the time.
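As a quick check on the arithmetic, the base R sketch below recomputes these measures for the gift shop transactions. The support(), confidence() and lift() helpers are written here purely for illustration; they are not functions from the arules package.

# Illustrative helpers (not from arules) applied to the gift shop transactions
gift_shop <- list(
  c("Flowers", "Card", "Soda"),
  c("Toy", "Flowers", "Balloons", "Candy Bar"),
  c("Card", "Candy Bar", "Flowers"),
  c("Toy", "Balloons", "Soda"),
  c("Flowers", "Card", "Soda")
)

support    <- function(x, trans) mean(sapply(trans, function(t) all(x %in% t)))
confidence <- function(lhs, rhs, trans) support(c(lhs, rhs), trans) / support(lhs, trans)
lift       <- function(lhs, rhs, trans) confidence(lhs, rhs, trans) / support(rhs, trans)

support(c("Card", "Flowers"), gift_shop)    # 0.6, i.e. 3/5
confidence("Flowers", "Card", gift_shop)    # 0.75
confidence("Card", "Flowers", gift_shop)    # 1
lift("Flowers", "Card", gift_shop)          # 1.25
lift("Card", "Flowers", gift_shop)          # 1.25 - lift is the same in both directions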

Building an Association Ruleset with the Apriori Algorithm

In order to obtain a set of association rules algorithmically, the process has two phases:

  1. Construct and identify all itemsets which meet a predefined minimum support threshold. These itemsets are built up from single items and combined successively, provided they meet the minimum support threshold.
  2. Once the itemsets from phase 1 are determined, we create association rules from the itemsets. The association rules considered will be those that meet a minimum confidence threshold. 

Phase 1: Identifying Frequent Itemsets

The first phase occurs over multiple iterations, each of which uses the results obtained from the previous iteration. Each iteration evaluates the support of a set of increasingly larger itemsets against a minimum support threshold. The structure of each iteration can be summarised as:

Iteration 1: Evaluate support for itemsets containing 1 item and compare against the minimum support threshold. If the itemset meets the minimum support threshold, consider it for the next iteration.

Iteration 2: Take the 1-item itemsets which met the minimum support threshold in iteration 1 and construct 2-item itemsets from them. Evaluate the support for these itemsets and compare against the minimum support threshold. If an itemset meets the minimum support threshold, consider it for the next iteration.

Iteration n: Take the (n – 1)-item itemsets which met the minimum support threshold in iteration n – 1 and construct n-item itemsets from them. Evaluate the support for these itemsets and compare against the minimum support threshold. If an itemset meets the minimum support threshold, consider it for phase 2.

Terminate the iterative process when no new itemsets can be generated. 

Each iteration operates in conjunction with the Apriori Principle, which allows some candidate itemsets to be eliminated before the subsequent iteration. For example, if \{A\}, \{B\}, \{C\} are itemsets which meet a minimum support threshold but itemset \{D\} does not, then the subsequent iteration will only consider the itemsets \{A,B\}, \{A,C\}, \{B,C\}. Suppose that, in this subsequent iteration, \{A,B\} and \{B,C\} meet the minimum support threshold but \{A,C\} does not. Then, by the Apriori Principle, the itemset \{A,B,C\} does not need to be considered because one of its subsets, \{A,C\}, does not meet the minimum support threshold. Having generated no new itemsets, the iterative process of phase 1 can stop. A base R sketch of this phase is given below.
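The sketch that follows is one way to express Phase 1 in base R. The helper names are my own and are not arules functions; the code favours clarity over efficiency.

# Illustrative base R sketch of Phase 1 (level-wise search with Apriori pruning)

# Absolute support: the number of transactions containing every item in the itemset
support_count <- function(itemset, trans) {
  sum(sapply(trans, function(t) all(itemset %in% t)))
}

# Apriori pruning: keep a candidate only if every subset one item smaller is frequent
all_subsets_frequent <- function(candidate, frequent_smaller) {
  subsets <- combn(candidate, length(candidate) - 1, simplify = FALSE)
  all(sapply(subsets, function(s)
    any(sapply(frequent_smaller, function(f) setequal(f, s)))))
}

find_frequent_itemsets <- function(trans, min_count) {
  items <- sort(unique(unlist(trans)))

  # Iteration 1: single items meeting the minimum support count
  current <- Filter(function(x) support_count(x, trans) >= min_count, as.list(items))
  frequent <- current

  # Later iterations: grow the surviving itemsets by one item at a time
  while (length(current) > 0) {
    candidates <- list()
    for (s in current) {
      for (i in items[items > max(s)]) {              # avoids duplicate orderings
        cand <- c(s, i)
        if (all_subsets_frequent(cand, current)) {    # the Apriori Principle
          candidates <- c(candidates, list(cand))
        }
      }
    }
    current <- Filter(function(x) support_count(x, trans) >= min_count, candidates)
    frequent <- c(frequent, current)
  }
  frequent
}

Applied to the worked example later in this post with min_count = 2, this sketch should return all the frequent itemsets found by hand, including the two 3-item itemsets.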

Phase 2: Create Association Rules from Itemsets

Given the set of itemsets which meet the minimum support threshold from Phase 1, candidate association rules are generated by splitting each itemset into every possible LHS/RHS pair of subsets. For instance, the itemset \{A,B,C\} will generate the candidate association rules

  1. \{A\} \rightarrow \{B,C\}
  2. \{B\} \rightarrow \{A,C\}
  3. \{C\} \rightarrow \{A,B\}
  4. \{A,B\} \rightarrow \{C\}
  5. \{A,C\} \rightarrow \{B\}
  6. \{B,C\} \rightarrow \{A\}

These candidate rules are evaluated against a minimum confidence threshold, and any rule that does not meet the minimum confidence level is eliminated. The association rules which meet the minimum confidence threshold form the basis of Market Basket Analysis. 
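As a rough illustration of this phase, the base R sketch below takes a single frequent itemset, enumerates every LHS/RHS split, and keeps the splits whose confidence meets the threshold. Again, the function names are my own and not part of arules.

# Illustrative sketch of Phase 2 for one frequent itemset (base R, not arules)

# Relative support: the proportion of transactions containing every item in the itemset
support <- function(itemset, trans) {
  mean(sapply(trans, function(t) all(itemset %in% t)))
}

generate_rules <- function(itemset, trans, min_confidence) {
  rules <- list()
  for (k in 1:(length(itemset) - 1)) {
    for (lhs in combn(itemset, k, simplify = FALSE)) {
      rhs  <- setdiff(itemset, lhs)
      conf <- support(itemset, trans) / support(lhs, trans)   # Confidence(LHS -> RHS)
      if (conf >= min_confidence) {
        rules[[length(rules) + 1]] <- list(lhs = lhs, rhs = rhs, confidence = conf)
      }
    }
  }
  rules
}

Run over the two frequent itemsets of the worked example below with a 50% confidence threshold, this sketch should keep the same seven rules that survive Phase 2 there.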

Worked Example – Building Association Rules with Apriori

Consider the following table of transactions

Transaction ID | Items
1 | $$\{I1, I2, I5\}$$
2 | $$\{I2, I4\}$$
3 | $$\{I2, I3\}$$
4 | $$\{I1, I2, I4\}$$
5 | $$\{I1, I3\}$$
6 | $$\{I2, I3\}$$
7 | $$\{I1, I3\}$$
8 | $$\{I1, I2, I3, I5\}$$
9 | $$\{I1, I2, I3\}$$

We want to find association rules where our thresholds are:

  1. Minimum Support count = 2 (for simplicity of calculation we work with raw counts, Count(X), rather than proportions)
  2. Minimum Confidence level = 50%

Phase 1: Identifying Frequent Itemsets

Iteration 1

Itemset | Support Count | Frequent (Yes = 1, No = 0)
$$\{I1\}$$ | 6 | 1
$$\{I2\}$$ | 7 | 1
$$\{I3\}$$ | 6 | 1
$$\{I4\}$$ | 2 | 1
$$\{I5\}$$ | 2 | 1

Iteration 2

Itemset | Support Count | Frequent (Yes = 1, No = 0)
$$\{I1, I2\}$$ | 4 | 1
$$\{I1, I3\}$$ | 4 | 1
$$\{I1, I4\}$$ | 1 | 0
$$\{I1, I5\}$$ | 2 | 1
$$\{I2, I3\}$$ | 4 | 1
$$\{I2, I4\}$$ | 2 | 1
$$\{I2, I5\}$$ | 2 | 1
$$\{I3, I4\}$$ | 0 | 0
$$\{I3, I5\}$$ | 1 | 0
$$\{I4, I5\}$$ | 0 | 0

Iteration 3

Itemset | Support Count | Frequent (Yes = 1, No = 0)
$$\{I1, I2, I3\}$$ | 2 | 1
$$\{I1, I2, I5\}$$ | 2 | 1

We cannot make itemsets with more elements without violating the Apriori Principle. The potential itemset
$$\{I1, I2, I3, I5\}$$
cannot be used because its subset \{I3, I5\} was found to be infrequent in iteration 2.

From phase 1, we carry forward the itemsets:

  1. $$\{I1, I2, I3\}$$
  2. $$\{I1, I2, I5\}$$

Phase 2: Create Association Rules from Itemsets

Our candidate rules with their associated confidence measures are:

1) \{I1, I2\} \rightarrow \{I3\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I1, I2\})} = \frac{2}{4} = 50\%

2) \{I1, I3\} \rightarrow \{I2\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I1, I3\})} = \frac{2}{4} = 50\%

3) \{I2, I3\} \rightarrow \{I1\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I2, I3\})} = \frac{2}{4} = 50\%

4) \{I1\} \rightarrow \{I2, I3\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I1\})} = \frac{2}{6} = 33\%

5) \{I2\} \rightarrow \{I1, I3\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I2\})} = \frac{2}{7} = 29\%

6) \{I3\} \rightarrow \{I1, I2\}: \frac{Support(\{I1, I2, I3\})}{Support(\{I3\})} = \frac{2}{6} = 33\%

7) \{I1, I2\} \rightarrow \{I5\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I1, I2\})} = \frac{2}{4} = 50\%

8) \{I1, I5\} \rightarrow \{I2\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I1, I5\})} = \frac{2}{2} = 100\%

9) \{I2, I5\} \rightarrow \{I1\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I2, I5\})} = \frac{2}{2} = 100\%

10) \{I1\} \rightarrow \{I2, I5\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I1\})} = \frac{2}{6} = 33\%

11) \{I2\} \rightarrow \{I1, I5\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I2\})} = \frac{2}{7} = 29\%

12) \{I5\} \rightarrow \{I1, I2\}: \frac{Support(\{I1, I2, I5\})}{Support(\{I5\})} = \frac{2}{2} = 100\%

Using the minimum confidence threshold of 50%, the association rules generated are the rules numbered 1, 2, 3, 7, 8, 9, 12.
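For completeness, here is a sketch of the same worked example run through arules (assuming the arules package is installed). Note that the apriori() implementation in arules only reports rules with a single item on the RHS, so a rule such as (12) \{I5\} \rightarrow \{I1, I2\} will not appear in its output; it will also list rules built from the frequent 2-item itemsets (for example \{I5\} \rightarrow \{I2\}), which the hand calculation above did not enumerate.

# Reproducing the worked example with arules (a sketch)
library(arules)

toy_transactions <- as(list(
  c("I1", "I2", "I5"),
  c("I2", "I4"),
  c("I2", "I3"),
  c("I1", "I2", "I4"),
  c("I1", "I3"),
  c("I2", "I3"),
  c("I1", "I3"),
  c("I1", "I2", "I3", "I5"),
  c("I1", "I2", "I3")
), "transactions")

# support = 0.2 corresponds to a support count of at least 2 of the 9 transactions
toyrules <- apriori(toy_transactions,
                    parameter = list(support = 0.2, confidence = 0.5, minlen = 2))
inspect(sort(toyrules, by = "confidence"))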

Fitting the Apriori Algorithm in R

# The Apriori Algorithm 
library(arules)
library(readr)
library(varhandle)
library(dplyr)

# 1) Load the Groceries dataset - since it is a transaction table, we need to store it as a sparse matrix.
## Data can be cloned from git at: https://github.com/stedy/Machine-Learning-with-R-datasets
## Dataset name is: "groceries.csv"
groceries <- read.transactions(file = "groceries.csv", sep = ",")

## Summarise and visualise the transaction table
summary(groceries)

## To see the contents of the sparse matrix, use the inspect() function (here limited to the first five transactions)
inspect(groceries[1:5])

## To examine a particular item (that is, a column of the data), it is possible to use the [row, column] matrix notation. Combining this with the itemFrequency() function allows us to see the proportion of transactions that contain the item.

itemFrequency(groceries[, 1:11])

## Visualise item support with item frequency plots
## We can specify which items by seeing how many meet a minimum support threshold

itemFrequencyPlot(groceries, support = 0.1)

## We can limit the plot to a specific number of items

itemFrequencyPlot(groceries, topN = 20)

## Visualise the transaction data - plot the sparse matrix
image(groceries[1:5])

# 2) Fitting the Apriori model to data

## We need to decide on appropriate minimum thresholds for support and confidence. If we start with
## a conservative minimum confidence value, we can reduce it later to broaden the search if we
## cannot find actionable rules.
## We begin with a confidence threshold of 0.25 - in order to be included in the results, a rule
## has to be correct at least 25 percent of the time.
## It is also helpful to set minlen = n so we can eliminate rules that contain fewer than n items.

groceryrules <- apriori(groceries, parameter = list(support = 0.006, confidence = 0.25, minlen = 2))

# Evaluate Model Performance

## View a summary of association rules
summary(groceryrules)

## To look at the specific association rules
inspect(groceryrules[1:5])

## The first rule can be read as "if a customer buys potted plants, they will also buy whole milk".
## With a support of 0.007 and confidence of 0.4, we can determine that this rule covers 0.7 percent
## of all the transactions and is correct in 40 percent of purchases involving potted plants.

## The lift value tells us how much more likely a customer is to buy whole milk relative to the average customer, given that they bought a potted plant.

## Many association rules found in the data may be trivial or nonsensical - we want to extract those
## rules that make sense and are actionable.
## The most useful rules for the analysis might be the ones with the highest support, confidence or lift.

## We can use the sort function to order the rules
inspect(sort(groceryrules, by = "lift")[1:5])

# 3) Taking subsets of association rules

## We can subset rules containing an item of interest

## Find rules associated with berries
berryrules <- subset(groceryrules, items %in% "berries")
inspect(berryrules)

## Find rules partially matching vegetables
vegerules <- subset(groceryrules, items %pin% "vegetables")
inspect(sort(vegerules, by = "lift")[1:5])

## Find rules completely matching berries and yogurt
yogberrules <- subset(groceryrules, items %ain% c("berries", "yogurt"))
inspect(sort(yogberrules, by = "lift"))

## Find rules which have confidence of 60% and over
highconfrules <- subset(groceryrules, confidence >= 0.6)
inspect(highconfrules[1:8])

# 4) Saving association rules to a file or data frame

## As a dataframe
groceryrules_df <- as(groceryrules, "data.frame")

## Write to CSV
write(groceryrules, file = "groceryrules.csv", sep = ",", quote = TRUE, row.names = FALSE)

References

  1. Machine Learning with R. Brett Lantz. (Buy the book!) https://www.amazon.co.uk/Machine-Learning-techniques-predictive-modeling/dp/1788295862
  2. The worked Apriori example was adapted from https://www.geeksforgeeks.org/apriori-algorithm/
