Results
This section reports the results of a market basket analysis of data from a UK-based online retailer. The analysis was performed to discover co-occurrence relationships among customers’ purchase activities, such as the likelihood of purchasing a candle holder given that the customer already has candles and matches in the shopping cart. Such an analysis is commonly used in marketing to increase the chance of cross-selling, provide recommendations to customers (e.g., based on their browsing history), and deliver targeted marketing (e.g., offer coupons to customers for products that are frequently purchased together with items that the customers recently bought).
The analysis is described in the previous section.
The analysis for this project was performed in R. For an introduction to working with the arules package in R please refer to this tutorial.
A frequency plot showing the number of purchases of the top 10 items (using the dataset in transactions format) is shown below:
The items on the plot are ordered by frequency of purchases. The order is the same as in Table 2, the summary statistics which we discussed in the Data Preparation section. This confirms that the conversion from data frame format to transactions format was done correctly.
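The consistency check described above can be sketched in base R on a toy example. The invoice numbers, item names, and variable names below are invented for illustration and are not taken from the retailer's data; with the arules package, the plot itself would typically come from itemFrequencyPlot() with topN = 10 on the transactions object.

```r
# Toy purchase records in long (data frame) format: one row per invoice line.
# Invoice numbers and item names are made up for illustration.
purchases <- data.frame(
  invoice = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  item    = c("CANDLE", "MATCHES", "HOLDER",
              "CANDLE", "MATCHES",
              "CANDLE", "HOLDER", "MATCHES",
              "CANDLE"),
  stringsAsFactors = FALSE
)

# Frequencies computed directly from the data frame
# (each item counted at most once per invoice).
df_counts <- table(unique(purchases)$item)

# The same data in a basket-per-invoice (transactions-like) list format.
baskets <- lapply(split(purchases$item, purchases$invoice), unique)
basket_counts <- table(unlist(baskets))

# Rank items by frequency in each representation.
top_df      <- sort(df_counts, decreasing = TRUE)
top_baskets <- sort(basket_counts, decreasing = TRUE)

# The two rankings should agree if the conversion preserved the data.
conversion_ok <- identical(names(top_df), names(top_baskets)) &&
  all(as.vector(top_df) == as.vector(top_baskets))
print(top_baskets)
print(conversion_ok)
```

If the ordering or the counts differed between the two representations, that would point to a problem in the conversion step, such as dropped invoices or duplicated lines.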
The summary of the rules generated by the apriori algorithm, obtained by executing summary(rules), is shown below:
set of 354 rules

rule length distribution (lhs + rhs):sizes
  3   4   5   6   7
 25  87 166  74   2

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  3.000   4.000   5.000   4.833   5.000   7.000

summary of quality measures:
    support           confidence          lift            count
 Min.   :0.005038   Min.   :0.9500   Min.   : 8.274   Min.   : 91.0
 1st Qu.:0.005204   1st Qu.:0.9569   1st Qu.:17.021   1st Qu.: 94.0
 Median :0.005481   Median :0.9626   Median :20.885   Median : 99.0
 Mean   :0.006325   Mean   :0.9652   Mean   :33.120   Mean   :114.2
 3rd Qu.:0.006616   3rd Qu.:0.9712   3rd Qu.:46.161   3rd Qu.:119.5
 Max.   :0.011239   Max.   :1.0000   Max.   :95.520   Max.   :203.0

mining info:
  data ntransactions support confidence
 trans         18062   0.005       0.95
The summary above offers several insights. There are 354 rules in total, with support ranging from 0.005 (the minimum support that we specified in the previous section) to 0.011; confidence ranging from 0.95 (again, the minimum confidence that we specified) to 1.00; and lift ranging from 8.3 to 95.5.
A confidence of 0.95 or above for every rule means that, among the transactions containing the items on the left-hand-side, at least 95% also contain the items on the right-hand-side.
A lift substantially higher than 1 for every rule means that the presence of the items on the left-hand-side substantially increases the likelihood that the items on the right-hand-side are purchased in the same transaction, relative to how often those items are purchased overall.
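To make the three quality measures concrete, here is a minimal base R sketch that computes them by hand. The baskets, item names, and the helper functions support_of() and rule_metrics() are hypothetical and written only for this illustration; they are not part of arules, which computes the same quantities internally.

```r
# Toy baskets; item names are invented for illustration only.
baskets <- list(
  c("CANDLE", "MATCHES", "HOLDER"),
  c("CANDLE", "MATCHES", "HOLDER"),
  c("CANDLE", "MATCHES"),
  c("HOLDER"),
  c("CANDLE"),
  c("MATCHES", "HOLDER"),
  c("CANDLE", "MATCHES", "HOLDER"),
  c("SOAP")
)

# Fraction of baskets that contain every item in `items`.
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence and lift for the rule lhs => rhs.
rule_metrics <- function(lhs, rhs, baskets) {
  supp_rule  <- support_of(c(lhs, rhs), baskets)  # LHS and RHS together
  supp_lhs   <- support_of(lhs, baskets)
  supp_rhs   <- support_of(rhs, baskets)          # baseline frequency of RHS
  confidence <- supp_rule / supp_lhs
  c(support = supp_rule, confidence = confidence, lift = confidence / supp_rhs)
}

m <- rule_metrics(c("CANDLE", "MATCHES"), "HOLDER", baskets)
print(round(m, 3))
```

In this toy data the rule {CANDLE, MATCHES} => {HOLDER} has a lift only slightly above 1, so the left-hand-side adds little beyond the baseline popularity of the holder. The rules reported in this section have lifts of 8 and above, which is a much stronger association.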
The output below shows the top 10 rules in terms of support (i.e., the proportion of all transactions that contain every item on both the left-hand-side and the right-hand-side):
     lhs                                                             rhs                    support    confidence lift     count
[1]  {HERB MARKER PARSLEY,HERB MARKER THYME}                      => {HERB MARKER ROSEMARY} 0.01123907 0.9575472  72.36492   203
[2]  {HERB MARKER BASIL,HERB MARKER THYME}                        => {HERB MARKER ROSEMARY} 0.01107297 0.9523810  71.97450   200
[3]  {HERB MARKER MINT,HERB MARKER THYME}                         => {HERB MARKER ROSEMARY} 0.01090688 0.9563107  72.27148   197
[4]  {HERB MARKER PARSLEY,HERB MARKER MINT,HERB MARKER THYME}     => {HERB MARKER ROSEMARY} 0.01029786 0.9587629  72.45680   186
[5]  {HERB MARKER PARSLEY,HERB MARKER ROSEMARY,HERB MARKER MINT}  => {HERB MARKER THYME}    0.01029786 0.9538462  73.00156   186
[6]  {HERB MARKER PARSLEY,HERB MARKER BASIL,HERB MARKER THYME}    => {HERB MARKER ROSEMARY} 0.01029786 0.9637306  72.83222   186
[7]  {HERB MARKER PARSLEY,HERB MARKER ROSEMARY,HERB MARKER BASIL} => {HERB MARKER THYME}    0.01029786 0.9637306  73.75806   186
[8]  {HERB MARKER MINT,HERB MARKER CHIVES }                       => {HERB MARKER PARSLEY}  0.01007640 0.9629630  73.69931   182
[9]  {HERB MARKER PARSLEY,HERB MARKER CHIVES }                    => {HERB MARKER MINT}     0.01007640 0.9528796  72.01218   182
[10] {HERB MARKER THYME,HERB MARKER CHIVES }                      => {HERB MARKER PARSLEY}  0.01007640 0.9680851  74.09133   182
The rules make intuitive sense. For example, it is reasonable to expect that customers (many of whom are wholesalers) who purchase “HERB MARKER PARSLEY” and “HERB MARKER THYME” would also purchase “HERB MARKER ROSEMARY”, as such customers appear to be generally interested in herb markers.
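As a sanity check, the figures reported for rule [1] can be tied back to the transaction count in the mining info using plain arithmetic. The sketch below uses only numbers printed in the output above. The left-hand-side and right-hand-side counts are back-calculated from those numbers and do not appear in the source output, so they should be read as approximations.

```r
# Values reported in the output above for rule [1] and in the mining info.
n_transactions <- 18062
rule_count     <- 203        # transactions containing LHS and RHS together
confidence     <- 0.9575472
lift           <- 72.36492

# Support is the share of all transactions that contain the whole rule.
support <- rule_count / n_transactions

# Back-calculated (not reported in the source): transactions containing the LHS.
lhs_count <- rule_count / confidence

# Back-calculated baseline: share and count of transactions containing the RHS.
rhs_support <- confidence / lift
rhs_count   <- rhs_support * n_transactions

print(signif(support, 7))
print(round(lhs_count))
print(round(rhs_count))
```

Under this back-calculation, the parsley and thyme markers appear together in roughly 212 transactions, 203 of which also include the rosemary marker, while the rosemary marker on its own appears in only about 1.3% of all transactions. That gap between 96% and 1.3% is what produces a lift of about 72.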
Finally, a plot of the rules that we have identified is shown below:
We can see that both confidence and lift tend to grow as support increases. We can also see an interesting pattern showing clusters of confidence as concave functions of support, which warrants further examination.
Overall, the chart suggests that the items on the right-hand-side are especially likely to be purchased in transactions whose left-hand-side items are among the most frequently occurring combinations.