Results

This section reports the results of a market basket analysis of data from a UK-based online retailer. The analysis was performed to discover co-occurrence relationships among customers’ purchases, such as the likelihood that a customer purchases a candle holder given that candles and matches are already in their shopping cart. Such an analysis is commonly used in marketing to increase the chance of cross-selling, provide recommendations to customers (e.g., based on their browsing history), and deliver targeted marketing (e.g., offer coupons for products that are frequently purchased together with items that the customers recently bought).

The analysis is described in the previous section.

The analysis for this project was performed in R. For an introduction to working with the arules package in R please refer to this tutorial.

A frequency plot showing the number of purchases of the top 10 items (using the dataset in transactions format) is shown below:

The items on the plot are ordered by purchase frequency. The order matches Table 2 (the summary statistics discussed in the Data Preparation section), which is consistent with the conversion from data frame format to transactions format having been done correctly.
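As a minimal sketch of this kind of check, the snippet below builds a few made-up baskets (not the retailer's data), converts them to transactions format, and compares the item counts before and after the conversion. The object names `baskets`, `trans`, `counts_before`, and `counts_after` are illustrative assumptions; `itemFrequency()` and `itemFrequencyPlot()` are functions from the arules package.

```r
library(arules)

# Made-up baskets for illustration only; each basket lists distinct items.
baskets <- list(
  c("thyme", "rosemary", "parsley"),
  c("thyme", "rosemary", "mint"),
  c("thyme", "rosemary"),
  c("thyme", "mint"),
  c("thyme")
)

# Item counts computed directly from the raw baskets.
counts_before <- sort(table(unlist(baskets)), decreasing = TRUE)

# Convert to transactions format and count again.
trans <- as(baskets, "transactions")
counts_after <- sort(itemFrequency(trans, type = "absolute"), decreasing = TRUE)

# The ordering of items should be the same in both formats.
identical(names(counts_before), names(counts_after))

# Bar plot of the most frequent items (up to 10) by absolute count.
itemFrequencyPlot(trans, topN = 10, type = "absolute")
```

With real data, ties in item counts can make the two orderings differ harmlessly, so comparing the counts item by item may be a more robust check than comparing the order.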

The summary of rules generated by the apriori algorithm is shown by executing summary(rules):

set of 354 rules

rule length distribution (lhs + rhs):sizes
  3   4   5   6   7 
 25  87 166  74   2 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  3.000   4.000   5.000   4.833   5.000   7.000 

summary of quality measures:
    support           confidence          lift            count      
 Min.   :0.005038   Min.   :0.9500   Min.   : 8.274   Min.   : 91.0  
 1st Qu.:0.005204   1st Qu.:0.9569   1st Qu.:17.021   1st Qu.: 94.0  
 Median :0.005481   Median :0.9626   Median :20.885   Median : 99.0  
 Mean   :0.006325   Mean   :0.9652   Mean   :33.120   Mean   :114.2  
 3rd Qu.:0.006616   3rd Qu.:0.9712   3rd Qu.:46.161   3rd Qu.:119.5  
 Max.   :0.011239   Max.   :1.0000   Max.   :95.520   Max.   :203.0  

mining info:
  data ntransactions support confidence
 trans         18062   0.005       0.95

The above summary provides several interesting insights. There are a total of 354 rules, with support ranging from 0.005 (as per the minimum support that we specified in the previous section) to 0.011; confidence ranging from 0.95 (again, as per the minimum confidence that we specified) to 1.00; and lift ranging from 8.3 to 95.5.

The confidence of 0.95 and above for all the rules implies that if a transaction contains items on the left-hand-side, the probability that the same transaction will contain the items on the right-hand-side is 95% or higher.

The lift is substantially higher than 1 for all the rules, which implies that the presence of the left-hand-side items in a transaction substantially increases the likelihood that the right-hand-side items are purchased in the same transaction, relative to how often the right-hand-side items are purchased overall.
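To make these measures concrete, the following back-of-the-envelope sketch uses the standard definitions (support of a rule is the share of transactions containing both sides, confidence is the rule support divided by the left-hand-side support, and lift is confidence divided by the right-hand-side support) to recover transaction counts from the values reported for the first rule in this section's top-10 listing. The variable names are illustrative, and the derived left-hand-side and right-hand-side counts are inferred from the rounded output rather than read from the data.

```r
# Values copied from the mining info and from rule [1] in the top-10 listing.
n_trans    <- 18062
support    <- 0.01123907
confidence <- 0.9575472
lift       <- 72.36492

# Transactions containing both the left- and right-hand side.
rule_count <- support * n_trans

# confidence = rule_count / lhs_count, so:
lhs_count <- rule_count / confidence

# lift = confidence / support(rhs), so:
rhs_support <- confidence / lift
rhs_count   <- rhs_support * n_trans

round(c(rule = rule_count, lhs = lhs_count, rhs = rhs_count))
# rule  lhs  rhs
#  203  212  239
```

Read this way, about 212 transactions contain both parsley and thyme markers, 203 of those also contain the rosemary marker (203 / 212 is roughly 0.958), while the rosemary marker appears in only about 239 of all 18,062 transactions, which is why the lift is so large.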

The output below shows the top 10 rules in terms of support (i.e., the proportion of transactions that contain all the items on both the left-hand-side and the right-hand-side of the rule):

     lhs                                                             rhs                    support    confidence lift     count
[1]  {HERB MARKER PARSLEY,HERB MARKER THYME}                      => {HERB MARKER ROSEMARY} 0.01123907 0.9575472  72.36492 203  
[2]  {HERB MARKER BASIL,HERB MARKER THYME}                        => {HERB MARKER ROSEMARY} 0.01107297 0.9523810  71.97450 200  
[3]  {HERB MARKER MINT,HERB MARKER THYME}                         => {HERB MARKER ROSEMARY} 0.01090688 0.9563107  72.27148 197  
[4]  {HERB MARKER PARSLEY,HERB MARKER MINT,HERB MARKER THYME}     => {HERB MARKER ROSEMARY} 0.01029786 0.9587629  72.45680 186  
[5]  {HERB MARKER PARSLEY,HERB MARKER ROSEMARY,HERB MARKER MINT}  => {HERB MARKER THYME}    0.01029786 0.9538462  73.00156 186  
[6]  {HERB MARKER PARSLEY,HERB MARKER BASIL,HERB MARKER THYME}    => {HERB MARKER ROSEMARY} 0.01029786 0.9637306  72.83222 186  
[7]  {HERB MARKER PARSLEY,HERB MARKER ROSEMARY,HERB MARKER BASIL} => {HERB MARKER THYME}    0.01029786 0.9637306  73.75806 186  
[8]  {HERB MARKER MINT,HERB MARKER CHIVES }                       => {HERB MARKER PARSLEY}  0.01007640 0.9629630  73.69931 182  
[9]  {HERB MARKER PARSLEY,HERB MARKER CHIVES }                    => {HERB MARKER MINT}     0.01007640 0.9528796  72.01218 182  
[10] {HERB MARKER THYME,HERB MARKER CHIVES }                      => {HERB MARKER PARSLEY}  0.01007640 0.9680851  74.09133 182  
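A listing of this kind can be produced by sorting the rules by support and inspecting the first ten. The sketch below is an assumption about how that might be done, not the exact code used for this project: it uses the `Groceries` dataset that ships with arules as a stand-in for the retailer's data, and the thresholds are chosen only so that the stand-in data yields enough rules.

```r
library(arules)

# Stand-in data bundled with arules (not the retailer's transactions).
data("Groceries")

# Illustrative thresholds; the project itself used support = 0.005
# and confidence = 0.95 on its own data.
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)
)

# Sort by support (descending by default) and keep the first 10 rules.
top10 <- head(sort(rules, by = "support"), n = 10)

# Print lhs, rhs, and the quality measures for each rule.
inspect(top10)
```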

The rules make intuitive sense. For example, it is reasonable to expect that customers (many of whom are wholesalers) who purchase “HERB MARKER PARSLEY” and “HERB MARKER THYME” would also purchase “HERB MARKER ROSEMARY”, as such customers appear to be generally interested in herb markers.

Finally, a plot of the rules that we have identified is shown below:

We can see that both confidence and lift tend to grow as support increases. We can also see an interesting pattern showing clusters of confidence as concave functions of support, which warrants further examination.

Overall, the chart suggests that the right-hand-side items are particularly likely to be purchased in transactions where the left-hand-side items are among the most frequently occurring combinations.
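One way to examine such visual impressions more closely is to pull the quality measures out of the rules object and look at them directly. The sketch below is a hedged illustration using only arules and base graphics, again with the bundled `Groceries` data and illustrative thresholds as stand-ins; the names `q` and `shade` are assumptions, and the rank correlations are only a rough summary of the trend, not a substitute for inspecting the plot.

```r
library(arules)

# Stand-in data and thresholds, for illustration only.
data("Groceries")
rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)
)

# quality() returns a data frame with one row of measures per rule.
q <- quality(rules)

# Scatter plot of support vs. confidence; darker points have higher lift.
shade <- gray(0.85 * (1 - q$lift / max(q$lift)))
plot(q$support, q$confidence, col = shade, pch = 19,
     xlab = "support", ylab = "confidence")

# Rank correlations as a rough check on whether confidence and lift
# tend to move with support.
cor(q$support, q$confidence, method = "spearman")
cor(q$support, q$lift, method = "spearman")
```

Positive correlations would be consistent with the trend described above, whereas values near zero or negative would suggest that the pattern is driven by a small number of rules.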
