Group 1 Dashboard

Introduction

This page contains summary results and visualizations of our study into consumer behavior. To understand how big data can help ecommerce companies drive sales and revenue, we applied machine learning algorithms to Amazon customer data.
Our research had three goals:

Develop a list of items frequently bought together
Create customer segments based on product categories purchased
Build a model to identify main topics included in the customer reviews of a product

Apriori Analysis

The Apriori algorithm is used for mining frequent item sets and the association rules derived from them in transactional databases. It relies on two parameters, "support" and "confidence": support is how frequently an item set occurs in the data, and confidence is the conditional probability that one item is bought given that another is bought. The goal of the analysis is to identify items bought together and display them on the ecommerce website to increase cross-selling and sales.
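To make support and confidence concrete, here is a minimal pure-Python sketch of the level-wise Apriori search. The baskets, the item names other than the air mattress, and the 0.4 support threshold are hypothetical and chosen only for illustration; the study's own data and tooling are not reproduced here.

```python
from itertools import combinations

# Hypothetical baskets (assumption); not the study's Amazon data.
transactions = [
    {"tent", "air mattress", "pump"},
    {"air mattress", "pump"},
    {"tent", "lantern"},
    {"air mattress", "pump", "lantern"},
    {"tent", "air mattress"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: build k-item candidates only from frequent (k-1)-item sets."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items
             if support([i], transactions) >= min_support]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s, transactions)
        k += 1
        # Join step: union pairs of frequent sets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are
        # frequent and it meets the support threshold itself.
        level = [
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return result


if __name__ == "__main__":
    for itemset, s in frequent_itemsets(transactions, min_support=0.4).items():
        print(sorted(itemset), round(s, 2))
    print(confidence({"pump"}, {"air mattress"}, transactions))
```

On this toy data every basket containing a pump also contains an air mattress, so the rule "pump implies air mattress" has confidence 1.0, which is the kind of pairing a site might surface as a cross-sell suggestion.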

Segmentation Analysis

The data covered eight product categories: apparel, furniture, music, watches, personal care, office products, video, and video games. It was consolidated by customer, based on the product quantities each customer bought from each category. K-means cluster analysis was the machine learning method used, since it is an unsupervised model that groups data into clusters, or in this case, customer segments.
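The clustering step can be sketched as follows, assuming each customer is represented by a vector of quantities bought per category. The customer rows, the choice of three clusters, and the farthest-first initialization are all assumptions made for this illustration; the study's actual settings and library are not stated on this page.

```python
CATEGORIES = ["apparel", "furniture", "music", "watches",
              "personal care", "office products", "video", "video games"]

# Hypothetical per-customer purchase quantities, one column per category.
customers = [
    [9, 0, 0, 1, 2, 0, 0, 0],
    [8, 1, 0, 2, 3, 0, 0, 0],
    [0, 0, 7, 0, 0, 0, 6, 8],
    [0, 0, 9, 0, 0, 1, 5, 9],
    [1, 6, 0, 0, 0, 8, 0, 0],
    [0, 7, 0, 0, 1, 9, 0, 0],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch): start at
    the first point, then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points,
                  key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    labels, centroids = kmeans(customers, k=3)
    for label, centroid in enumerate(centroids):
        top = max(range(len(CATEGORIES)), key=lambda i: centroid[i])
        print(label, "dominant category:", CATEGORIES[top])
```

Reading each centroid as an average basket is what turns a cluster into a describable customer segment. In practice the number of clusters would need to be chosen and validated rather than fixed in advance as it is here.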

Topic Analysis

For this analysis, one specific product was selected: Product ID B000M0MJU2, an air mattress. The Latent Dirichlet Allocation (LDA) machine learning model was used to identify topics within the customer reviews. To better interpret the data, the analysis was split into bad (1-star) and good (5-star) reviews.
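As a rough illustration of this workflow, the sketch below splits reviews by star rating and fits LDA to one group. The review texts, stopword list, and hyperparameters (alpha, beta, two topics) are hypothetical, and the from-scratch collapsed Gibbs sampler is a stand-in chosen to keep the example self-contained; it is not necessarily the inference method or library used in the study.

```python
import random
from collections import Counter

# Hypothetical (stars, text) reviews; not taken from the study's data.
reviews = [
    (5, "comfortable mattress easy to inflate"),
    (5, "easy setup and comfortable sleep"),
    (1, "mattress leaked air overnight"),
    (1, "pump broke and mattress leaked"),
    (5, "great for camping easy to inflate"),
    (1, "leaked air after one night"),
]

STOPWORDS = {"to", "and", "for", "after", "one", "the", "a"}  # assumed list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def split_by_rating(reviews, rating):
    """Return tokenized documents for reviews with the given star rating."""
    return [tokenize(text) for stars, text in reviews if stars == rating]


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per doc, topic
    topic_word = [Counter() for _ in range(num_topics)]  # tokens per topic, word
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


if __name__ == "__main__":
    bad_docs = split_by_rating(reviews, 1)
    topic_word, _ = lda_gibbs(bad_docs, num_topics=2)
    for k, counts in enumerate(topic_word):
        print("topic", k, [w for w, _ in counts.most_common(3)])
```

Fitting the 1-star and 5-star groups separately, as the study did, keeps words that appear in both groups from blending praise and complaints into a single topic.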

The bubble charts below represent the output of the analysis. Each bubble represents a different topic; the larger the bubble, the higher the percentage of reviews in the corpus that belong to that topic. The blue bars show the overall frequency of each word in the corpus; if no topic is selected, they display the most frequently used words. The red bars give the estimated number of times a given term was generated by a given topic. The farther apart two bubbles are, the more different the topics are.

Good and bad reviews share similar words across topics, but with different connotations
The analysis can be biased by the person interpreting the outputs, and the meaning of a topic is hard to extract
Topics are hard to tell apart because they share similar words and feedback, so the method is recommended only for a superficial analysis
The corpus needs further preparation, such as combining related words, for a more accurate analysis

Analyzing Consumer Behavior

Introduction

Apriori Analysis

Item Association by Segment

Segmentation Analysis

Results:

Insights:

Topic Analysis

5 Star Reviews

5 Star LDA

1 Star Reviews

1 Star LDA