This project is part of my master's thesis research. I am conducting research on consumer behaviour using a clickstream dataset from a large multi-category eCommerce website.
The goal of this research is to extract consumers' online shopping behavioural patterns from clickstream data, use unsupervised machine learning (leader clustering) to segment customers into clusters based on those patterns, and conduct exploratory analysis of each cluster.
Additionally, logistic regression will be used to model the likelihood of purchase for the customers, using the extracted behavioural patterns as predictors.
With logistic regression as the benchmark, several other machine learning algorithms will be explored, in particular Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM) and Recurrent Neural Network (RNN).
Finally, the best-performing machine learning method will be reported, and Explainable AI (XAI) techniques will be explored to interpret the feature importance of black-box machine learning predictions.
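
To make the leader clustering step mentioned above more concrete, here is a minimal sketch of the single-pass leader algorithm in plain Python. It is an illustration only, not the thesis implementation: the function name `leader_clustering`, the Euclidean distance, the threshold value and the two example features (page views and cart additions per session) are all assumptions made for this example.

```python
import math


def leader_clustering(points, threshold):
    """Single-pass leader clustering (illustrative sketch).

    Each point joins the first existing leader that lies within
    `threshold` (Euclidean distance); otherwise the point becomes
    a new leader. Returns the list of leaders and one label per point.
    """
    leaders = []
    labels = []
    for p in points:
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                labels.append(idx)
                break
        else:
            # No leader is close enough: this point starts a new cluster.
            leaders.append(p)
            labels.append(len(leaders) - 1)
    return leaders, labels


# Hypothetical per-session features: (page views, cart additions).
sessions = [(1.0, 0.0), (1.5, 0.0), (9.0, 3.0), (8.5, 2.5), (1.2, 0.2)]
leaders, labels = leader_clustering(sessions, threshold=2.0)
print(leaders)
print(labels)
```

The result depends on the order in which the points are visited and on the chosen threshold, so in practice the features would usually be scaled first and several thresholds compared.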
The dataset is a raw clickstream log file of consumers' clicking patterns. It contains 20,000,000 (20 million) user events.
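
As a rough illustration of how behavioural patterns could be derived from such an event log, the sketch below aggregates raw events into per-session counts. The field names `user_session` and `event_type`, the event values `view`, `cart` and `purchase`, and the helper `session_features` are assumptions made for this example; the actual columns of the dataset may differ.

```python
from collections import Counter, defaultdict

# Hypothetical sample of raw events (field names are assumed, not taken from the dataset).
events = [
    {"user_session": "s1", "event_type": "view"},
    {"user_session": "s1", "event_type": "view"},
    {"user_session": "s1", "event_type": "cart"},
    {"user_session": "s1", "event_type": "purchase"},
    {"user_session": "s2", "event_type": "view"},
]


def session_features(events):
    """Count event types per session and flag sessions that end in a purchase."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["user_session"]][event["event_type"]] += 1
    return {
        session_id: {
            "views": c["view"],
            "carts": c["cart"],
            "purchased": int(c["purchase"] > 0),
        }
        for session_id, c in counts.items()
    }


features = session_features(events)
print(features)
```

Features of this kind could serve both as input to the clustering step and as predictors of purchase; for a log of 20 million events, a chunked or streaming read would likely be needed rather than holding every event in memory.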