This analysis is a proposal for SportsWear Group. It aims to show how advanced analytics can strengthen SportsWear Group's marketing campaign, and to demonstrate the value of our platform in improving marketing efficiency and driving sales growth.
- Importing Needed Libraries
- Loading Data into Pandas
- Data Understanding, Problems, and Solutions
- Exploratory Data Analysis (Whole Data)
  - Nulls
  - Outliers
  - Class Imbalance
  - Correlation and Relevant Features Selection
- Creating a New Dataset with Relevant Features
  - Handling Duplicate Rows with Different Labels
  - Handling Class Imbalance (Using Majority Vote)
- Exploratory Data Analysis (Cleaned and Processed Data)
  - Feature Engineering
  - Analysis
    - Relations with Target
    - Relations Between Features
    - Relations Across Time
- Questions and Insights
- Benefits to Business and Recommendations Summary
- Preprocessing
  - Checking Multicollinearity
  - Features Selection
  - Data Transformations
  - Categorical Encoding
  - SMOTE for Class Imbalance
- Model Training
- Model Evaluation
The case study was analyzed in stages. The full dataset was first validated and explored, and relevant features were selected based on correlation analysis. A new dataset was then built from those features; duplicate rows with different labels were handled, and class imbalance was addressed using the majority vote approach.
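To make the majority vote step concrete, here is a minimal sketch in pandas. It assumes one plausible reading of that step: rows that share the same feature values but carry different labels are collapsed into a single row with the most frequent label. The column names and the toy data are hypothetical and are not taken from the SportsWear Group dataset.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "26-35", "26-35", "36-45"],
    "channel":   ["email", "email", "email", "social", "social", "email"],
    "purchased": [1, 1, 0, 0, 0, 1],
})

feature_cols = ["age_group", "channel"]  # assumed feature columns
target_col = "purchased"                 # assumed label column


def majority_label(labels: pd.Series):
    # mode() returns the most frequent values in ascending order,
    # so a tie resolves to the smallest label.
    return labels.mode().iloc[0]


# One row per unique feature combination, labelled by majority vote.
deduped = df.groupby(feature_cols, as_index=False)[target_col].agg(majority_label)
print(deduped)
```

Ties are broken toward the smaller label here, which is an arbitrary choice; a real pipeline might instead drop tied groups or break ties using domain knowledge.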
Further exploratory data analysis on the cleaned and processed data examined how the features relate to the target variable, to one another, and across time. These relationships gave a deeper understanding of the data and informed the answers to the key business questions.
Preprocessing steps involved checking for multicollinearity, selecting informative features, performing data transformations, and applying categorical encoding. To address the class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was utilized.
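The idea behind SMOTE can be sketched in a few lines of NumPy: each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function below is a simplified, hand-written illustration of that idea, not the implementation used in this analysis. The function name, parameters, and toy data are assumptions. In practice a library implementation such as `SMOTE` from imbalanced-learn is the usual choice, and oversampling is normally applied to the training split only.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Hypothetical minority-class feature rows.
X_minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1]])
synthetic = smote_sketch(X_minority, n_new=6)
print(synthetic.shape)
```

Because every synthetic row lies between two real minority rows, the new points stay inside the region the minority class already occupies rather than duplicating existing rows exactly.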
A logistic regression model was trained on the preprocessed data and evaluated using appropriate metrics. The model achieved an F1 score of 91%.
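As a rough illustration of this training and evaluation step, the sketch below fits a logistic regression with scikit-learn and computes the F1 score on held-out data. It uses a synthetic dataset from `make_classification` as a stand-in for the preprocessed data, so the split ratio, the hyperparameters, and the resulting score are assumptions and do not reproduce the 91% reported above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# F1 is the harmonic mean of precision and recall for the positive class.
y_pred = model.predict(X_test)
score = f1_score(y_test, y_pred)
print(f"F1 on held-out data: {score:.2f}")
```

F1 is a reasonable headline metric when classes are imbalanced, because accuracy alone can look high for a model that rarely predicts the minority class.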
Overall, the analysis yields insights and actionable recommendations for the business. The documented findings can guide marketing strategy and illustrate the benefits of leveraging advanced analytics.