Merge pull request #568 from Avdhesh-Varshney/ielts
IELTS Success Analysis and Prediction Model
abhisheks008 authored Feb 8, 2024
2 parents af81812 + 9c8a0eb commit ca765e1
Showing 9 changed files with 121 additions and 0 deletions.
33 changes: 33 additions & 0 deletions IELTS Success Analysis and Prediction/Dataset/README.md
@@ -0,0 +1,33 @@
# IELTS Success Stories Dataset

The dataset used here is taken from Kaggle. You can download it from [IELTS Success Stories Dataset](https://www.kaggle.com/datasets/zakirkhanaleemi/ielts-success-stories-dataset).

## About the dataset

- The dataset contains 27 rows (entries).
- It has 23 features, listed below:

  - Candidate: Name or identifier of the individual who took the IELTS test.
  - Location: The city or region where the candidate is located.
  - Profession: The candidate's occupation or field of work/study.
  - Study Duration (months): The duration, in months, that the candidate spent preparing for the IELTS test.
  - IELTS Score (Overall): The overall band score achieved by the candidate in the IELTS test.
  - Key Strategies: Strategies and methods employed by the candidate during their IELTS preparation.
  - Education Level: The highest level of education attained by the candidate (e.g., Bachelor's, Master's).
  - Age: The age of the candidate at the time of taking the IELTS test.
  - Target Country: The country the candidate aspires to move to or pursue further studies in.
  - English Proficiency (Preparation): The candidate's self-assessed English proficiency level before starting IELTS preparation.
  - Practice Hours per Week: The average number of hours per week the candidate dedicated to IELTS practice.
  - Mock Tests Taken: The number of practice/mock IELTS tests taken by the candidate.
  - Achieved Desired Score: Indicates whether the candidate achieved their target IELTS score.
  - Preferred Learning Resources: The materials or resources the candidate favored during their IELTS preparation.
  - Application Status: The status of the candidate's application for further studies or immigration.
  - Job Offer Received: Indicates whether the candidate received a job offer in their target country.
  - Additional Certifications: Any additional certifications or qualifications attained by the candidate.
  - Volunteer Experience: Whether the candidate has relevant volunteer experience.
  - Language Fluency: The candidate's fluency in languages other than English.
  - Internship Experience: Whether the candidate has relevant internship experience.
  - Relevant Skills: Skills possessed by the candidate that are relevant to their profession or studies.
  - Recommendations: The strength of recommendations provided for the candidate.
  - Networking Efforts: Efforts made by the candidate to network within their field or community.
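
Before any analysis, it can help to confirm that a loaded copy of the file matches the schema described above. The following is a minimal pandas sketch, assuming the CSV headers match the feature names in this README exactly (not verified against the Kaggle file); the name `check_dataset` is hypothetical. Typical use would be `check_dataset(pd.read_csv("ielts_success_stories.csv"))`, where the file name is also a placeholder.

```python
import pandas as pd

# Column names as listed in this README; assumed (not verified) to match the CSV headers.
EXPECTED_COLUMNS = [
    "Candidate", "Location", "Profession", "Study Duration (months)",
    "IELTS Score (Overall)", "Key Strategies", "Education Level", "Age",
    "Target Country", "English Proficiency (Preparation)",
    "Practice Hours per Week", "Mock Tests Taken", "Achieved Desired Score",
    "Preferred Learning Resources", "Application Status", "Job Offer Received",
    "Additional Certifications", "Volunteer Experience", "Language Fluency",
    "Internship Experience", "Relevant Skills", "Recommendations",
    "Networking Efforts",
]


def check_dataset(df: pd.DataFrame) -> pd.Series:
    """Verify the documented columns exist and return missing-value counts per column."""
    absent = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if absent:
        raise ValueError(f"Columns not found in dataset: {absent}")
    # isna().sum() counts NaN/None entries column by column.
    return df[EXPECTED_COLUMNS].isna().sum()
```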

80 changes: 80 additions & 0 deletions IELTS Success Analysis and Prediction/README.md
@@ -0,0 +1,80 @@
<h1>IELTS Success Stories Analysis and Prediction Model</h1>

**GOAL**

The aim of this project is to analyze the success stories of IELTS candidates and predict their IELTS outcomes from their preparation and background features.

**DATASET**

https://www.kaggle.com/datasets/zakirkhanaleemi/ielts-success-stories-dataset

**DESCRIPTION**

This project analyzes the IELTS Success Stories Dataset, then builds and trains several regression models on its features.


### Visualization and EDA of Different Attributes

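<img alt="Correlation heatmap of the dataset features" src="./Images/correlation_heatmap.png">

<img alt="Correlation of each feature with the target variable" src="./Images/target_correlation.png">

<img alt="Application Status feature plot" src="./Images/Application Status_feature.png">

<img alt="Location feature plot" src="./Images/Location_feature.png">

<img alt="Study Duration (months) feature plot" src="./Images/Study Duration (months)_feature.png">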
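
The correlation analysis behind plots like the heatmap and target-correlation chart above can be sketched with pandas. This is a minimal sketch, not the project's notebook code: the function name `target_correlations` is hypothetical, the choice of target column (for example `"IELTS Score (Overall)"`) is an assumption, and only numeric columns are included, so text columns such as `Location` would need encoding first.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    # Restrict to numeric columns; non-numeric features cannot enter the correlation matrix.
    corr = df.select_dtypes(include="number").corr()
    # Drop the target's self-correlation and rank by absolute strength.
    return corr[target].drop(target).sort_values(key=lambda s: s.abs(), ascending=False)
```

The full matrix can be drawn as a heatmap with `seaborn.heatmap(df.select_dtypes(include="number").corr(), annot=True)`.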


**MODELS USED**

| Model                    | Train MSE | Train R² | Test MSE | Test R² |
|--------------------------|----------:|---------:|---------:|--------:|
| Random Forest Regression | 7.79e-03  | 0.977    | 0.0151   | 0.9257  |
| XGBoost Regression       | 1.42e-07  | 1.000    | 0.0165   | 0.919   |
| Decision Tree Regression | 0.000     | 1.000    | 0.0208   | 0.8974  |
| Ridge Regression         | 6.44e-04  | 0.998    | 0.0723   | 0.6439  |
| Elastic Net Regression   | 9.25e-02  | 0.727    | 0.1335   | 0.3428  |
| Linear Regression        | 4.13e-30  | 1.000    | 0.154    | 0.2418  |
| KNN Regression           | 1.01e-01  | 0.703    | 0.1683   | 0.1713  |
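
The MSE and R² columns follow the standard definitions. As a reference, here is a minimal pure-Python sketch of both metrics for a single target; for one output column it should agree with scikit-learn's `mean_squared_error` and `r2_score`. The helper names `mse` and `r2` are illustrative.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


print(round(mse([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]), 4))  # 0.1667
print(r2([6.0, 7.0, 8.0], [6.5, 7.0, 7.5]))  # 0.75
```

R² is 1.0 for a perfect fit, 0.0 for a model that always predicts the mean of the evaluated targets, and negative for anything worse.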



**WHAT I HAVE DONE**

* Loaded the dataset, which contains 27 entries and 23 features.
* Checked for missing values and cleaned the data accordingly.
* Analyzed the data, extracted insights, and visualized them.
* Plotted a correlation heatmap to check the relationships between features.
* Examined how individual columns relate to the target variable using plotting libraries, and used box plots to see the distribution of the data with respect to the target.
* Split the dataset into training and testing sets.
* Applied PCA to reduce the number of features.
* Trained several regression models, computed their MSE and R² scores on the training and test sets, and saved the results into a DataFrame.
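
The modelling steps above (train/test split, PCA, several regressors, a results DataFrame) can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the project's notebook: `X` is assumed to be already numeric (categorical columns encoded); the function name `evaluate_models`, the `StandardScaler` step, `n_components=5`, the 80/20 split, and all hyperparameters are illustrative choices; and XGBoost is left out to keep dependencies minimal (an `xgboost.XGBRegressor()` entry could be added to the dictionary).

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(X, y, n_components=5, test_size=0.2, random_state=42):
    """Split the data, reduce it with PCA, fit several regressors, tabulate MSE and R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    models = {
        "Random Forest Regression": RandomForestRegressor(random_state=random_state),
        "Decision Tree Regression": DecisionTreeRegressor(random_state=random_state),
        "Ridge Regression": Ridge(),
        "Elastic Net Regression": ElasticNet(),
        "Linear Regression": LinearRegression(),
        "KNN Regression": KNeighborsRegressor(n_neighbors=3),
    }
    rows = []
    for name, model in models.items():
        # Scaler and PCA live inside the pipeline, so they are fitted on the
        # training split only and no test information leaks into the components.
        pipe = make_pipeline(StandardScaler(), PCA(n_components=n_components), model)
        pipe.fit(X_train, y_train)
        pred_train, pred_test = pipe.predict(X_train), pipe.predict(X_test)
        rows.append({
            "Model": name,
            "Train MSE": mean_squared_error(y_train, pred_train),
            "Train R2": r2_score(y_train, pred_train),
            "Test MSE": mean_squared_error(y_test, pred_test),
            "Test R2": r2_score(y_test, pred_test),
        })
    # Best test MSE first, mirroring the ordering of the results table.
    return pd.DataFrame(rows).sort_values("Test MSE").reset_index(drop=True)
```

With only 27 rows, a 20% test split holds about six candidates, so test scores from any single split are likely to vary noticeably with `random_state`.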


**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. Scikit-learn
4. NumPy
5. XGBoost
6. TensorFlow
7. Keras
8. SciPy
9. Seaborn


**CONCLUSION**

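- Random Forest and XGBoost Regression perform best, with the lowest test MSE (0.0151 and 0.0165) and the highest test R² (0.9257 and 0.919).
- Several models fit the training set almost perfectly (train R² of 1.000 for XGBoost, Decision Tree, and Linear Regression) but score lower on the test set, which indicates overfitting. The gap is largest for Linear Regression (test R² of 0.2418) and smaller for Decision Tree Regression (test R² of 0.8974).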
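
One way to make the overfitting comparison concrete is to rank the models by the difference between train and test R². The small pure-Python sketch below does this with the values copied from the results table; the names `R2_SCORES` and `generalization_gaps` are illustrative.

```python
# (train R^2, test R^2) pairs copied from the results table.
R2_SCORES = {
    "Random Forest Regression": (0.977, 0.9257),
    "XGBoost Regression": (1.000, 0.919),
    "Decision Tree Regression": (1.000, 0.8974),
    "Ridge Regression": (0.998, 0.6439),
    "Elastic Net Regression": (0.727, 0.3428),
    "Linear Regression": (1.000, 0.2418),
    "KNN Regression": (0.703, 0.1713),
}


def generalization_gaps(scores):
    """Return (model, train R^2 - test R^2) pairs, largest gap first."""
    gaps = {name: train - test for name, (train, test) in scores.items()}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)


for name, gap in generalization_gaps(R2_SCORES):
    print(f"{name}: {gap:.4f}")  # first line printed: "Linear Regression: 0.7582"
```

By this measure Linear Regression has the largest gap and Random Forest the smallest (0.0513).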


**YOUR NAME**

*Avdhesh Varshney*

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/avdhesh-varshney/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Avdhesh-Varshney)

7 changes: 7 additions & 0 deletions IELTS Success Analysis and Prediction/requirements.txt
@@ -0,0 +1,7 @@
numpy==1.19.2
pandas==1.4.3
matplotlib==3.7.1
scikit-learn~=1.0.2
scipy==1.5.0
seaborn==0.10.1
xgboost~=1.5.2
