This repository contains my capstone project for the Udacity Data Scientist Nanodegree Program. In this project, I analyze data from Sparkify to predict customer churn.
Sparkify is a simulated dataset for a subscription-based company that provides a music service like Spotify or Apple Music. Predicting customer churn is a common and challenging task for data scientists and analysts who want to improve a company's business. Processing and analyzing large amounts of data with Spark is also an essential skill in data fields.
These libraries are used in this project:
- PySpark
- Install `PySpark`
- Run the notebook `Sparkify.ipynb`
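Before opening the notebook, it can help to confirm that PySpark is importable in the current environment. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `is_installed` is hypothetical and not part of this repository, and the suggested `pip install pyspark` command assumes a pip-based environment.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment.

    Hypothetical helper for illustration; not part of this repository.
    """
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    if is_installed("pyspark"):
        print("PySpark is available; open Sparkify.ipynb in Jupyter.")
    else:
        # Assumes a pip-based environment.
        print("PySpark is missing; try: pip install pyspark")
```

The check only looks the package up and does not start a Spark session, so it runs quickly even on machines without Java configured.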
The findings of this project have been published here.
This project uses data from Sparkify.
The code is inspired by the Udacity Data Scientist Nanodegree Program.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project.
- Create your Feature Branch (`git checkout -b feature/Feature`).
- Commit your Changes (`git commit -m 'Add some feature'`).
- Push to the Branch (`git push origin feature/Feature`).
- Open a Pull Request.
- Huy Tran (dhuy237) - d.huy723@gmail.com