This repository contains my second project for the Udacity Data Scientist Nanodegree Program. In this project, I applied data engineering skills to build an ETL pipeline that processes the raw data, which then goes through an ML pipeline that classifies the messages.
The classification model helps people at disaster response organizations sort each message into the relevant categories so they can respond to events faster and more accurately.
These are the libraries used in this project:
- pandas
- numpy
- sklearn
.
├── README.md
├── app
│   ├── run.py # Flask file that runs app
│   └── template
│       ├── go.html # Classification result page of web app
│       └── master.html # Main page of web app
├── data
│   ├── DisasterResponse.db # Database to save clean data
│   ├── disaster_categories.csv # Input data to process
│   ├── disaster_messages.csv # Input data to process
│   └── process_data.py # ETL pipeline
├── models
│   ├── train_classifier.py # ML pipeline
│   └── classifier.pkl # Saved model. Please run the ML pipeline to create this file.
└── notebook # Notebooks used for preparing the code
    ├── ETL Pipeline Preparation.ipynb
    └── ML Pipeline Preparation.ipynb
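To illustrate what the ETL step in data/process_data.py might look like, here is a minimal sketch. It rests on assumptions this README does not confirm: that both CSV files share an `id` column and that the `categories` column packs labels into strings such as `related-1;request-0`. The function names `clean_data` and `save_data` and the table name `messages` are hypothetical.

```python
import sqlite3

import pandas as pd


def clean_data(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge messages with categories and expand the labels into 0/1 columns.

    Assumption: `categories` holds strings like "related-1;request-0".
    """
    df = messages.merge(categories, on="id")
    # Split the packed string into one column per "name-value" pair.
    parts = df["categories"].str.split(";", expand=True)
    # Derive column names from the first row, e.g. "related-1" -> "related".
    parts.columns = [value.rsplit("-", 1)[0] for value in parts.iloc[0]]
    for column in parts.columns:
        # Keep only the trailing number and store it as an integer.
        parts[column] = parts[column].str.rsplit("-", n=1).str[-1].astype(int)
    df = pd.concat([df.drop(columns="categories"), parts], axis=1)
    return df.drop_duplicates()


def save_data(df: pd.DataFrame, connection: sqlite3.Connection, table: str = "messages") -> None:
    """Write the cleaned frame to a SQLite table, replacing any existing one."""
    df.to_sql(table, connection, index=False, if_exists="replace")
```

With the real files, the two frames would come from `pd.read_csv` on the paths shown in the tree above, and the connection from `sqlite3.connect("data/DisasterResponse.db")`.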
- Run the following commands in the project's root directory to set up the database and model.
  - To run the ETL pipeline, which cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  - To run the ML pipeline, which trains the classifier and saves the model as a pickle file:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Go to the app directory:
  cd app
- Run the web app:
  python run.py
- Go to http://0.0.0.0:3000/ to access the website.
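The ML pipeline step in the instructions above can be sketched roughly as follows. This is a minimal sketch and not necessarily what models/train_classifier.py does: the combination of `CountVectorizer`, `TfidfTransformer`, and a `MultiOutputClassifier` wrapping a random forest is an assumption, as are the `build_model` name and the toy messages and labels.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model() -> Pipeline:
    """Text features followed by one classifier per output category."""
    return Pipeline([
        ("vect", CountVectorizer()),    # raw text -> token counts
        ("tfidf", TfidfTransformer()),  # token counts -> TF-IDF weights
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=10, random_state=0)
        )),
    ])


# Toy data standing in for the table written by the ETL step.
messages = [
    "we need water",
    "the road is blocked",
    "please send water",
    "bridge and road destroyed",
]
# Hypothetical label columns: [water, infrastructure]
labels = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])

model = build_model()
model.fit(messages, labels)

# Round-trip through pickle, as the pipeline does when writing classifier.pkl.
restored = pickle.loads(pickle.dumps(model))
prediction = restored.predict(["water is running out"])
```

`prediction` has one row per input message and one column per category, which is the shape a web app would need to display per-category results.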
This project uses disaster data from Appen (formerly Figure 8).
The code is inspired by the Udacity Data Scientist Nanodegree Program.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project.
- Create your Feature Branch (git checkout -b feature/Feature).
- Commit your Changes (git commit -m 'Add some feature').
- Push to the Branch (git push origin feature/Feature).
- Open a Pull Request.
- Huy Tran (dhuy237) - d.huy723@gmail.com