- Project Overview
- Project Components
- Running
- Screenshots
- Software Requirements
- File Descriptions
- Credits and Acknowledgements
## Project Overview

This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Figure Eight. The initial dataset contains pre-labeled, real messages from real-life disaster events. The aim of the project is to build a Natural Language Processing (NLP) tool that categorizes messages.
The project is divided into the following sections:
- Data processing: an ETL pipeline extracts data from the source, cleans it, and stores it in a database
- Machine learning pipeline: trains a model to classify messages into categories
- Web app: shows model results intuitively in real time
## Project Components

There are three components in this project:
File `data/process_data.py`:
- Loads the `messages` and `categories` datasets
- Merges and cleans the data
- Stores the data in a SQLite database
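The load-merge-clean-store steps can be sketched as follows. This is an illustrative sketch, not the actual script: the column names (`id`, `message`, `categories`) and the semicolon-delimited category format (e.g. `related-1;request-0`) are assumptions based on the Figure Eight dataset.

```python
import pandas as pd
from sqlalchemy import create_engine

def load_and_clean(messages_csv, categories_csv):
    """Load, merge, and clean the disaster data (illustrative sketch)."""
    messages = pd.read_csv(messages_csv)
    categories = pd.read_csv(categories_csv)
    df = messages.merge(categories, on="id")

    # Split the single semicolon-delimited 'categories' column into one
    # binary column per category (assumed format: "related-1;request-0;...").
    cats = df["categories"].str.split(";", expand=True)
    cats.columns = [c.split("-")[0] for c in cats.iloc[0]]
    for col in cats:
        cats[col] = cats[col].str[-1].astype(int)

    df = pd.concat([df.drop(columns="categories"), cats], axis=1)
    return df.drop_duplicates()

def save_to_db(df, db_path, table="Messages"):
    """Store the cleaned DataFrame in a SQLite database."""
    engine = create_engine(f"sqlite:///{db_path}")
    df.to_sql(table, engine, index=False, if_exists="replace")
```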
File `models/train_classifier.py`:
- Loads data from the SQLite database
- Creates training and test datasets
- Builds and trains a text processing and machine learning pipeline
- Uses GridSearchCV to optimize hyperparameters
- Scores the model on the test dataset
- Exports the final model as a pickle file
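A sketch of what such a pipeline might look like. The choice of a TfidfVectorizer with a RandomForestClassifier wrapped in MultiOutputClassifier, and the small parameter grid, are illustrative assumptions; the actual script may use different estimators and hyperparameters.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

def build_model():
    """Text-processing + multi-label classification pipeline with grid search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", MultiOutputClassifier(RandomForestClassifier())),
    ])
    # Small illustrative grid; the real script likely tunes more hyperparameters.
    params = {"clf__estimator__n_estimators": [10, 50]}
    return GridSearchCV(pipeline, params, cv=2)

def train_and_save(X, Y, model_path):
    """Split, train with grid search, score on the test set, and pickle the model."""
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
    model = build_model()
    model.fit(X_train, Y_train)
    score = model.score(X_test, Y_test)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return score
```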
File `app/run.py`:
Run `python run.py` from the `app` directory to start the web app, where users can enter messages, i.e., messages sent during a disaster. The app classifies each message into categories.
## Running

There are three steps to get up and running with the web app if you want to start from the ETL process.
Run the following command to execute the ETL pipeline, which cleans the data and stores it in a database:

`python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db`

The first two arguments are the input data files, and the third argument is the SQLite database in which to store the cleaned data.
Run the following command to execute the ML pipeline, which trains the classifier and saves the model:

`python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl`

This loads data from the SQLite database, trains the model, and saves it to a pickle file.
Run the following command in the `app` directory to start the web app:

`python run.py`

This starts the web app and directs you to http://0.0.0.0:3001/, where you can enter messages and get classification results.
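A minimal sketch of how such an app can serve classifications. The `/go` route name, the two-category list, and the stub model are assumptions made so the example is self-contained; the real run.py loads the pickled classifier and renders HTML templates instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In the real run.py the model is unpickled from models/classifier.pkl;
# a stub stands in here so the sketch runs on its own.
class _StubModel:
    def predict(self, texts):
        # Pretend every message is 'related' but not a 'request'.
        return [[1, 0] for _ in texts]

model = _StubModel()
CATEGORIES = ["related", "request"]  # assumed subset of the categories

@app.route("/go")
def go():
    """Classify the user's message and return the category labels."""
    query = request.args.get("query", "")
    labels = model.predict([query])[0]
    return jsonify(dict(zip(CATEGORIES, map(int, labels))))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=3001)
```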
## Screenshots

- Information about the training dataset can be seen on the main page of the web app.
- Below is an example of a message used to test the ML model's performance.
- Clicking Classify Message highlights the relevant text categories.
## Software Requirements

- Python 3.7
- Machine learning libraries: NumPy, pandas, scikit-learn, pickle
- Natural language processing libraries: NLTK, re
- SQLite database libraries: SQLAlchemy
- Web app and data visualization: Flask, Plotly
- Python standard library: sys, warnings
## File Descriptions

There are three main folders:
- data
  - disaster_categories.csv: dataset containing the message categories
  - disaster_messages.csv: dataset containing the messages
  - process_data.py: ETL pipeline script that loads, cleans, merges, and stores the data in a database
  - DisasterResponse.db: SQLite database containing the processed messages and categories
- models
  - train_classifier.py: machine learning pipeline script that trains and saves a model
  - classifier.pkl: the saved model in pickle format
- app
  - run.py: Python script that integrates the above files and starts the web application
  - templates: contains the HTML files for the web application
  - run-windows.py: Python script that starts the web application on a Windows machine
## Credits and Acknowledgements

- Udacity, for curating a program with intensive projects.
- Figure Eight, for providing the dataset used in this project.