Skip to content

Classifies textual communications from various sources during disasters for emergency response and relief agencies to act upon.

Notifications You must be signed in to change notification settings

joshasgard/Disaster-Response-App

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

35 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Disaster Response Relief Pipeline πŸ”₯

A message classifier for detecting victims of disasters and those in emergency situations.
Built with Supervised Learning and Natural Language Processing!

Python Flask jQuery BootStrap 4.0 Plotly
Scikit-Learn SQLite

If you want to discuss more about the project, then you can reach out via mail πŸ“©.

Summary 🎯

A web app for an ML pipeline trained and deployed to help emergency responders detect public messages (tweets, FB posts, texts, etc.) asking for aid during disasters and emergencies. Victims and witnesses can also use the app to check the relevant agencies to call upon.

Data

  • The disaster response data contains tweets and text messages gathered by FigureEight Inc. after major disasters and labelled into 36 different categories of victim needs.
  • The raw data files are two .csv files containing the messages and categories respectively. About 26400 of the messages were given here.

ETL Pipeline

  • The data cleaning process was done in an ETL pipeline. It involves data extraction from .csv, merging, and data transformation by string splitting, type casting, duplicate removal and filtering. The clean data is then loaded into an SQlite DB.

ML Pipeline

  • The ML pipeline, with a Random Forest Classifier (RF) as estimator (AdaBoost Classifier for the deployed app), is fitted to the dataset here.
  • Our clean messages serve as the predictor variable while the 36 categories are the MultiOutput target from the pipeline. Hyperparameter Tuning using Grid Search (with Cross Validation) was carried out to find the hyperparameters giving the optimum model performance.
  • During the modelling process, a Naive Bayes Classifier and an AdaBoost Classifier were also fitted to the dataset. But there is no significant difference in their performance compared to the RF classifier. Hence, we stuck with the latter.
  • The pipeline dumps the trained model to a pickle file and returns model Precision, Recall and Accuracy as performance metrics.

Web App

  • The web app allows a user to input a single message as a sentence and the trained model returns the disaster categories the message falls with a few relevant agencies to call upon.

Deployment

  • View deployed app here

File Description πŸ“‚

Structure

- app
| - static
| | - img
| | | - - # all images files (.jpg, .png, and .svg)
| - template
| |- master.html  # main page of web app
| |- databoard.html # data visualizations board
| |- results.html  # classification result page of web app
|- run.py  # Flask file that runs app

- data
|- disaster_categories.csv  # data to process 
|- disaster_messages.csv  # data to process
|- process_data.py  # ETL pipeline script
|- DisasterResponse.db  # database to save clean data to
|- model_metrics.csv  # trained model metrics saved for visualization

- models
|- train_classifier.py  # ML pipeline script
|- classifier.pkl  # saved model. Not available in repo

- README.md

- .gitignore

Use Repo πŸ“’βœοΈ

Run the following commands in the project's root directory in your command line to set up your database and model.

ETL data to SQLite DB

  • To run ETL pipeline that cleans data and stores in database. python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db

Train Model in ML Pipeline

  • To run ML pipeline that trains classifier and saves. The training should take a few minutes (or hours) depending on your machine. python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Training Result
Loading data...
    DATABASE: data/DisasterResponse.db
Building model...
Training model...
...Training Time: 8112.252200126648 seconds ---
Evaluating model...
                        precision    recall  f1-score   support

               related       0.83      0.96      0.89      3992
               request       0.84      0.49      0.62       888
                 offer       0.00      0.00      0.00        25
           aid_related       0.78      0.67      0.72      2145
          medical_help       0.45      0.04      0.07       398
      medical_products       0.76      0.07      0.14       254
     search_and_rescue       0.73      0.08      0.14       139
              security       0.33      0.01      0.02       107
              military       0.38      0.02      0.04       160
           child_alone       0.00      0.00      0.00         0
                 water       0.85      0.45      0.58       331
                  food       0.84      0.61      0.71       594
               shelter       0.79      0.40      0.53       454
              clothing       1.00      0.08      0.15        97
                 money       0.60      0.03      0.05       119
        missing_people       1.00      0.01      0.03        67
              refugees       0.67      0.04      0.08       192
                 death       0.85      0.19      0.31       245
             other_aid       0.62      0.04      0.07       683
infrastructure_related       0.33      0.00      0.01       350
             transport       0.64      0.04      0.07       241
             buildings       0.75      0.16      0.26       246
           electricity       0.67      0.02      0.04       101
                 tools       0.00      0.00      0.00        30
             hospitals       0.00      0.00      0.00        51
                 shops       0.00      0.00      0.00        29
           aid_centers       0.00      0.00      0.00        72
  other_infrastructure       0.00      0.00      0.00       235
       weather_related       0.87      0.62      0.73      1459
                floods       0.90      0.32      0.48       422
                 storm       0.82      0.36      0.50       499
                  fire       0.00      0.00      0.00        57
            earthquake       0.92      0.71      0.80       514
                  cold       0.57      0.04      0.08        90
         other_weather       0.50      0.03      0.06       278
         direct_report       0.81      0.36      0.50      1009

             micro avg       0.82      0.51      0.63     16573
             macro avg       0.56      0.19      0.24     16573
          weighted avg       0.75      0.51      0.56     16573
           samples avg       0.68      0.47      0.51     16573

Saving model...
    MODEL: models/classifier.pkl
Trained model saved!

Run web app

  • Run the following command in the app's directory to run your web app. python run.py or run with Flask
C:\Desktop\Disaster-Response-App\app> python run.py
     * Serving Flask app 'run' (lazy loading)
     * Environment: development
     * Debug mode: on
     * Restarting with stat
     * Debugger is active!
     * Debugger PIN: 431-044-507
     * Running on all addresses.
       WARNING: This is a development server. Do not use it in a production deployment.
     * Running on http://192.168.43.52:3001/ (Press CTRL+C to quit)
     ......

Imbalanced Data βš–οΈ

Imbalanced Data

  • The dataset is heavily imbalanced for most of the message categories, particularly important ones like 'water', 'fire', 'missing people', 'death', 'hospital', 'electricity', and 'transport' have very few representation.
  • Even with Stratified Sampling through Grid Search Cross Validation to maintain class weights, our model still fails to detect messages for these categories. Hence, we have high or low precisions and/or recalls for the categories..

Emphasizing Precision and/or Recall

  • Precision, calculated as TP/(TP + FP), is a model metric that punishes classification of messages into a category they do not belong (False Positves, returns 1 instead of 0). The higher the number of misclassifications in that category, the lower the precision value of the model for the category and vice-versa. Recall, calculated as the true positive rate TP/(TP + FN), is a model performance metric that helps to measure the failure of a model to detect messages that belong to a category (False Negatives, returns 0 instead of 1).

  • We have a small positive to negative class ratio for most of the disaster category in the dataset, which supports our interest of detecting relevant messages (positive class) out of a multitude in a disaster.

  • **Hence, for each of the 36 categories, our choice of metric depends on whether it is more costly for our model to detect some wrong messages (precision) or fail to detect some messages of people that need emergency help (Recall) or both (f1 score). In this case, we can afford to detect some irrelevant messages for most categories, but we can't afford to miss messages of people that need help. **

Licence πŸ“ƒ

The MIT License (MIT)

Copyright (c) 2021 Idowu Odesanmi. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE..

Credits πŸš€

About

Classifies textual communications from various sources during disasters for emergency response and relief agencies to act upon.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published