- Maryam Asghari
- Email: masghari@bu.edu

Predicting whether a given tweet is about a real disaster or not, using PySpark with both Spark ML library classifiers and my own RDD-based implementations.

Source: https://www.kaggle.com/c/nlp-getting-started/data (a minimal PySpark loading sketch follows the column list below)
- train.csv - the training set
- test.csv - the test set (does not include labels)
- id - a unique identifier for each tweet
- text - the text of the tweet
- location - the location the tweet was sent from (may be blank)
- keyword - a particular keyword from the tweet (may be blank)
- target - in train.csv only, this denotes whether a tweet is about a real disaster (1) or not (0)
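The snippet below is a minimal sketch of loading train.csv into a Spark DataFrame with the columns listed above; the `multiLine` and `escape` read options are assumptions to cope with tweets that contain commas, quotes, or line breaks, and the notebooks may read the data differently.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("disaster-tweets").getOrCreate()

# Read the Kaggle training set; the header row supplies the column names above.
train = spark.read.csv(
    "./nlp-getting-started/train.csv",
    header=True, inferSchema=True,
    multiLine=True, escape='"',   # tweets may contain quotes and newlines
)

train.printSchema()                       # id, keyword, location, text, target
train.groupBy("target").count().show()    # real (1) vs. not-real (0) counts
```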
Python files containing the functions used in the notebooks:
- Plots.py
- prep_ml.py
- prep_rdd.py
- nn_func.py
- LogisticRegression.py
- NaiveBayes.py
- SVM.py
- Trees.py
- RDD_logisticRegression.py
- LR_Optimizers.py
- RDD_SVM.py
- SVM_Optimizer.py
- RDD_NN.py
Notebooks:
- Data visualization
- LogisticRegression
- NaiveBayes
- SVM
- Trees
- RDD_logisticRegression
- LR_Optimizers
- RDD_SVM
- SVM_Optimizer
- RDD_NN
spark-submit LogisticRegression.py './nlp-getting-started/train.csv'
spark-submit NaiveBayes.py './nlp-getting-started/train.csv'
spark-submit SVM.py './nlp-getting-started/train.csv'
spark-submit Trees.py './nlp-getting-started/train.csv'
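For reference, the sketch below shows the general shape of a spark.ml text-classification pipeline that scripts like LogisticRegression.py build (tokenize, remove stopwords, hash into TF-IDF features, fit a classifier); the exact stages, parameters, and column names in the repository's scripts may differ.

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, StopWordsRemover, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

tokenizer = Tokenizer(inputCol="text", outputCol="words")
remover = StopWordsRemover(inputCol="words", outputCol="filtered")
tf = HashingTF(inputCol="filtered", outputCol="tf", numFeatures=1 << 16)
idf = IDF(inputCol="tf", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

pipeline = Pipeline(stages=[tokenizer, remover, tf, idf, lr])

# `train` is the DataFrame from the loading sketch above; spark.ml expects
# a numeric (double) label column.
model = pipeline.fit(train.withColumn("label", train["target"].cast("double")))
```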
NOTE: The following scripts require the NLTK library.
spark-submit RDD_logisticRegression.py './nlp-getting-started/train.csv' './output_LR'
spark-submit RDD_SVM.py './nlp-getting-started/train.csv' './output_svm'
spark-submit LR_Optimizers.py './nlp-getting-started/train.csv' './out/optimizer'
spark-submit SVM_Optimizer.py './nlp-getting-started/train.csv' './out/optimizer2'
spark-submit RDD_NN.py './nlp-getting-started/train.csv' './out/NN_rdd'
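As a rough illustration of the hand-rolled RDD approach used by the RDD_* scripts (NLTK tokenization plus gradient descent over an RDD of hashed bag-of-words vectors), here is a self-contained sketch; the featurization, optimizer, constants, and toy data are illustrative assumptions, not the repository's actual code.

```python
import zlib
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from pyspark import SparkContext

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))
DIM = 2 ** 12  # hashed bag-of-words dimension (illustrative)

def featurize(text):
    """Tokenize with NLTK, drop stopwords, hash tokens into a fixed-size vector."""
    vec = np.zeros(DIM)
    for tok in word_tokenize(text.lower()):
        if tok.isalpha() and tok not in STOP:
            vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    return vec

def gradient(point, w):
    """Logistic-loss gradient for a single (features, label) pair."""
    x, y = point
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * x

if __name__ == "__main__":
    sc = SparkContext(appName="rdd-lr-sketch")
    # Toy stand-in for the parsed (text, target) pairs from train.csv.
    data = sc.parallelize([("forest fire near la ronge", 1.0),
                           ("what a lovely sunny afternoon", 0.0)])
    points = data.map(lambda t: (featurize(t[0]), t[1])).cache()
    n = points.count()
    w = np.zeros(DIM)
    for _ in range(50):  # plain full-batch gradient descent
        grad = points.map(lambda p: gradient(p, w)).reduce(lambda a, b: a + b)
        w -= 0.1 * grad / n
    sc.stop()
```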