enron-sender-detection

A simple machine learning approach to detecting the sender of an email from its mail body, using the well-known Enron dataset.

Prerequisites

  • Python
  • Anaconda
  • scikit-learn
  • other dependencies

How to run

  1. Clone this repository: git clone https://github.com/nowshad-sust/enron-sender-detection.git
  2. Download the Enron dataset from https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz
  3. Extract the dataset (maildir) into the cloned project folder.
  4. Create a folder named remail in the project directory.
  5. Open a terminal or cmd in the project directory.
  6. Run the copy_sent_mails.py script with the command: python copy_sent_mails.py. This should copy all the sent mails from the original dataset directory into the remail directory in the project folder.
  7. Run naive_bayes_pipeline.py with the command: python naive_bayes_pipeline.py. This should print a number, which is the validation success rate.
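
For readers who want a feel for what step 6 does, here is a minimal sketch of copying sent mails out of an extracted maildir. It is not the repository's copy_sent_mails.py. The function name copy_sent_mails, the SENT_FOLDERS set, and the one-subfolder-per-user output layout are all assumptions made for illustration.

```python
import shutil
from pathlib import Path

# Assumption: names of the per-user folders that hold sent mail.
SENT_FOLDERS = {"sent", "sent_items", "_sent_mail"}


def copy_sent_mails(maildir, target):
    """Copy files from <maildir>/<user>/<sent folder>/ into <target>/<user>/.

    Returns the number of files copied. Hypothetical helper, not the
    repository's script.
    """
    maildir, target = Path(maildir), Path(target)
    copied = 0
    for user_dir in sorted(p for p in maildir.iterdir() if p.is_dir()):
        for folder in sorted(p for p in user_dir.iterdir() if p.is_dir()):
            if folder.name not in SENT_FOLDERS:
                continue  # skip inbox, deleted_items, and so on
            out_dir = target / user_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for mail in sorted(p for p in folder.iterdir() if p.is_file()):
                # Prefix with the folder name so that files with the same
                # name in different sent folders do not overwrite each other.
                shutil.copy2(mail, out_dir / f"{folder.name}_{mail.name}")
                copied += 1
    return copied
```

Under these assumptions, calling copy_sent_mails("maildir", "remail") from the project directory would leave one subfolder per sender inside remail, so the subfolder name can serve as the class label later on.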

Latest Statistics (accuracy)

  • Naive Bayes classifier ~ 0.46
  • SVM ~ 0.79
  • SVM with grid search ~ 0.85
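
The three setups listed above could be wired up in scikit-learn roughly as follows. This is a sketch under assumptions, not the code that produced the numbers: the TF-IDF features, the LinearSVC variant of SVM, the C grid, and the helper names build_models and evaluate are all illustrative choices, and the toy mail bodies are made up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up toy data: mail bodies and the sender label of each one.
BODIES = [
    "please review the gas contract today",
    "the gas contract needs your signature",
    "send me the contract numbers for gas",
    "contract review for the gas desk",
    "lunch on friday with the trading team",
    "team lunch is moved to friday noon",
    "are you joining the friday lunch",
    "friday lunch with the team again",
]
SENDERS = ["alice"] * 4 + ["bob"] * 4


def build_models():
    """Return the three classifiers as text-to-label pipelines (assumed setup)."""
    nb = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
    svm = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])
    # Grid search over the SVM regularisation strength; the grid is an assumption.
    grid = GridSearchCV(svm, {"clf__C": [0.1, 1.0, 10.0]}, cv=2)
    return {"naive_bayes": nb, "svm": svm, "svm_grid_search": grid}


def evaluate(models, train_x, train_y, test_x, test_y):
    """Fit each model on the training mails and return its accuracy on the test mails."""
    scores = {}
    for name, model in models.items():
        model.fit(train_x, train_y)
        scores[name] = model.score(test_x, test_y)
    return scores
```

On the real data, the training and test arguments would come from a held-out split of the mails in remail (for example via sklearn.model_selection.train_test_split) rather than from the toy lists shown here.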
