
kabbina/Big-Data

 
 

Big-Data

Team

Guhan Kabbina
Harshita Vidapanakal
Hanuraag Baskaran
Rohan M

Project

This repository contains source code for the following projects:

1] Analysis of Earth Surface Temperature using Spark

2] Implementation of Page Rank Algorithm with Embeddings for Wikipedia using Hadoop

3] Analysis of US Road Accident Data using Hadoop

4] Classification of Spam and Ham Emails using Spark Machine Learning
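The README does not describe how the PageRank project splits its computation across Hadoop jobs. As a rough, hypothetical illustration only, the sketch below simulates one common MapReduce formulation of plain PageRank in pure Python, with no Hadoop and no embeddings. The mapper emits each page's link list plus a rank share for every out-link, a grouping step stands in for Hadoop's shuffle, and the reducer applies the damping formula. The function names, the damping factor of 0.85 and the in-memory graph format are assumptions, not the repository's code.

```python
from collections import defaultdict

# Hypothetical sketch: plain PageRank expressed as map -> shuffle -> reduce,
# simulated in memory. It is NOT the repository's Hadoop job and does not
# include the embedding extension mentioned in the project title.

DAMPING = 0.85  # assumed damping factor (the common textbook value)


def pagerank_map(page, rank, out_links):
    """Map step: re-emit the page's link list and send a rank share to each out-link."""
    yield page, ("links", out_links)
    if out_links:
        share = rank / len(out_links)
        for target in out_links:
            yield target, ("rank", share)


def pagerank_reduce(page, values, num_pages, damping=DAMPING):
    """Reduce step: sum the incoming shares and apply the damping formula."""
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - damping) / num_pages + damping * incoming
    return page, new_rank, out_links


def pagerank(graph, iterations=20, damping=DAMPING):
    """Run several map/reduce rounds; every page must appear as a key of `graph`.

    Pages with no out-links (dangling pages) simply lose their rank mass here,
    a simplification that a production job would need to handle.
    """
    num_pages = len(graph)
    ranks = {page: 1.0 / num_pages for page in graph}
    for _ in range(iterations):
        # Shuffle: group mapper output by key, as Hadoop does between map and reduce.
        grouped = defaultdict(list)
        for page, out_links in graph.items():
            for key, value in pagerank_map(page, ranks[page], out_links):
                grouped[key].append(value)
        # Reduce: each page gets its new rank; link lists are carried forward.
        next_graph, ranks = {}, {}
        for page, values in grouped.items():
            page, new_rank, out_links = pagerank_reduce(page, values, num_pages, damping)
            next_graph[page] = out_links
            ranks[page] = new_rank
        graph = next_graph
    return ranks


if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(toy_web, iterations=50).items()):
        print(f"{page}: {rank:.4f}")
```

On the three-page toy graph, C ends up with the highest rank because it collects links from both A and B. Carrying each page's link list through the reducer mirrors how a real MapReduce job must pass the graph structure from one iteration's output to the next iteration's input.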



Usage

Step 1 :

Run the script files in the config folder to install both Hadoop and Spark on your Linux machine.

Step 2 :

Run the requirements script files in the config folder to install all the libraries required by the projects in this repository.

Step 3 :

The data files required by all the projects are in the data folder.

The data files have been pre-processed and only a sample of the data is stored in the repository; the link to the entire dataset is provided in the data\README.md file.

Step 4 :

The source code for all the projects is present in the src folder.

Please read the documentation and the report to understand how the code works.

Step 5 :

For each project, run the corresponding script files in the tools folder.

Step 6 :

The output of each project is available in the sample folder.

Step 7 :

Pre-trained models for Spam_Ham_Classification are stored in the build folder; they can be used to classify emails with the test script src\Spam_Ham\models\model_test.py.

Conclusion

The performance analysis of the models in the projects is provided in the report\images folder.

Additional details about the projects are provided in the docs folder.

Please raise a GitHub issue if you have any questions or suggestions.

