Big Data Analytics Project

This repository contains the code and documentation for the Big Data Analytics module (UEL-CN-7031) project.

Project Structure

data/: Contains the dataset files.
notebooks/: Jupyter notebooks for each task.
scripts/: Shell and Python scripts for data processing and analysis.
reports/: Final report and presentation files.
visuals/: Visualizations and plots generated during the analysis.
docs/: Additional documentation.
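As a quick sanity check after cloning, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_dirs` is hypothetical and not part of the repository, and the directory names are simply the ones listed above.

```python
from pathlib import Path

# Directory names taken from the Project Structure list above.
EXPECTED_DIRS = ("data", "notebooks", "scripts", "reports", "visuals", "docs")


def missing_dirs(repo_root):
    """Return the expected top-level directories that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

An empty list means every directory named in this README exists; it says nothing about the files inside them.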

Tasks

Understanding Dataset
Big Data Query & Analysis by Apache Hive
Advanced Analytics using PySpark
Individual Assessment

Setup

Clone the repository:

```
git clone https://github.com/Kyeyuneashiraf/big-data-analytics-project.git
cd big-data-analytics-project
```

Follow the instructions in the notebooks/ directory to execute the tasks.

Usage

Run the shell script to load data into HDFS:
```
./scripts/load_data_to_hdfs.sh
```
Execute the Hive queries:
```
hive -f scripts/hive_queries.sql
```

Run the PySpark analysis:

```
spark-submit scripts/pyspark_analytics.py
```
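The three commands above are meant to run in order, since each stage presumably depends on the previous one. A minimal sketch of a wrapper that runs them in sequence and stops at the first failure might look like this. The function `run_pipeline` and its `dry_run` parameter are hypothetical and not part of the repository; the commands themselves are copied verbatim from this section.

```python
import subprocess
import sys

# Commands copied from the Usage section, in the order they are listed.
STEPS = [
    ("Load data into HDFS", ["./scripts/load_data_to_hdfs.sh"]),
    ("Run Hive queries", ["hive", "-f", "scripts/hive_queries.sql"]),
    ("Run PySpark analysis", ["spark-submit", "scripts/pyspark_analytics.py"]),
]


def run_pipeline(steps=STEPS, dry_run=False):
    """Run each step in order and return the labels of the steps that completed.

    Stops at the first failing step. With dry_run=True (a hypothetical option
    for this sketch) the commands are only printed, so the function can be
    tried on a machine without Hadoop, Hive, or Spark installed.
    """
    completed = []
    for label, command in steps:
        print(f"[{label}] {' '.join(command)}")
        if not dry_run:
            try:
                returncode = subprocess.run(command).returncode
            except OSError as exc:  # e.g. the command is not installed
                print(f"Could not start step: {exc}", file=sys.stderr)
                break
            if returncode != 0:
                print(f"Step failed with exit code {returncode}", file=sys.stderr)
                break
        completed.append(label)
    return completed
```

Calling `run_pipeline(dry_run=True)` from the repository root prints the three commands without executing them; calling `run_pipeline()` executes them and assumes `hive` and `spark-submit` are on the `PATH`.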

License

This project is licensed under the MIT License.
