This project performs big data analytics with MySQL, Pandas, Hadoop, Sqoop, Hive, and Tableau. The goal is to process and analyze large datasets efficiently and to extract insights that support decision-making.
- MySQL: A relational database for storing and managing structured data.
- Pandas: A Python library for data manipulation and analysis.
- Hadoop: A distributed storage and processing framework for handling large-scale data.
- Sqoop: A tool for transferring data between Hadoop and relational databases such as MySQL.
- Hive: A data warehouse layer for querying and managing large datasets stored in Hadoop with SQL-like queries (HiveQL).
- Tableau: A data visualization tool for building interactive dashboards.
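As an illustration of the Sqoop step, the sketch below assembles a `sqoop import` command that would copy one MySQL table into HDFS. It only builds and prints the command; it does not run Sqoop. The database name `sales_db`, the table `orders`, the user `analyst`, and both paths are hypothetical placeholders, not values taken from this project.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, num_mappers=1):
    """Return the argument list for a `sqoop import` of one table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,              # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,   # file holding the password
        "--table", table,                   # source table to import
        "--target-dir", target_dir,         # HDFS directory for the output
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://localhost:3306/sales_db",
    username="analyst",
    password_file="/user/analyst/.mysql-password",
    table="orders",
    target_dir="/data/raw/orders",
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it could later be passed to `subprocess.run(cmd, check=True)` without shell quoting problems.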

The project covers the following stages:
- Data Collection: Describes how data was collected, sources, and formats.
- Data Processing: Explains the steps taken to clean, preprocess, and prepare the data.
- Analytics: Details the analytical methods and techniques applied to extract insights.
- Visualization: Demonstrates the visual representation of analytics results using Tableau.
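The Data Processing stage can be sketched with Pandas as follows. This is a minimal example under stated assumptions, not the project's actual pipeline: the columns `order_id`, `region`, and `amount` and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample records; column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "order_id": [1, 2, 2, 3, 4],
        "region": [" north", "South", "South", "north ", None],
        "amount": ["10.5", "20", "20", "bad", "7.25"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Remove duplicates, normalize text, coerce numbers, drop incomplete rows."""
    out = df.drop_duplicates().copy()
    # Normalize the text column: trim whitespace and lowercase.
    out["region"] = out["region"].str.strip().str.lower()
    # Convert amounts to numbers; unparseable values become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows that are missing a region or a valid amount.
    out = out.dropna(subset=["region", "amount"])
    return out.reset_index(drop=True)


cleaned = clean(raw)

# A simple aggregate of the kind an analytics step might compute.
totals = cleaned.groupby("region")["amount"].sum()
print(totals)
```

Coercing with `errors="coerce"` turns bad values into missing ones, so a single `dropna` call handles both absent and malformed fields.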
- Install MySQL, Pandas, Hadoop, Sqoop, Hive, and Tableau as per project requirements.
- Clone the repository and change into its directory:

      git clone https://github.com/arunerram/BigDataAnalysis.git
      cd BigDataAnalysis
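Before running anything, it can help to confirm that the command-line tools are reachable. The sketch below checks the `PATH` for each one. The executable names (`git`, `mysql`, `hadoop`, `sqoop`, `hive`) are assumed defaults and may differ on your installation; Pandas and Tableau are not command-line tools, so they are not checked here.

```python
import shutil

# Executable names assumed for each tool; adjust to match your installation.
REQUIRED_TOOLS = ["git", "mysql", "hadoop", "sqoop", "hive"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


status = find_tools(REQUIRED_TOOLS)
for name, path in status.items():
    print(f"{name}: {path or 'NOT FOUND'}")
```

Any tool reported as `NOT FOUND` is either not installed or not on the `PATH` of the current shell.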