ArunErram/BigDataAnalysis

Big Data Analytics Project

Overview

This project performs big data analytics with MySQL, Pandas, Hadoop, Sqoop, Hive, and Tableau. The goal is to process and analyze large datasets efficiently and extract insights that support decision-making.

Technologies Used

  • MySQL: For storing and managing structured data.
  • Pandas: A Python library for data manipulation and analysis.
  • Hadoop: A distributed storage and processing framework for handling large-scale data.
  • Sqoop: Used for transferring data between Hadoop and relational databases like MySQL.
  • Hive: Enables querying and managing large datasets stored in Hadoop.
  • Tableau: A data visualization tool for building interactive dashboards.
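
To make Sqoop's role between MySQL and Hadoop more concrete, here is a minimal sketch that assembles a `sqoop import` command in Python. It only builds and prints the command; it does not run it. The host, database, table, user, and HDFS target directory are hypothetical placeholders, not values taken from this repository.

```python
import shlex


def build_sqoop_import(host, database, table, user, target_dir, mappers=1):
    """Return the argument list for a `sqoop import` from MySQL into HDFS.

    All argument values are illustrative; substitute your own settings.
    """
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",  # JDBC URL of the MySQL source
        "--username", user,
        "-P",                                            # prompt for the password at run time
        "--table", table,                                # source table in MySQL
        "--target-dir", target_dir,                      # destination directory in HDFS
        "--num-mappers", str(mappers),                   # number of parallel map tasks
    ]


# Hypothetical example values.
cmd = build_sqoop_import(
    host="localhost",
    database="sales_db",
    table="orders",
    user="analyst",
    target_dir="/user/analyst/orders",
)
print(shlex.join(cmd))
```

Once Sqoop and Hadoop are installed, the printed command can be pasted into a shell, or the list can be passed to `subprocess.run`.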

Project Structure

  • Data Collection: Describes how data was collected, sources, and formats.
  • Data Processing: Explains the steps taken to clean, preprocess, and prepare the data.
  • Analytics: Details the analytical methods and techniques applied to extract insights.
  • Visualization: Demonstrates the visual representation of analytics results using Tableau.
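
The Data Processing and Analytics stages could look roughly like the following Pandas sketch. It is an illustration under stated assumptions, not the project's actual pipeline: the column names (`order_id`, `region`, `amount`) and the sample rows are hypothetical.

```python
import pandas as pd


def clean_orders(df):
    """Drop duplicate rows and rows whose amount is not numeric.

    The column name `amount` is an assumption made for this example.
    """
    cleaned = df.drop_duplicates().copy()
    # Convert amounts to numbers; unparseable values become NaN.
    cleaned["amount"] = pd.to_numeric(cleaned["amount"], errors="coerce")
    cleaned = cleaned.dropna(subset=["amount"])
    return cleaned.reset_index(drop=True)


# Hypothetical raw data with one duplicate row and one invalid amount.
raw = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "region": ["east", "east", "west", "west"],
    "amount": ["10.5", "10.5", "bad", "7"],
})

clean = clean_orders(raw)

# A simple analytic step: total amount per region.
totals = clean.groupby("region")["amount"].sum()
print(totals)
```

A cleaned frame like this could then be written out with `DataFrame.to_csv` and loaded into MySQL or HDFS for the later stages.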

Getting Started

Prerequisites

  • Install MySQL, Pandas, Hadoop, Sqoop, Hive, and Tableau as per project requirements.

Setup

  1. Clone the repository:

    git clone https://github.com/arunerram/BigDataAnalysis.git
    cd BigDataAnalysis
