Collection of continuous learning and growth in the world of data analytics. Lifelong Learner! 🚀

haajar-es/data-analytics-dojo

Data Analytics Dojo 🚀

Welcome to the Data Analytics Dojo! This repository is dedicated to continuous learning and growth in the world of data analytics. This dojo serves as a collection of assignments, projects, tutorials, and case studies covering a broad range of topics and tools.

Overview

This repository is a personal journey of learning and exploring various data analytics topics, from exploratory data analysis (EDA) and data visualization to advanced topics like outlier detection and building data pipelines. It includes practical applications of popular tools and languages such as:

  • SQL
  • Excel
  • Python
  • Jupyter Notebooks

The goal of this dojo is to provide a structured and collaborative environment for developing and sharpening data analytics skills over time. As I continue learning, more advanced content will be added, covering various aspects of data engineering, machine learning, business intelligence, and more.

Repository Structure

The repository is organized into folders, each representing a different project or area of study:

  1. Analyze International Debt Statistics

    • In this project, I've performed exploratory data analysis (EDA) for international debt data collected by The World Bank.
    • Skills applied: SQL, Python, Pandas, Data Cleaning
    • Key concepts: Analyzing the global debt landscape, highlighting the scale of international debt, and identifying the countries and debt categories that dominate it.
  2. European Soccer Data Manipulation

    • This case study showcases data manipulation, complex queries, and window functions using SQL.
    • Skills applied: SQL, Data Manipulation (CTEs, CASE Statements, JOIN operations, WINDOW functions)
    • Key concepts: This analysis demonstrates proficiency in using advanced SQL techniques to extract meaningful insights from complex datasets.
  3. Exploring Hacker News Posts

    • In this project, I've performed exploratory data analysis (EDA) on Hacker News posts.
    • Skills applied: Python, Pandas, Data Cleaning
    • Key concepts: Handling large datasets, analyzing trends in online posts.
  4. Hacker News Pipeline

    • This folder contains a data pipeline built for ingesting and analyzing Hacker News posts.
    • Skills applied: SQL, Data Pipelines, ETL, Python, Data Extraction and Transformation.
    • Key concepts: Automating the collection of data, building efficient data pipelines for continuous updates.
  5. Outliers Detection

    • This project focuses on detecting outliers in healthcare datasets.
    • Skills applied: Python (Pandas, NumPy, Seaborn, Matplotlib, Scipy, sklearn), Outlier Detection Techniques
    • Key concepts: Identifying anomalies, handling skewed data distributions, applying machine learning to identify outliers.
  6. Student Performance Analysis

    • An analysis of student performance based on various factors.
    • Skills applied: EDA, Data Visualization, Statistical Analysis
    • Key concepts: Creating visual reports to track performance trends, insights into education data.
  7. Northwind Traders EDA

    • Performed comprehensive data analysis on Northwind Traders, an international gourmet food distributor, examining sales patterns, customer behavior, and product performance.
    • Skills applied: SQL, Data Manipulation (CTEs, WINDOW functions), EDA, Data Visualization
    • Key concepts: Analysis of sales metrics, customer behavior, and product performance to optimize business operations and identify growth opportunities in a gourmet food distribution company.
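As an illustration of the kind of technique the Outliers Detection project covers, here is a minimal sketch of the interquartile range (IQR) rule using only the Python standard library. It is not taken from the project notebooks: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample readings are all assumptions made for this example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings, made up for illustration only.
readings = [72, 75, 71, 74, 73, 76, 70, 140]
print(iqr_outliers(readings))
```

The IQR rule is often a reasonable first pass on skewed data because it relies on quartiles rather than the mean and standard deviation, which extreme values can distort.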

Technical Skills Covered

This repository is designed to apply and improve skills in various tools and languages essential for data analytics, including but not limited to:

  • Exploratory Data Analysis (EDA)
  • Python (Pandas, NumPy, Matplotlib, Plotly, Seaborn)
  • SQL (for querying and transforming data)
  • Data Pipelines (Building automated workflows)
  • Data Visualization (using Matplotlib, Seaborn, Power BI)
  • Excel (for quick data manipulation and visualization)
  • Outlier Detection and handling of messy, real-world data.
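To make the "automated workflows" idea concrete, the sketch below chains three small steps (extract, transform, load) with Python generators. It is only an illustration under stated assumptions, not the code in the Hacker News Pipeline folder: the function names, the record fields `title` and `points`, and the sample posts are hypothetical.

```python
def extract(rows):
    """Yield raw records one at a time (stand-in for reading a file or API)."""
    yield from rows


def transform(records):
    """Drop records without a title and normalize the remaining ones."""
    for rec in records:
        title = (rec.get("title") or "").strip()
        if title:
            yield {"title": title.lower(), "points": int(rec.get("points", 0))}


def load(records):
    """Collect the cleaned records into a list (stand-in for a database write)."""
    return list(records)


# Hypothetical posts, made up for illustration only.
raw_posts = [
    {"title": "  Show HN: A tiny tool ", "points": "12"},
    {"title": "", "points": "3"},
    {"title": "Ask HN: Favorite SQL tips?", "points": 40},
]
cleaned = load(transform(extract(raw_posts)))
print(cleaned)
```

Because each step consumes and yields records lazily, steps can be added or swapped without changing the others, which is one common way to keep a small pipeline easy to rerun.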

Acknowledgements

Credits to the course instructors (if any), and any resources used.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

✨ Feel free to explore this repository. Lifelong Learner!
