Welcome to the Data Analytics Dojo! This repository is dedicated to continuous learning and growth in the world of data analytics. This dojo serves as a collection of assignments, projects, tutorials, and case studies covering a broad range of topics and tools.
This repository is a personal journey of learning and exploring various data analytics topics, from exploratory data analysis (EDA) and data visualization to advanced topics like outlier detection and building data pipelines. It includes practical applications of popular tools and languages such as:
- SQL
- Excel
- Python
- Jupyter Notebooks
The goal of this dojo is to provide a structured and collaborative environment for developing and sharpening data analytics skills over time. As I continue learning, more advanced content will be added, covering various aspects of data engineering, machine learning, business intelligence, and more.
The repository is organized into folders, each representing a different project or area of study:
- Analyze International Debt Statistics
- In this project, I've performed exploratory data analysis (EDA) for international debt data collected by The World Bank.
- Skills applied: SQL, Python, Pandas, Data Cleaning
- Key concepts: Analyzing the global debt landscape, measuring the scale of international debt, and identifying the countries and debt categories that account for most of it.
- European Soccer Data Manipulation
- This case study showcases skills in data manipulation, complex queries, and window functions using SQL.
- Skills applied: SQL, Data Manipulation (CTEs, CASE Statements, JOIN operations, WINDOW functions)
- Key concepts: Using advanced SQL techniques to extract meaningful insights from complex datasets.
-
- In this project, I've performed exploratory data analysis (EDA) on Hacker News posts.
- Skills applied: Python, Pandas, Data Cleaning
- Key concepts: Handling large datasets, analyzing trends in online posts.
-
- This folder contains a data pipeline built for ingesting and analyzing Hacker News posts.
- Skills applied: SQL, Data Pipelines, ETL, Python, Data Extraction and Transformation.
- Key concepts: Automating the collection of data, building efficient data pipelines for continuous updates.
-
- This project focuses on detecting outliers in healthcare datasets.
- Skills applied: Python (Pandas, NumPy, Seaborn, Matplotlib, Scipy, sklearn), Outlier Detection Techniques
- Key concepts: Identifying anomalies, handling skewed data distributions, applying machine learning to identify outliers.
-
- An analysis of student performance based on various factors.
- Skills applied: EDA, Data Visualization, Statistical Analysis
- Key concepts: Creating visual reports to track performance trends, deriving insights from education data.
-
- Performed comprehensive data analysis on Northwind Traders, an international gourmet food distributor, examining sales patterns, customer behavior, and product performance.
- Skills applied: SQL, Data Manipulation (CTEs, WINDOW functions), EDA, Data Visualization
- Key concepts: Analysis of sales metrics, customer behavior, and product performance to optimize business operations and identify growth opportunities in a gourmet food distribution company.
This repository is designed to apply and improve skills in the tools, languages, and techniques essential for data analytics, including but not limited to:
- Exploratory Data Analysis (EDA)
- Python (Pandas, NumPy, Matplotlib, Plotly, Seaborn)
- SQL (for querying and transforming data)
- Data Pipelines (Building automated workflows)
- Data Visualization (using Matplotlib, Seaborn, Power BI)
- Excel (for quick data manipulation and visualization)
- Outlier Detection and handling of messy, real-world data
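As a small illustration of the outlier detection skill listed above, here is a minimal sketch of the interquartile range (IQR) rule using only the Python standard library. It is not taken from any project in this repository: the function name `iqr_outliers`, the multiplier `k=1.5`, and the sample readings are all assumptions chosen for the example.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional
    multiplier, not a setting used by any project in this repo.
    """
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up readings, for illustration only
readings = [72, 75, 71, 74, 73, 76, 70, 140]
print(iqr_outliers(readings))  # prints [140]
```

The IQR rule is a reasonable first pass on skewed data because it relies on quartiles rather than the mean and standard deviation, which a few extreme values can distort.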
Credit goes to the course instructors, where applicable, and to the resources used throughout these projects.
This project is licensed under the MIT License; see the LICENSE.md file for details.
✨ Feel free to explore this repository. Lifelong Learner!