DataForge OpenAI Hub builds practical, real-world solutions with machine learning, MLOps, and data analytics pipelines. Explore our two main projects below!
1. Data Pipeline Project: Steam Sales Analysis [Steam-Sales-Analysis]
This repository implements an end-to-end ETL pipeline for analyzing Steam sales data. It retrieves, processes, and stores game metadata and sales data to support trend and performance analysis. Highlights include:
- ETL Pipeline: Automates the retrieval, validation, and processing of data from SteamSpy and Steam APIs.
- Data Storage: Processed data is stored in a MySQL database hosted on Aiven Cloud.
- Visualization: Interactive Tableau dashboards provide insights into gaming trends.
- Automation: Pipeline stages are orchestrated end to end from a Typer-based CLI.
- Open-source contribution: Publishes and maintains a PyPI package.
📂 Technologies Used: Python, MySQL, Tableau, Steam API, Typer CLI, ETL
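The retrieve, validate, process, and store flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's actual code: the sample records and their field names (`price_cents`, `owners`) are hypothetical rather than the real SteamSpy or Steam API schema, the function names are invented for this example, and an in-memory SQLite database stands in for the MySQL instance so the sketch runs without any external service.

```python
import sqlite3

# Hypothetical raw records standing in for an API response.
# Field names are assumptions for illustration only.
RAW_GAMES = [
    {"appid": 10, "name": "Example Game A", "price_cents": "999",
     "owners": "1,000,000 .. 2,000,000"},
    {"appid": 20, "name": "", "price_cents": "1999",
     "owners": "0 .. 20,000"},  # empty name, so validation rejects it
    {"appid": 30, "name": "Example Game B", "price_cents": "0",
     "owners": "50,000 .. 100,000"},
]


def validate(record):
    """Return True if the record has the fields the later stages rely on."""
    return (
        isinstance(record.get("appid"), int)
        and bool(record.get("name"))
        and str(record.get("price_cents", "")).isdigit()
    )


def transform(record):
    """Convert a raw record into a (appid, name, price_usd, owners_estimate) row."""
    low, high = (int(part.replace(",", "")) for part in record["owners"].split(".."))
    price_usd = int(record["price_cents"]) / 100
    return (record["appid"], record["name"], price_usd, (low + high) // 2)


def load(conn, rows):
    """Create the target table if needed and insert the transformed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS games ("
        "appid INTEGER PRIMARY KEY, name TEXT, price_usd REAL, owners_estimate INTEGER)"
    )
    conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_records, conn):
    """Run validate -> transform -> load and return the number of rows stored."""
    rows = [transform(r) for r in raw_records if validate(r)]
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    stored = run_pipeline(RAW_GAMES, connection)
    print(f"Stored {stored} of {len(RAW_GAMES)} records")
```

Keeping validation separate from transformation means malformed records are filtered out before any parsing is attempted, so a single bad record does not abort the whole run.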
2. End-to-End ML Project: Credit Card Fraud Detection [mlops-credit-card-fraud-detection-end-to-end] 🛡️
This repository presents a comprehensive machine learning project that tackles credit card fraud detection using MLOps best practices. Highlights include:
- Ensemble Models: Combines multiple algorithms for better accuracy in detecting fraudulent transactions.
- Data & Model Versioning: Managed with DVC to ensure consistent and reproducible results.
- CI/CD Pipelines: Implemented with GitHub Actions to automate the workflow.
- Deployment: Deploys the model to production with monitoring.
📂 Technologies Used: Python, Jupyter, Scikit-learn, DVC, GitHub Actions, Docker
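To illustrate the ensemble idea from the highlights, here is a minimal sketch of hard (majority) voting. The source does not say which ensemble method or base models the project uses, so everything here is an assumption: the three rule-based "models", the transaction fields (`amount`, `country`, `home_country`, `hour`), and the thresholds are hypothetical stand-ins for trained classifiers. In a Scikit-learn workflow, `sklearn.ensemble.VotingClassifier` provides the same voting scheme for fitted estimators.

```python
# Hypothetical rule-based "models": each maps a transaction dict to
# 1 (flag as fraud) or 0 (treat as legitimate). Real base models would be
# trained classifiers; these rules exist only to make the voting visible.
def large_amount(tx):
    return int(tx["amount"] > 1000)


def foreign_country(tx):
    return int(tx["country"] != tx["home_country"])


def night_time(tx):
    return int(tx["hour"] < 6)


MODELS = [large_amount, foreign_country, night_time]


def majority_vote(models, tx):
    """Return 1 if more than half of the models flag the transaction, else 0."""
    fraud_votes = sum(model(tx) for model in models)
    return int(fraud_votes * 2 > len(models))


if __name__ == "__main__":
    suspicious = {"amount": 2500, "country": "FR", "home_country": "US", "hour": 14}
    routine = {"amount": 20, "country": "US", "home_country": "US", "hour": 3}
    print(majority_vote(MODELS, suspicious))
    print(majority_vote(MODELS, routine))
```

With an odd number of binary voters there are no ties, and a single model firing alone is outvoted by the other two, which is one way an ensemble can reduce the false positives of any individual model.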
Clone the respective repository and follow the setup instructions provided in each project.
# Clone the Data Pipeline project
git clone https://github.com/DataForgeOpenAIHub/Steam-Sales-Analysis.git
# Clone the ML project
git clone https://github.com/DataForgeOpenAIHub/mlops-credit-card-fraud-detection-end-to-end.git