Meta-Iterative Map-Reduce to perform regression massively in parallel on a cluster, using MPI and CUDA with support for both GPU and CPU nodes.
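One way the map-reduce pattern applies to regression, sketched here with mpi4py rather than the repo's C/CUDA stack (all data and names are illustrative): each rank maps its shard to partial normal-equation terms, and a reduce sums them before solving.

```python
# Illustrative map-reduce regression with mpi4py (not the repo's C/CUDA code).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Map: each rank turns its data shard into partial normal-equation terms.
X_local = np.random.rand(1000, 8)            # stand-in for this rank's shard
y_local = X_local @ np.arange(8.0) + 0.1 * np.random.randn(1000)
xtx_local = X_local.T @ X_local
xty_local = X_local.T @ y_local

# Reduce: sum the partial terms across all ranks onto rank 0.
xtx = comm.reduce(xtx_local, op=MPI.SUM, root=0)
xty = comm.reduce(xty_local, op=MPI.SUM, root=0)

if comm.Get_rank() == 0:
    w = np.linalg.solve(xtx, xty)            # solve (X^T X) w = X^T y
    print(w)
```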
Well-commented code for different types of training configurations
Project showcasing how to get started with Distributed XGBoost using PySpark in CML.
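A minimal sketch of what getting started might look like, assuming XGBoost >= 1.7's built-in PySpark estimator; the input path and column names are placeholders.

```python
# Hypothetical sketch: distributed XGBoost training on Spark.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from xgboost.spark import SparkXGBRegressor

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("train.parquet")     # placeholder input path

# Spark ML estimators expect features packed into a single vector column.
assembled = VectorAssembler(
    inputCols=["f0", "f1", "f2"], outputCol="features"
).transform(df)

# num_workers sets how many Spark tasks train cooperatively.
model = SparkXGBRegressor(
    features_col="features", label_col="label", num_workers=4
).fit(assembled)
```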
A simple, easy-to-understand library for diffusion models using Flax and JAX. Includes detailed notebooks on DDPM, DDIM, and EDM with simplified mathematical explanations. Made as part of my journey of learning and experimenting with state-of-the-art generative AI.
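For instance, the DDPM forward (noising) process the notebooks cover is $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$; a small JAX sketch (schedule values are illustrative, not the repo's):

```python
# DDPM forward (noising) process in JAX; schedule values are illustrative.
import jax
import jax.numpy as jnp

betas = jnp.linspace(1e-4, 0.02, 1000)      # linear noise schedule
alpha_bars = jnp.cumprod(1.0 - betas)       # cumulative products (alpha-bar_t)

def q_sample(x0, t, key):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = jax.random.normal(key, x0.shape)
    a = alpha_bars[t]
    return jnp.sqrt(a) * x0 + jnp.sqrt(1.0 - a) * eps

x0 = jax.random.normal(jax.random.PRNGKey(0), (32, 32, 3))
xt = q_sample(x0, t=500, key=jax.random.PRNGKey(1))
```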
This project contains scripts/modules for distributed training
📜 A Python library for distributed training of a Transformer neural network across the Internet to solve the Running Key Cipher, well known in the field of cryptography.
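A hedged sketch of the rendezvous step such multi-machine training needs, using PyTorch's torch.distributed; the address, port, and environment variables are placeholders, and the repo's actual internet-scale mechanism may differ.

```python
# Hedged sketch: TCP rendezvous for multi-machine PyTorch training.
import os
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",                          # CPU-friendly, works over plain TCP
    init_method="tcp://203.0.113.1:29500",   # hypothetical rendezvous host:port
    rank=int(os.environ["RANK"]),
    world_size=int(os.environ["WORLD_SIZE"]),
)
print("joined as rank", dist.get_rank())
```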
Development of Project HPGO | Hybrid Parallelism Global Orchestration
Based on the kubernetes/client-go API: lifecycle control of GPU resources for distributed training, with multi-user, multi-task support and real-time training logs continuously redirected over WebSocket.
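The repo itself builds on Go's client-go; as a rough analogue, following a training pod's logs in real time with the official Python client might look like this (pod and namespace names are hypothetical):

```python
# Rough Python analogue of real-time log following (pod name is hypothetical).
from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

# Stream log lines as the training job produces them; the repo forwards
# this kind of stream to users over WebSocket.
w = watch.Watch()
for line in w.stream(v1.read_namespaced_pod_log,
                     name="train-job-0", namespace="default"):
    print(line)
```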
Showcases the implementation of AI scaling techniques and MLflow integration for streamlined experiment tracking and management in machine learning workflows.
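The MLflow side of such a workflow is compact; a minimal sketch with illustrative names and values:

```python
# Minimal MLflow tracking sketch; names and values are illustrative.
import mlflow

mlflow.set_experiment("scaling-experiments")   # hypothetical experiment name
with mlflow.start_run():
    mlflow.log_param("world_size", 8)
    for step in range(100):
        mlflow.log_metric("loss", 1.0 / (step + 1), step=step)
```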
S-BDT: Distributed Differentially Private Boosted Decision Trees
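One standard ingredient of differentially private GBDTs, sketched as a general technique rather than S-BDT's exact mechanism: perturb each leaf value with noise calibrated to the privacy budget.

```python
# General DP-GBDT ingredient (an assumption, not S-BDT's exact mechanism):
# release each leaf value only after adding calibrated Laplace noise.
import numpy as np

def noisy_leaf(residuals, epsilon, sensitivity):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return residuals.mean() + np.random.laplace(scale=sensitivity / epsilon)
```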
Everything is born from a simple experiment.
Messing with Distributed TensorFlow and Kubernetes
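A hedged sketch of the multi-worker TensorFlow setup Kubernetes pods would run; TF_CONFIG, normally injected per pod, tells each worker its peers (hostnames here are placeholders):

```python
# Multi-worker TensorFlow sketch; TF_CONFIG would normally be set per pod.
import os, json
import tensorflow as tf

os.environ.setdefault("TF_CONFIG", json.dumps({
    "cluster": {"worker": ["worker-0:2222", "worker-1:2222"]},  # pod DNS names
    "task": {"type": "worker", "index": 0},
}))

# Each worker builds the same model; gradients are synchronized across workers.
strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")
```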
In this project, I implement and compare different distributed training techniques, from data parallelism to model parallelism, from scratch using PyTorch
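The heart of from-scratch data parallelism is averaging gradients across ranks by hand; a sketch with torch.distributed, assuming a process group is already initialized:

```python
# From-scratch data parallelism: average gradients across ranks by hand.
import torch.distributed as dist

def allreduce_gradients(model):
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)  # sum over ranks
            p.grad /= world_size                           # then average
```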
This repository shows how to distribute training of large machine learning models to make it faster.
Adaptive Tensor Parallelism for Foundation Models
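As a toy illustration of tensor (intra-layer) parallelism, not the repo's adaptive scheme: split a linear layer's weight column-wise across shards and concatenate the partial outputs.

```python
# Toy column-wise tensor parallelism: two weight shards, concatenated outputs.
import torch

x = torch.randn(4, 16)
W = torch.randn(16, 32)
W0, W1 = W.chunk(2, dim=1)    # in practice, each shard lives on its own device

y = torch.cat([x @ W0, x @ W1], dim=1)   # gather partial outputs
assert torch.allclose(y, x @ W)          # identical to the unsharded layer
```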
Short course: Introduction to Machine Learning
Experiments with low level communication patterns that are useful for distributed training.
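The collectives such experiments typically exercise, shown here with PyTorch's distributed package (assumes an initialized process group; values are illustrative):

```python
# Core collectives, via PyTorch distributed (process group assumed initialized).
import torch
import torch.distributed as dist

rank = dist.get_rank()
t = torch.tensor([float(rank)])

dist.broadcast(t, src=0)                   # every rank receives rank 0's value
dist.all_reduce(t, op=dist.ReduceOp.SUM)   # element-wise sum across ranks

gathered = [torch.zeros(1) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, torch.tensor([float(rank)]))  # one tensor per rank
```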
Tools for ML/MXNet on Kubernetes. A rework of the original tf-operator to support the MXNet framework.
Machine translation model training, distributed with Python and JAX.
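A hedged sketch of the data-parallel training step JAX enables via pmap; the quadratic loss and parameters are placeholders, not the repo's translation model:

```python
# Data-parallel step via pmap; loss and params are placeholders.
from functools import partial
import jax
import jax.numpy as jnp

def loss_fn(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

@partial(jax.pmap, axis_name="batch")        # one batch shard per local device
def train_step(w, x, y):
    g = jax.grad(loss_fn)(w, x, y)
    g = jax.lax.pmean(g, axis_name="batch")  # average gradients across devices
    return w - 0.1 * g
```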