---
layout: page
title:
order: 1
---
MALT-2 is a distributed data-parallel machine learning system for Torch and MiLDE.

MALT-2 is an ML parallelization framework that parallelizes any existing ML application. The system is designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art accuracy.

  • Easy to add to existing code through a general-purpose interface; it only requires changing the optimization type to dstsgd (distributed SGD), as in the sketch after this list.
  • Support for multi-machine, multi-GPU training with CUDA implementations for distributed parameter averaging.
  • Includes C++ and Lua interfaces to extend existing code. Supports Torch and NEC MiLDE (not open-sourced). Support for PyTorch and Darknet is coming soon.
  • Easily extend your existing Torch code with minimal changes.
  • Explore existing distributed GPU applications over ResNets and large language models.
  • Various optimizations, such as sparse-reduce and NOTIFY_ACK, to accelerate distributed model training.
  • Contributions Welcome!
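
To illustrate the "change the optimization type to dstsgd" idea, here is a minimal Torch/Lua sketch. The module name `malt` and the `dstsgd` entry point are assumptions made for illustration, not the verified MALT-2 API; the script falls back to plain `optim.sgd` when MALT-2 is not installed. See the repository documentation for the actual interface.

```lua
-- Minimal sketch: swapping the optimizer for distributed SGD.
-- 'malt' and 'dstsgd' are assumed names, used here only for illustration.
require 'torch'
require 'nn'
require 'optim'

local ok, malt = pcall(require, 'malt')          -- assumed MALT-2 Lua module
local step = (ok and malt.dstsgd) or optim.sgd   -- fall back to local SGD

local model = nn.Sequential()
  :add(nn.Linear(100, 10))
  :add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()
local params, gradParams = model:getParameters()
local config = { learningRate = 0.01 }

-- Single synthetic batch; a real script would read from its data loader.
local input  = torch.randn(32, 100)
local target = torch.Tensor(32):random(1, 10)

-- Standard Torch closure: returns loss and gradient for the current batch.
local function feval(_)
  gradParams:zero()
  local output = model:forward(input)
  local loss = criterion:forward(output, target)
  model:backward(input, criterion:backward(output, target))
  return loss, gradParams
end

for i = 1, 10 do
  -- With dstsgd, this step would also average parameters across replicas.
  step(feval, params, config)
end
```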