Warning
The FlexFlow repository has been split into separate flexflow-train and flexflow-serve repositories. You are currently viewing flexflow-train. For anything inference/serving-related, go to flexflow-serve.
FlexFlow is a deep learning framework that accelerates distributed DNN training by automatically searching for efficient parallelization strategies.
Please let us know if you encounter any bugs or have any suggestions by submitting an issue.
We welcome all contributions to FlexFlow from bug fixes to new features and extensions.
-
Colin Unger, Zhihao Jia, Wei Wu, Sina Lin, Mandeep Baines, Carlos Efrain Quintero Narvaez, Vinay Ramakrishnaiah, Nirmal Prajapati, Pat McCormick, Jamaludin Mohd-Yusof, Xi Luo, Dheevatsa Mudigere, Jongsoo Park, Misha Smelyanskiy, and Alex Aiken. Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization. In Proceedings of the Symposium on Operating Systems Design and Implementation (OSDI), July 2022.
-
Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond Data and Model Parallelism for Deep Neural Networks. In Proceedings of the 2nd Conference on Machine Learning and Systems (MLSys), April 2019.
-
Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), July 2018.
FlexFlow is developed and maintained by teams at CMU, Facebook, Los Alamos National Lab, MIT, and Stanford (alphabetically).
FlexFlow uses Apache License 2.0.