
FP8 AllGather Support in Fairscale #1185

Open

wants to merge 21 commits into base: ngoyal_changes_for_pp_fp8_jiecaoyu_debug

Commits on Mar 29, 2024

  1. added option for no PG validation for faster init (#1161)

    Co-authored-by: Naman Goyal <naman@fb.com>
    2 people authored and levendlee committed Mar 29, 2024
    SHA: 73ce4b4
  2. Mirrors Jiecao's change.

    levendlee committed Mar 29, 2024
    SHA: 33457b3
  3. Debug non-determinism issues.

    This commit works with a 4-GPU run on the SMALL model with FSDP and PP
    enabled.
    levendlee committed Mar 29, 2024
    SHA: 70f5ff5
  4. SHA: 16c682d

Commits on Apr 1, 2024

  1. Cleans up code.

    levendlee committed Apr 1, 2024
    SHA: 24a769f

Commits on Apr 2, 2024

  1. Fix main_grad attribute checking.

    - Clean up flatten and non_flatten parameter generation logic.
    - Avoid checking whether the `main_grad` attribute is all zeros.
    levendlee committed Apr 2, 2024
    SHA: 3e2e77f
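The `main_grad` fix above (preferring an attribute-presence check over scanning the buffer for all-zeros) can be sketched in plain Python. `Param` and `has_main_grad` are illustrative stand-ins, not Fairscale's actual API:

```python
# Minimal sketch: check that the buffer exists rather than scanning its values.
# `Param` and `has_main_grad` are hypothetical names, not Fairscale APIs.

class Param:
    """Toy parameter carrying an optional full-precision gradient buffer."""
    def __init__(self, main_grad=None):
        if main_grad is not None:
            self.main_grad = main_grad  # e.g. an FP32 gradient accumulator

def has_main_grad(p):
    # Cheap O(1) presence check. A value scan ("is every element zero?")
    # is O(n) and wrongly treats a legitimately all-zero gradient as missing.
    return getattr(p, "main_grad", None) is not None

with_grad = Param(main_grad=[0.0, 0.0, 0.0])  # all zeros, but present
without_grad = Param()

print(has_main_grad(with_grad))     # True: presence, not values, decides
print(has_main_grad(without_grad))  # False
```

The presence check also sidesteps a device sync that a value comparison on a GPU tensor would trigger.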

Commits on Apr 9, 2024

  1. Fix hanging error when PP is disabled.

    - Clean up amax and scale update logic. Amax and scale updates should
      be done for both weights and parameters, so they should happen at
      the forward pass of each microbatch.

    - Consolidate the `cast_params` and `all_gather` streams.
    levendlee committed Apr 9, 2024
    SHA: 1be7aa0
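The per-microbatch amax/scale update described in the commit above can be sketched as follows. The delayed-scaling formula `scale = fp8_max / amax` follows common FP8 recipes, and every name here (`Fp8Meta`, `forward_microbatch`, the margin parameter) is an illustrative assumption, not this PR's actual code:

```python
# Sketch of FP8 delayed scaling refreshed at the forward pass of every
# microbatch. Names and structure are hypothetical, not Fairscale's code.

E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3

class Fp8Meta:
    def __init__(self):
        self.amax = 1.0   # most recent max of absolute values observed
        self.scale = 1.0  # multiplier applied before casting to FP8

    def update(self, observed_amax, margin=0.0):
        # Track the newest amax and derive a scale so the largest value
        # maps just inside the FP8 range (optionally backed off by margin).
        self.amax = observed_amax
        if self.amax > 0:
            self.scale = (E4M3_MAX / self.amax) / (2.0 ** margin)
        else:
            self.scale = 1.0

def forward_microbatch(meta, activations):
    # Refresh amax/scale at the forward of *each* microbatch (the fix in
    # the commit above), then quantize with the fresh scale.
    meta.update(max(abs(x) for x in activations))
    return [x * meta.scale for x in activations]

meta = Fp8Meta()
out = forward_microbatch(meta, [0.5, -2.0, 1.0])
print(meta.scale)  # 224.0: chosen so |-2.0| maps exactly to E4M3_MAX
print(out)         # [112.0, -448.0, 224.0]
```

Updating only once per iteration (rather than per microbatch) would leave later microbatches quantizing with a stale scale, which under pipeline parallelism can desynchronize ranks.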

Commits on Apr 10, 2024

  1. SHA: 57eb557

Commits on Apr 17, 2024

  1. SHA: e9e8f8e
  2. SHA: 21f8e05

Commits on May 20, 2024

  1. added option for no PG validation for faster init (#1161)

    Co-authored-by: Naman Goyal <naman@fb.com>
    2 people authored and levendlee committed May 20, 2024
    SHA: 8ec7c1d
  2. Mirrors Jiecao's change.

    levendlee committed May 20, 2024
    SHA: f27ab17
  3. Debug non-determinism issues.

    This commit works with a 4-GPU run on the SMALL model with FSDP and PP
    enabled.
    levendlee committed May 20, 2024
    SHA: fa9cf77
  4. SHA: 6fa19e0
  5. Cleans up code.

    levendlee committed May 20, 2024
    SHA: 80ffd54
  6. Fix main_grad attribute checking.

    - Clean up flatten and non_flatten parameter generation logic.
    - Avoid checking whether the `main_grad` attribute is all zeros.
    levendlee committed May 20, 2024
    SHA: afb2ca1
  7. Fix hanging error when PP is disabled.

    - Clean up amax and scale update logic. Amax and scale updates should
      be done for both weights and parameters, so they should happen at
      the forward pass of each microbatch.

    - Consolidate the `cast_params` and `all_gather` streams.
    levendlee committed May 20, 2024
    SHA: 25b2322
  8. SHA: 0d1502b
  9. SHA: 5edb109
  10. SHA: 2df199f
  11. Merge branch 'shikaili_fp8_allgather_no_pp_fix' of github.com:facebookresearch/fairscale into shikaili_fp8_allgather_no_pp_fix
    levendlee committed May 20, 2024
    SHA: da36e31