[PyTorch] Reduce the CPU overheads of GroupedLinear #1072

Merged Aug 9, 2024 (10 commits)

Commits on Aug 8, 2024

  1. use fused_multi_cast_transpose

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    3457e3c
  2. fix input being empty tensor

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    074563b
  3. [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci
    pre-commit-ci[bot] authored and yaox12 committed Aug 8, 2024
    324815d
  4. allocate output tensors in C++

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    4e57d88
  5. simplify code

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    ef31897
  6. avoid cudaGetDriverEntryPoint

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    26dc2a3
  7. reduce torch.Tensor() calls

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 8, 2024
    3a9d2f3
  8. 63b55dd
  9. 2d4c80b

Commits on Aug 9, 2024

  1. update test

    Signed-off-by: Xin Yao <xiny@nvidia.com>
    yaox12 committed Aug 9, 2024
    1bb41a1