[PyTorch] Reduce the CPU overheads of `GroupedLinear` #1072

Merged

timmoon10 merged 10 commits into NVIDIA:main from yaox12:xiny/fused_multi_cast_transpose

Aug 9, 2024

Commits on Aug 8, 2024

use fused_multi_cast_transpose
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
3457e3c

fix input being empty tensor
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
074563b

[pre-commit.ci] auto fixes from pre-commit.com hooks
```
for more information, see https://pre-commit.ci
```
pre-commit-ci[bot] authored and yaox12 committed Aug 8, 2024
324815d

allocate output tensors in C++
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
4e57d88

simplify code
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
ef31897

avoid cudaGetDriverEntryPoint
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
26dc2a3

reduce torch.Tensor() calls
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 8, 2024
3a9d2f3

[pre-commit.ci] auto fixes from pre-commit.com hooks
```
for more information, see https://pre-commit.ci
```
pre-commit-ci[bot] committed Aug 8, 2024
63b55dd

Merge branch 'main' into xiny/fused_multi_cast_transpose

timmoon10 authored Aug 8, 2024
2d4c80b

Commits on Aug 9, 2024

update test
```
Signed-off-by: Xin Yao <xiny@nvidia.com>
```
yaox12 committed Aug 9, 2024
1bb41a1
