[Operator] Enhancements to Reduce (#366)
For some input shapes, the current reduce schedule underutilizes the GPU. For example, `reduce [1, 128, 128, 3], dims=[1, 2]` spawns 1 thread block with 3 threads, each of which iterates over 128*128 elements.

This PR makes two changes to optimize these cases:
1. Add resolve_decompose to the resolve logic of Reduce. This forces a separate kernel launch for each reduce dimension, increasing concurrency.
2. In the default reduce schedule template, spawn multiple warps within the reduce dimensions; the warps then communicate via shared memory or use atomics to perform the reduction.

Also added a resolve rule for AdaptivePoolChannelLast.
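The utilization argument above can be checked with a little arithmetic. The following is a minimal sketch, not code from this PR: it assumes a simplified model in which each output element is handled by one thread that loops over the reduced elements, and it assumes the decomposed form reduces one dimension per kernel, highest index first, without keeping reduced dimensions. The helper names `reduce_work` and `decomposed_work` are hypothetical.

```python
from math import prod


def reduce_work(shape, dims):
    """Return (parallel_outputs, elements_per_output) for one reduce kernel.

    Simplified model (an assumption): one thread per output element,
    each looping over every element it has to reduce.
    """
    outputs = prod(s for i, s in enumerate(shape) if i not in dims)
    per_output = prod(shape[i] for i in dims)
    return outputs, per_output


def decomposed_work(shape, dims):
    """Reduce one dimension per kernel and return the stats of each kernel.

    Dimensions are processed from the highest index down so that removing
    a reduced dimension does not shift the indices still to be reduced.
    """
    shape = list(shape)
    stages = []
    for d in sorted(dims, reverse=True):
        stages.append(reduce_work(shape, [d]))
        del shape[d]  # assumes the reduced dimension is dropped (no keepdims)
    return stages


fused = reduce_work([1, 128, 128, 3], [1, 2])
stages = decomposed_work([1, 128, 128, 3], [1, 2])
print(fused)   # (3, 16384)
print(stages)  # [(384, 128), (3, 128)]
```

Under this model the single fused kernel has 3 parallel work items of 16384 sequential steps each, while the decomposed form runs a first kernel with 384 work items of 128 steps followed by a second with 3 work items of 128 steps. The real schedule may map work to threads differently; the sketch only illustrates why splitting the reduce dimensions exposes more parallelism.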
Showing 5 changed files with 179 additions and 55 deletions.