Remove CUDA synchronizations by slicing input tensor with int instead of CUDA tensors in nn.LinearEmbeddingEncoder
#2612
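For context, a minimal sketch of the pattern the title describes. The encoder's actual code is not shown here, so the tensor `x` and the bounds `start`/`end` are hypothetical stand-ins, not the real implementation. The idea is that reading a slice bound out of a CUDA tensor requires a device-to-host copy, which blocks until queued GPU work finishes, whereas a plain Python int is already on the host.

```python
import torch

# Falls back to CPU so the sketch runs anywhere; the synchronization
# cost discussed below only exists when the tensors live on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical input: 4 rows, 6 feature columns.
x = torch.arange(24, dtype=torch.float32, device=device).reshape(4, 6)

# Before: slice bounds stored as 0-dim tensors on the device.
# Converting them to Python ints copies the values to the host,
# which on CUDA waits for pending kernels to complete.
start_t = torch.tensor(2, device=device)
end_t = torch.tensor(5, device=device)
with_tensor_bounds = x[:, int(start_t):int(end_t)]

# After: slice bounds kept as Python ints from the start.
# No device-to-host copy is needed to build the slice.
start, end = 2, 5
with_int_bounds = x[:, start:end]

# Both approaches select the same columns.
same_result = torch.equal(with_tensor_bounds, with_int_bounds)
```

On a CUDA build, `torch.cuda.set_sync_debug_mode("warn")` can help locate synchronizing calls like the first variant while testing.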