Add cdist op #9391

Open: wants to merge 34 commits into master

@marigoold (Contributor) commented Nov 8, 2022

cdist takes two inputs, x1 (shape=[B, R1, C]) and x2 (shape=[B, R2, C]), and, within each batch, computes the p-norm distance between every row vector of x1 and every row vector of x2, producing result (shape=[B, R1, R2]).
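In formula form (the indices b, i, j and c are introduced here only for illustration), each entry of result is:

```latex
\mathrm{result}[b, i, j] = \Big( \sum_{c=1}^{C} \big| x_1[b, i, c] - x_2[b, j, c] \big|^{p} \Big)^{1/p},
\qquad 1 \le b \le B,\; 1 \le i \le R_1,\; 1 \le j \le R_2
```

The PyTorch documentation allows p in [0, ∞]; p = ∞ is the limiting case, the maximum of |x1[b, i, c] - x2[b, j, c]| over c.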

PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.cdist.html
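For checking small cases by hand, a brute-force reference of the semantics described above might look like the following. This is a NumPy sketch, not the OneFlow kernel; the name `cdist_reference` is hypothetical, and the p = 0 and p = ∞ branches assume PyTorch's behavior for those values (count of differing coordinates, and maximum absolute difference, respectively).

```python
import numpy as np


def cdist_reference(x1: np.ndarray, x2: np.ndarray, p: float = 2.0) -> np.ndarray:
    """Brute-force p-norm distance between rows of x1 [B, R1, C] and x2 [B, R2, C].

    Returns shape [B, R1, R2]. Memory is O(B * R1 * R2 * C), so this is only
    meant for small test inputs.
    """
    if x1.ndim != 3 or x2.ndim != 3:
        raise ValueError("expected 3-D inputs of shape [B, R, C]")
    if x1.shape[0] != x2.shape[0] or x1.shape[2] != x2.shape[2]:
        raise ValueError("batch size B and feature size C must match")
    if p < 0:
        raise ValueError("p must be non-negative")
    # diff[b, i, j, c] = |x1[b, i, c] - x2[b, j, c]|
    diff = np.abs(x1[:, :, None, :] - x2[:, None, :, :])
    if p == 0:
        # p = 0: number of coordinates that differ
        return np.count_nonzero(diff, axis=-1).astype(x1.dtype)
    if np.isinf(p):
        # p = inf: largest absolute coordinate difference
        return diff.max(axis=-1)
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((2, 3, 4))
x2 = rng.standard_normal((2, 5, 4))
print(cdist_reference(x1, x2).shape)  # (2, 3, 5)
```

The broadcasted subtraction keeps the code close to the formula at the cost of memory, which is acceptable for a test oracle but not for a real kernel.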

@marigoold marked this pull request as ready for review on November 15, 2022 06:43
@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@marigoold requested review from oneflow-ci-bot and removed the request for oneflow-ci-bot on November 18, 2022 09:43
@github-actions (Contributor)

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080

❌ OneFlow resnet50 time: 152.2ms (= 15221.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 171.8ms (= 17176.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 171.8ms / 152.2ms)

OneFlow resnet50 time: 96.7ms (= 9672.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11211.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 112.1ms / 96.7ms)

OneFlow resnet50 time: 68.8ms (= 13767.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.7ms (= 17549.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 87.7ms / 68.8ms)

OneFlow resnet50 time: 60.1ms (= 12026.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.7ms (= 14932.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 74.7ms / 60.1ms)

OneFlow resnet50 time: 54.9ms (= 10988.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13863.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 69.3ms / 54.9ms)

@github-actions (Contributor)

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

@github-actions (Contributor)

Speed stats:

@github-actions (Contributor)

Speed stats:

@github-actions (Contributor)

Speed stats:
GPU Name: NVIDIA GeForce GTX 1080

❌ OneFlow resnet50 time: 153.9ms (= 15389.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 172.1ms (= 17205.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 172.1ms / 153.9ms)

OneFlow resnet50 time: 96.7ms (= 9669.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.7ms (= 11267.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 112.7ms / 96.7ms)

OneFlow resnet50 time: 69.1ms (= 13814.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 90.4ms (= 18085.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 90.4ms / 69.1ms)

OneFlow resnet50 time: 60.9ms (= 12173.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.4ms (= 14876.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 74.4ms / 60.9ms)

OneFlow resnet50 time: 55.1ms (= 11027.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.7ms (= 14540.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 72.7ms / 55.1ms)

@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 141.7ms (= 14168.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.2ms (= 16320.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 163.2ms / 141.7ms)

OneFlow resnet50 time: 86.0ms (= 8600.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10228.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.3ms / 86.0ms)

OneFlow resnet50 time: 57.8ms (= 11553.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15546.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.8ms)

OneFlow resnet50 time: 45.7ms (= 9148.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.8ms (= 13953.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 69.8ms / 45.7ms)

OneFlow resnet50 time: 40.0ms (= 8008.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.6ms (= 14121.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 70.6ms / 40.0ms)

@github-actions (Contributor)

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14007.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.8ms (= 16280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.8ms / 140.1ms)

OneFlow resnet50 time: 85.4ms (= 8542.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.3ms (= 10134.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.3ms / 85.4ms)

OneFlow resnet50 time: 57.9ms (= 11576.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 87.8ms (= 17563.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.52 (= 87.8ms / 57.9ms)

OneFlow resnet50 time: 44.4ms (= 8875.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.1ms (= 14217.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.60 (= 71.1ms / 44.4ms)

OneFlow resnet50 time: 39.5ms (= 7900.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13573.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.72 (= 67.9ms / 39.5ms)

@github-actions (Contributor)

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9391/

@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.2ms (= 14121.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14286.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.01 (= 142.9ms / 141.2ms)

OneFlow resnet50 time: 81.4ms (= 8144.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8652.3ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.06 (= 86.5ms / 81.4ms)

OneFlow resnet50 time: 51.0ms (= 10201.0ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 62.2ms (= 12442.2ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 62.2ms / 51.0ms)

OneFlow resnet50 time: 33.6ms (= 6727.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 45.5ms (= 9104.1ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.35 (= 45.5ms / 33.6ms)

OneFlow resnet50 time: 26.5ms (= 5299.3ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.8ms (= 8354.9ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.58 (= 41.8ms / 26.5ms)

OneFlow swin dataloader time: 0.245s (= 49.045s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.112s / 200, num_workers=1)
Relative speed: 0.614 (= 0.151s / 0.245s)

OneFlow swin dataloader time: 0.067s (= 13.400s / 200, num_workers=4)
PyTorch swin dataloader time: 0.039s (= 7.885s / 200, num_workers=4)
Relative speed: 0.588 (= 0.039s / 0.067s)

OneFlow swin dataloader time: 0.040s (= 7.985s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.466s / 200, num_workers=8)
Relative speed: 0.559 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 152.8ms (= 15280.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.9ms (= 16694.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.09 (= 166.9ms / 152.8ms)

OneFlow resnet50 time: 92.3ms (= 9228.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.9ms (= 10390.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 103.9ms / 92.3ms)

OneFlow resnet50 time: 60.2ms (= 12033.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.5ms (= 17696.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.47 (= 88.5ms / 60.2ms)

OneFlow resnet50 time: 42.2ms (= 8442.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14207.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 71.0ms / 42.2ms)

OneFlow resnet50 time: 37.4ms (= 7484.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.8ms (= 14767.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.97 (= 73.8ms / 37.4ms)

@github-actions (Contributor)

Speed stats:

Comment on lines +3468 to +3474
// mm_for_euclid_dist has accuracy issue
// if (p == 2 && (mode == 1 || (mode == 0 && (r1 > 25 || r2 > 25)))) {
//   shape output_shape(max_batch_shape);
//   output_shape.emplace_back(r1);
//   output_shape.emplace_back(r2);
//   return JUST(Reshape(JUST(euclidean_dist(x1_expand, x2_expand)), output_shape));
// }

Contributor

Please remove the unused commented-out code.

Contributor Author

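> Please remove the unused commented-out code.

This code exists in torch; it is only commented out because it still has an accuracy issue. I'll uncomment it once that is fixed.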
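The commented-out condition decides when the matmul-based Euclidean path is taken. A minimal sketch of that rule, assuming `mode` follows the numbering behind `torch.cdist`'s `compute_mode` (0 = `use_mm_for_euclid_dist_if_necessary`, the default; 1 = `use_mm_for_euclid_dist`; 2 = `donot_use_mm_for_euclid_dist`); the helper and constant names are hypothetical:

```python
# Hypothetical constants mirroring torch.cdist's compute_mode numbering.
MM_IF_NECESSARY = 0  # "use_mm_for_euclid_dist_if_necessary" (default)
MM_ALWAYS = 1        # "use_mm_for_euclid_dist"
MM_NEVER = 2         # "donot_use_mm_for_euclid_dist"


def use_mm_for_euclid_dist(p: float, mode: int, r1: int, r2: int) -> bool:
    """Return True if the matmul-based Euclidean path would be taken.

    Mirrors: p == 2 && (mode == 1 || (mode == 0 && (r1 > 25 || r2 > 25)))
    """
    if p != 2:
        return False  # the matmul expansion only applies to the Euclidean norm
    if mode == MM_ALWAYS:
        return True
    if mode == MM_IF_NECESSARY:
        return r1 > 25 or r2 > 25  # matmul path only for larger row counts
    return False  # MM_NEVER


print(use_mm_for_euclid_dist(2, MM_IF_NECESSARY, 30, 4))  # True
```

Keeping the rule in one predicate makes it easy to leave the matmul path disabled (as the PR currently does) without touching the direct computation.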

Comment on lines +3418 to +3429
Maybe<Tensor> euclidean_dist(const std::shared_ptr<Tensor>& x1,
                             const std::shared_ptr<Tensor>& x2) const {
  const auto& x1_norm = JUST(ReduceSum(JUST(ScalarPow(x1, 2, false)), {-1}, true));
  const auto& x2_norm = JUST(ReduceSum(JUST(ScalarPow(x2, 2, false)), {-1}, true));
  const auto& x1_ones = JUST(OnesLike(x1_norm));
  const auto& x2_ones = JUST(OnesLike(x2_norm));
  const auto& x1_cat = JUST(Concat({JUST(ScalarMul(x1, -2, false)), x1_norm, x1_ones}, -1));
  const auto& x2_cat = JUST(Concat({x2, x2_ones, x2_norm}, -1));
  const auto& result =
      JUST(MatMul(x1_cat, JUST(Transpose2dim(x2_cat, -1, -2)), false, false, 1.0));
  return Sqrt(JUST(ClampMin(result, 0.0)));
};

Contributor

The call to this function is commented out below, so it is no longer used.
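The `euclidean_dist` helper packs the expansion ||a - b||^2 = ||a||^2 - 2 a·b + ||b||^2 into a single matmul by appending a norm column and a ones column to each side. The NumPy sketch below reproduces that layout (NumPy stands in for the OneFlow functional calls; the function names are illustrative) and contrasts it with the direct difference-based computation. Because the expanded form subtracts large, nearly equal terms, it can lose precision in float32 when points are close together but far from the origin; this is one plausible source of the accuracy issue mentioned in the code comment, not a confirmed diagnosis.

```python
import numpy as np


def euclidean_dist_mm(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Matmul-based Euclidean distance: x1 [..., R1, C], x2 [..., R2, C] -> [..., R1, R2]."""
    x1_norm = np.sum(x1 ** 2, axis=-1, keepdims=True)  # ||a||^2, shape [..., R1, 1]
    x2_norm = np.sum(x2 ** 2, axis=-1, keepdims=True)  # ||b||^2, shape [..., R2, 1]
    # Row of x1_cat: [-2a, ||a||^2, 1]; row of x2_cat: [b, 1, ||b||^2].
    # Their dot product is -2 a.b + ||a||^2 + ||b||^2 = ||a - b||^2.
    x1_cat = np.concatenate([-2 * x1, x1_norm, np.ones_like(x1_norm)], axis=-1)
    x2_cat = np.concatenate([x2, np.ones_like(x2_norm), x2_norm], axis=-1)
    sq = x1_cat @ np.swapaxes(x2_cat, -1, -2)
    # Rounding can push sq slightly below zero; clamp before sqrt (like ClampMin).
    return np.sqrt(np.maximum(sq, 0.0))


def euclidean_dist_direct(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Difference-based Euclidean distance with the same shapes."""
    diff = x1[..., :, None, :] - x2[..., None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


# Close points far from the origin, in float32.
a = np.array([[1000.0, 1000.0]], dtype=np.float32)
b = np.array([[1000.0, 1000.01]], dtype=np.float32)
print(euclidean_dist_direct(a, b))  # close to 0.01
print(euclidean_dist_mm(a, b))      # may be noticeably off due to cancellation
```

The matmul form is attractive because it maps onto a single GEMM for large R1 and R2, which is why it is worth restoring once its accuracy is under control.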

    x2 = x2.to_global(placement=placement, sbp=sbp)
    z = torch.cdist(x1, x2)
    return z

Contributor

All the tests here use inputs with the same dims, but the code implements broadcasting logic, so please add tests for the broadcast case too.
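A broadcast test could compare the op against a reference on inputs whose batch dims differ. The NumPy-only sketch below shows what such a test might check; the helper name and shapes are hypothetical, and it assumes the op broadcasts leading batch dims the way `torch.cdist` does (the last two dims are rows and features).

```python
import numpy as np


def cdist_broadcast_reference(x1, x2, p=2.0):
    """p-norm distances (p > 0) for x1 [*b1, R1, C] and x2 [*b2, R2, C].

    Leading batch dims broadcast NumPy-style; result is [*broadcast(b1, b2), R1, R2].
    """
    diff = np.abs(x1[..., :, None, :] - x2[..., None, :, :])
    return np.sum(diff ** p, axis=-1) ** (1.0 / p)


# Hypothetical broadcast cases: (x1 shape, x2 shape)
cases = [
    ((2, 1, 3, 4), (3, 5, 4)),  # batch (2, 1) vs (3,) -> (2, 3)
    ((1, 3, 4), (4, 5, 4)),     # batch (1,) vs (4,) -> (4,)
]
rng = np.random.default_rng(0)
for s1, s2 in cases:
    x1, x2 = rng.standard_normal(s1), rng.standard_normal(s2)
    out = cdist_broadcast_reference(x1, x2)
    batch = np.broadcast_shapes(s1[:-2], s2[:-2])
    assert out.shape == batch + (s1[-2], s2[-2])
    # Spot-check the last row pair of the first batch entry against an explicit norm.
    idx = tuple(0 for _ in batch)
    a = np.broadcast_to(x1, batch + s1[-2:])[idx][-1]
    b = np.broadcast_to(x2, batch + s2[-2:])[idx][-1]
    assert np.isclose(out[idx][-1, -1], np.linalg.norm(a - b))
```

In the actual OneFlow test, the same shape pairs would be fed to both frameworks through the existing autotest harness instead of this NumPy reference.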

3 participants