The baseline of ResNet18 on CIFAR-100 is relatively low #20
Comments
Hi, Q. "In fact, based on the pytorch-cifar100, without any extra augmentations, the top1 accuracy can achieve up to 78.05% in my previous experiments." Q. "And I have conducted an experiment using the distillation, which improves the baseline from 77.96% to 78.45%." By the way, we don't use extra augmentations for our method, it still a fair comparison that we also don't use extra augmentations in baseline (original KD or LSR). |
Hi, here is my training log, and you can reproduce the result using the repo, which achieves ~78.05% top-1 accuracy without extra augmentations. I think the distillation does work, yet the gain is not conspicuous: it only improves accuracy by about 0.5% in my setting.
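For reference, below is a minimal sketch of the vanilla knowledge-distillation objective being discussed (cross-entropy plus a temperature-scaled KL term between student and teacher). The temperature T and weight alpha here are illustrative assumptions, not the values used in this repo or in the training run above.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.9):
    """Vanilla KD loss: (1 - alpha) * CE + alpha * T^2 * KL(student || teacher).

    T and alpha are illustrative placeholders; the actual hyperparameters used
    in the repo or the experiments discussed here may differ.
    """
    # Hard-label cross-entropy on the student's unsoftened logits.
    ce = F.cross_entropy(student_logits, targets)
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return (1.0 - alpha) * ce + alpha * kl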
Hi, your implementation is different from the original pytorch-cifar100; the original pytorch-cifar100 cannot achieve ~78.05% top-1 accuracy.
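One likely source of such implementation gaps is the network stem: CIFAR-style ResNets (as in pytorch-cifar100 and most CIFAR baselines) typically replace the ImageNet 7x7/stride-2 convolution and max-pool with a single 3x3/stride-1 convolution, so 32x32 inputs are not downsampled too early. The sketch below illustrates that difference; the details are assumptions about typical implementations, not code copied from either repository.

import torch.nn as nn

# ImageNet-style stem (as in torchvision's ResNet-18): aggressive early
# downsampling, which discards most of the resolution of a 32x32 CIFAR image.
imagenet_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# CIFAR-style stem (common in CIFAR ResNet variants): keeps the full 32x32
# resolution going into the first residual stage.
cifar_stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)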
Hi, I would first like to thank you for this work interpreting the relationship between KD and LSR. However, the baseline of ResNet18 on CIFAR-100 is much lower than the pytorch-cifar100 implementation, which may be caused by the modified ResNet. In fact, based on pytorch-cifar100, without any extra augmentations, the top-1 accuracy can reach up to 78.05% in my previous experiments. So I would cast doubt on the performance gain of the self-distillation. I have also conducted an experiment using the distillation, which improves the baseline from 77.96% to 78.45%. It does improve performance, yet not as conspicuously as the paper claims.
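For context, the "no extra augmentations" baseline referred to here is usually just the standard CIFAR recipe: random crop with padding, horizontal flip, and normalization, trained with SGD and a step schedule. Below is a minimal sketch along those lines; the specific hyperparameters (200-epoch-style milestones, learning rate, batch size) and the normalization statistics are assumptions based on common pytorch-cifar100-style settings, not values taken from the experiments quoted above.

import torch
import torchvision
import torchvision.transforms as transforms

# Standard CIFAR-100 augmentation: random crop + horizontal flip + normalization.
# Mean/std are the commonly used CIFAR-100 statistics (approximate).
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762)),
])

train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=train_transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)

# Illustrative optimisation schedule in the style of pytorch-cifar100;
# the actual repo may use different milestones or epoch counts.
model = torchvision.models.resnet18(num_classes=100)  # swap in a CIFAR-variant ResNet-18
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)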