How to convert checkpoint files that adapt to different distributed world sizes #246

Open
swjtulinxi opened this issue Aug 27, 2024 · 1 comment

Comments

@swjtulinxi

Hi, I have tried your example to convert the swin_moe_small_patch4_window12_192_16expert_32gpu_22k checkpoint. The first problem is that the format the example expects does not match the files of swin_moe_small_patch4_window12_192_16expert_32gpu_22k, so I modified some of the code. However, the example can only convert one rank.pth file at a time; it does not convert all the rank.pth files into one. Can you show a correct example? I am puzzled by this, thanks.

@ghostplant
Contributor

Hi, the pre-trained SwinMoE parameters were based on an early Tutel version (maybe 0.1 or 0.2), and parameters within the MoE layer have been renamed since 0.3, so the old parameters from Swin need to be synced to the latest version to avoid mapping parameters incorrectly. Can you give more details about which parameter link you found problematic, and how to reproduce it?
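
For anyone hitting the same renaming mismatch, a rough way to narrow it down is to compare the key names in the old checkpoint with the key names the current model expects, then apply a rename map. The sketch below is only an illustration in plain Python: `diff_keys`, `rename_keys`, and the names `old_name` / `new_name` are hypothetical placeholders, not actual Tutel or Swin parameter names. In practice the dictionaries would presumably come from `torch.load(...)` and `model.state_dict()`.

```python
def diff_keys(checkpoint_keys, model_keys):
    """Return (unexpected, missing): keys only in the checkpoint, keys only in the model."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)


def rename_keys(state_dict, rename_map):
    """Return a new dict where each old substring in a key is replaced by its new name."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in rename_map.items():
            new_key = new_key.replace(old, new)
        renamed[new_key] = value
    return renamed


# Hypothetical key names, for illustration only.
old_ckpt = {"blocks.0.mlp.old_name": 1, "blocks.0.norm.weight": 2}
model_keys = ["blocks.0.mlp.new_name", "blocks.0.norm.weight"]

unexpected, missing = diff_keys(old_ckpt, model_keys)
print("unexpected:", unexpected)
print("missing:", missing)

fixed = rename_keys(old_ckpt, {"old_name": "new_name"})
print("after rename:", diff_keys(fixed, model_keys))
```

Posting the `unexpected` and `missing` lists for the real checkpoint would likely make it easier to see which parameters were renamed.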
