
Fine-tuning training questions #13

Open
mhzn-yn opened this issue Sep 30, 2024 · 1 comment

Comments


mhzn-yn commented Sep 30, 2024

1. For adapting the model, should we retrain from the base model, or use a fine-tuning approach?
2. How much GPU memory is needed to retrain a 7B model? I could not find a hardware-requirements table in the documentation.
3. For long-text inputs, which approach is more suitable?

bys0318 (Member) commented Oct 27, 2024

1. We recommend doing Long Context Alignment fine-tuning (SFT, DPO) on a base model that has already undergone length extension.
2. GPU memory usage depends on the sequence length; for example, training at 64k length with ZeRO-3 enabled, as in our paper, requires 80 GB of GPU memory (a minimal configuration sketch follows below).
3. If your base model has already been continually trained on longer sequences (i.e., length extension), fine-tuning alone is sufficient; otherwise, you need to do that continued training first.
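
For reference, here is a minimal sketch of how such a Long Context Alignment SFT run could be set up with HuggingFace Transformers and DeepSpeed ZeRO-3. This is not the repository's actual training script: the model path, dataset, sequence length, and hyperparameters below are placeholders, and the run should be launched under the `deepspeed` launcher on a multi-GPU node.

```python
# Minimal sketch of long-context SFT with HuggingFace Trainer + DeepSpeed ZeRO-3.
# Assumes a base model that has already been length-extended; all paths, data,
# and hyperparameters are placeholders. Launch with: deepspeed train_long_sft.py
import json

import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

# ZeRO-3 shards parameters, gradients, and optimizer states across GPUs, which is
# what makes very long sequences (e.g. 64k tokens) fit within ~80 GB per GPU.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
with open("ds_zero3.json", "w") as f:
    json.dump(ds_config, f)

# Create TrainingArguments before loading the model so the HF/DeepSpeed
# integration can shard parameters at load time under ZeRO-3.
args = TrainingArguments(
    output_dir="out/long-sft",
    per_device_train_batch_size=1,      # long sequences -> tiny per-GPU batch
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    gradient_checkpointing=True,        # trades compute for activation memory
    deepspeed="ds_zero3.json",
    logging_steps=10,
)

model_path = "path/to/length-extended-base-model"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)


class LongSFTDataset(Dataset):
    """Toy stand-in dataset: fixed-length sequences, labels equal to inputs."""

    def __init__(self, texts, max_len=4096):        # use e.g. 65536 for a 64k run
        self.items = []
        for text in texts:
            enc = tokenizer(text, truncation=True, max_length=max_len,
                            padding="max_length", return_tensors="pt")
            item = {k: v[0] for k, v in enc.items()}
            item["labels"] = item["input_ids"].clone()
            item["labels"][item["attention_mask"] == 0] = -100  # ignore padding
            self.items.append(item)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


train_dataset = LongSFTDataset(["<your long SFT example here>"])
trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                  data_collator=default_data_collator)
trainer.train()
```

In practice the toy dataset above would be replaced by real long-context SFT data packed to the target context length, and the per-GPU batch size, accumulation steps, and ZeRO-3 options would be tuned to the available hardware.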
