Roughly how much GPU memory is needed to compute saliency scores for llama-2-7b in half precision? #11

zhiyunjiang · 2023-12-27T08:25:53Z

See title.

leanwang326 · 2023-12-27T12:05:14Z

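A bit over 14 GB, roughly. You may need to update the code, though, or add the following yourself in attention_attr.py:
for p in model.parameters():
    p.requires_grad = False
I forgot to add this earlier, so gradients were being computed for every parameter in the model. With these lines in place, memory usage is dominated by the model weights, about 14 GB.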
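As a rough check on the 14 GB figure, the sketch below does the weights-only arithmetic. The count of 7e9 parameters is the nominal "7B" figure, used here as an assumption and not an exact count, and the function name is made up for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory taken by the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: a nominal 7e9 parameters for a "7B" model.
NOMINAL_PARAMS = 7e9

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=2)  # half precision
fp32_gb = weight_memory_gb(NOMINAL_PARAMS, bytes_per_param=4)  # full precision

print(f"fp16 weights: {fp16_gb:.1f} GB, fp32 weights: {fp32_gb:.1f} GB")
```

A .grad buffer has the same shape and dtype as its parameter, so leaving requires_grad enabled on every weight would add about as much memory again. Activations and the saliency buffers come on top of the weights-only estimate.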

zhiyunjiang · 2023-12-27T14:31:23Z

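Thanks. One more question: once these lines are added, does the later loss.backward() skip computing gradients for the model parameters entirely, or does it still compute them and then free them on the fly?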

leanwang326 · 2023-12-27T14:35:06Z

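Parameter gradients should not be computed at all; gradients for intermediate activations are computed and then freed, I believe. This works because my implementation multiplies attention_weight by z = torch.ones_like(attention_weight) with z.requires_grad set to True, and the saliency value is obtained from the gradient of z.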
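The behaviour described above can be reproduced on a toy example. This is a minimal sketch, not the repository's code: the single linear layer, the shapes, and the squared-sum loss are all stand-ins chosen for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for one attention layer; shapes are arbitrary.
seq_len, dim = 4, 8
x = torch.randn(seq_len, dim)
w_v = torch.nn.Linear(dim, dim)  # stands in for the frozen model weights
for p in w_v.parameters():
    p.requires_grad = False

# Nothing upstream requires grad, so attention_weight has no grad history.
attention_weight = F.softmax(x @ x.T / dim ** 0.5, dim=-1)

# The trick from the thread: multiply by a tensor of ones that requires grad.
z = torch.ones_like(attention_weight, requires_grad=True)
out = (attention_weight * z) @ w_v(x)

loss = out.pow(2).sum()  # any scalar loss works for the demo
loss.backward()

saliency = z.grad  # one value per attention entry
params_have_no_grad = all(p.grad is None for p in w_v.parameters())
print(saliency.shape, params_have_no_grad)
```

In this sketch the frozen weights never receive a .grad tensor, while z does, which matches the answer above: backward only flows through the part of the graph that leads to z.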

noobimp · 2023-12-30T15:44:32Z

Why is the attention matrix A transposed in Eq. (1)? 🤔

leanwang326 · 2023-12-30T16:34:44Z

The transpose is a typo, sorry about that. See https://github.com/lancopku/label-words-are-anchors/issues/7 for details; I have also corrected it on arXiv.

noobimp · 2023-12-30T17:04:28Z

Thanks a lot!

ganchengguang · 2024-02-01T15:14:52Z

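How should attentioner_for_attribution.py be adapted to compute this for a llama2 model? Specifically, how should gpt2_attn and GPT2AttentionerManager be changed? Looking at the HF source, modeling_llama has no def _attn function the way gpt2 does, so I am not sure how to apply the same approach. Any pointers would be appreciated.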

leanwang326 · 2024-02-03T11:36:55Z

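Thinking about it again, my earlier implementation was more complicated than necessary. You can register a hook at attention_prob to capture its value and gradient, and compute the saliency from those. (Our code multiplies attention_prob by z = torch.ones_like(attention_prob) and takes the gradient of z, which is equivalent to attention_prob * attention_prob.grad.)
Also, set attention_prob.requires_grad = True during the forward pass; otherwise no grad_fn is registered.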
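The equivalence between the two approaches can be checked on a toy tensor. The sketch below is only an illustration under stated assumptions: toy_loss, value, and the shapes are hypothetical, and a real model would need the hook placed inside its attention forward pass.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, dim = 4, 8
x = torch.randn(seq_len, dim)
value = torch.randn(seq_len, dim)


def toy_loss(attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical scalar loss that depends on the attention matrix."""
    return (attn @ value).pow(2).sum()


# Variant A: multiply by z = ones and read z.grad.
attn_a = F.softmax(x @ x.T / dim ** 0.5, dim=-1)
z = torch.ones_like(attn_a, requires_grad=True)
toy_loss(attn_a * z).backward()
saliency_a = z.grad

# Variant B: make attention_prob itself require grad and capture its
# gradient with a tensor hook. Setting requires_grad here works because
# nothing upstream requires grad in this toy setup.
attn_b = F.softmax(x @ x.T / dim ** 0.5, dim=-1)
attn_b.requires_grad_(True)
captured = {}
attn_b.register_hook(lambda g: captured.update(grad=g))  # returns None, grad unchanged
toy_loss(attn_b).backward()
saliency_b = attn_b.detach() * captured["grad"]

same = torch.allclose(saliency_a, saliency_b)
print(same)
```

Both variants give value times gradient: with z fixed at ones, the gradient with respect to z is the gradient with respect to attention_prob multiplied elementwise by attention_prob.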

MidiyaZhu · 2024-02-06T02:51:26Z

Another question:

for p in model.parameters():
    p.requires_grad = False

After adding this code, the following loop fails with llama2-7b-chat-hf:
for idx, data in tqdm(enumerate(analysis_dataloader)):
    data = dict_to(data, model.device)
    print(data['input_ids'].shape)
    attentionermanger.zero_grad()
    output = model(**data, requires_grad=True)
    label = data['labels']
    loss = F.cross_entropy(output['logits'], label)
    loss.backward()
grad_fn is None on both output and loss, so loss.backward() cannot proceed. With gpt2-xl it works. The likely reason is that gpt2-xl, when running the attentionAdapter, enters _forward, which sets self.params = torch.ones_like(attn_weights, requires_grad=True); llama2 has no corresponding function, so it never reaches gpt2_attn to re-enable gradients. Do you have any suggestions for how to modify this? Thanks.
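The symptom can be reproduced in isolation. In this minimal sketch a single linear layer stands in for the language model (an assumption for illustration only): once every parameter is frozen and no other tensor requires grad, autograd records no graph, and adding one tensor that requires grad brings it back.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(8, 3)  # hypothetical stand-in for the language model
for p in model.parameters():
    p.requires_grad = False

inputs = torch.randn(5, 8)
labels = torch.randint(0, 3, (5,))

# Case 1: nothing in the forward pass requires grad, so no graph is built.
loss = F.cross_entropy(model(inputs), labels)
backward_failed = False
try:
    loss.backward()
except RuntimeError:
    backward_failed = True
print(loss.grad_fn, backward_failed)

# Case 2: a ones tensor that requires grad is multiplied into the forward pass.
logits = model(inputs)
z = torch.ones_like(logits, requires_grad=True)
loss2 = F.cross_entropy(logits * z, labels)
loss2.backward()
print(loss2.grad_fn is not None, z.grad.shape)
```

In the gpt2 path described above, self.params plays the role of z. A llama2 port would need an equivalent tensor, or attention_prob itself set to require grad, somewhere inside the attention forward pass.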

ganchengguang · 2024-02-06T06:44:11Z

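Same question here. I am stuck at the loss computation too: F.cross_entropy raises an error. It only fails on some samples and not on others, and I don't know why.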

leanwang326 · 2024-02-06T08:44:46Z

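I added two files, activation_analysis.py and numpy_writer.py, which support saving intermediate results and intermediate gradients from the computation (results from different steps are appended to a list). See demo_grad.py for a usage example.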

leanwang326 · 2024-02-06T08:46:13Z

grad_demo.py
