Usage of attention_mask_l in vlfuse_helper #11803

Open
HandsLing opened this issue Jun 20, 2024 · 3 comments

@HandsLing

I have a question: in the forward function of the BiMultiHeadAttention class in vlfuse_helper.py, why is attention_mask_l used differently from the implementation in https://github.com/IDEA-Research/GroundingDINO?
```python
if attention_mask_l is not None:
    assert attention_mask_l.dim() == 2
    attention_mask = attention_mask_l.unsqueeze(1).unsqueeze(1)
    attention_mask = attention_mask.expand(bsz, 1, tgt_len, src_len)
    attention_mask = attention_mask.masked_fill(attention_mask == 0, -9e15)

    if attention_mask.size() != (bsz, 1, tgt_len, src_len):
        raise ValueError(
            f'Attention mask should be of size {(bsz, 1, tgt_len, src_len)}')
    attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len,
                                     src_len) + attention_mask
    attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)
```
This is the code in question. When I tested it, I found that attention_mask_l is actually an all-False tensor, so the operation `attention_mask = attention_mask.masked_fill(attention_mask == 0, -9e15)` effectively turns attention_mask into an all-True tensor, which is then added to attn_weights. That amounts to adding 1 to every position of attn_weights. Is this the effect you originally intended? Why is it done this way?
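
For reference, a minimal standalone sketch of the behaviour described above (the shapes and the all-False mask are made up for illustration, not taken from the actual model): because masked_fill preserves the bool dtype, the -9e15 fill value is coerced to True, and the subsequent addition only shifts every logit by 1.

```python
import torch

bsz, num_heads, tgt_len, src_len = 1, 2, 3, 4
attn_weights = torch.zeros(bsz * num_heads, tgt_len, src_len)

# attention_mask_l as observed at inference time: an all-False bool tensor.
attention_mask_l = torch.zeros(bsz, src_len, dtype=torch.bool)

attention_mask = attention_mask_l.unsqueeze(1).unsqueeze(1)
attention_mask = attention_mask.expand(bsz, 1, tgt_len, src_len)

# masked_fill keeps the input dtype, so on a bool tensor the fill value
# -9e15 is coerced to True instead of becoming a large negative logit.
attention_mask = attention_mask.masked_fill(attention_mask == 0, -9e15)
print(attention_mask.dtype)     # torch.bool
print(attention_mask.unique())  # tensor([True])

# bool is promoted to float when added to attn_weights, so every logit is
# shifted by +1, which the softmax over src_len later cancels out.
attn_weights = attn_weights.view(bsz, num_heads, tgt_len, src_len) + attention_mask
print(attn_weights.unique())    # tensor([1.])

# For comparison: with a float 0/1 mask the same code would write -9e15
# into the masked positions, which is presumably the intended effect.
float_mask = attention_mask_l.float()
print(float_mask.masked_fill(float_mask == 0, -9e15).unique())  # tensor([-9.0000e+15])
```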

@talebolano

same question

@talebolano

@HandsLing I tried setting attention_mask_l directly to None, and it had no effect at all on the final output. I suspect this was simply written wrong from the start.

@HandsLing
Author

@talebolano When I ran inference with 1 added directly to attn_weights, the final result was also the same.
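
For what it's worth, a quick numerical check (shapes made up) of why both observations line up: softmax is invariant to adding the same constant to every logit along the normalized dimension, so "+1 everywhere" and "no mask at all" produce identical attention weights.

```python
import torch

logits = torch.randn(2, 5)
# Softmax over the text dimension is unchanged by a constant shift of all logits.
print(torch.allclose(logits.softmax(dim=-1), (logits + 1).softmax(dim=-1)))  # True
```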
