
Fix coverage penalty (wu) #119

Closed
wants to merge 4 commits

Conversation

@l-k-11235 (Contributor) commented Sep 25, 2024

When I use the following in my decoding config:

beam_size: 2
coverage_penalty: 'wu'
beta: 1

I get the following error:

Traceback (most recent call last):
  File "/usr/local/bin/eole", line 33, in <module>
    sys.exit(load_entry_point('EOLE', 'console_scripts', 'eole')())
  File "/workdir/eole/eole/bin/main.py", line 39, in main
    bin_cls.run(args)
  File "/workdir/eole/eole/bin/run/predict.py", line 42, in run
    predict(config)
  File "/workdir/eole/eole/bin/run/predict.py", line 18, in predict
    _, _, _ = engine.infer_file()
  File "/workdir/eole/eole/inference_engine.py", line 38, in infer_file
    scores, estims, preds = self._predict(infer_iter)
  File "/workdir/eole/eole/inference_engine.py", line 170, in _predict
    scores, estims, preds = self.predictor._predict(
  File "/workdir/eole/eole/predict/inference.py", line 475, in _predict
    batch_data = self.predict_batch(batch, attn_debug)
  File "/workdir/eole/eole/predict/generator.py", line 71, in predict_batch
    return self._predict_batch_with_strategy(batch, decode_strategy)
  File "/workdir/eole/eole/predict/generator.py", line 149, in _predict_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 437, in advance
    super(BeamSearchLM, self).advance(log_probs, attn)
  File "/workdir/eole/eole/predict/beam_search.py", line 383, in advance
    self.topk_scores -= cov_penalty.view(_B, self.beam_size).float()
RuntimeError: shape '[1, 2]' is invalid for input of size 518

I have tried to fix it, but the fix revamps the penalty calculation a bit.
I compute the penalty "from scratch" at each decoding step, using the attentions.

@l-k-11235 changed the title from "fixed coverage_wu" to "Fix coverage penalty (wu)" on Sep 25, 2024
@l-k-11235 (Contributor Author)

It seems that the return_attn path is also broken, so I've made sure we don't go through this path when applying the coverage penalty.

@l-k-11235 (Contributor Author) left a comment

I'm leaving a few comments to try to explain the changes I have made.

if self._cov_pen:  # coverage penalty
    self._prev_penalty = torch.zeros_like(self.topk_log_probs)
    self._coverage = current_attn
else:
    self._coverage = torch.zeros(
@l-k-11235 (Contributor Author), Sep 27, 2024

self._coverage is built up sequentially, by concatenation, as a tensor of size (beam_size x current_batch_size, T + 1, N), where T is the number of decoding steps and N is the length of the source. In this way, for a given decoding step t, the slice self._coverage[k, t + 1, :] holds the attentions granted by the t-th target token to the source tokens.
At the first decoding step, the coverage is initialized with a vector of zeros. When the penalty is computed, these zeros are simply added to the sum of attentions and have no impact on the final result.
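
To make the shapes concrete, here is a minimal, self-contained illustration of that bookkeeping (toy sizes, not the actual eole code):

import torch

beam_size, batch_size, src_len = 2, 3, 7
B = beam_size * batch_size

# step 0: a slice of zeros, so the later sum of attentions is unaffected
coverage = torch.zeros(B, 1, src_len)

for step in range(4):
    attn = torch.rand(B, 1, src_len)          # attention of the new target token
    attn = attn / attn.sum(-1, keepdim=True)  # each row roughly sums to 1
    coverage = torch.cat([coverage, attn], dim=1)

print(coverage.shape)  # torch.Size([6, 5, 7]), i.e. (beam_size x batch_size, T + 1, N)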

# shape: (batch_size x beam_size, 1)
self._coverage = torch.cat(
@l-k-11235 (Contributor Author)

Then, in the following decoding steps, the attentions granted by the current target tokens to the source are retrieved with attn[:, :, : self._coverage.size(-1)], since self._coverage.size(-1) is equal to N (only the first N attentions are kept for the coverage computation). The attn tensor appears to be "naturally" pruned when the decoding of one of the source sequences in the batch is complete. However, to keep the coverage consistent, only self.select_indices are retained on the first dimension.
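
A hypothetical sketch of that per-step update (the names mirror the comments above; the actual eole code may differ):

import torch

def update_coverage(coverage, attn, select_indices):
    """Append the current step's attentions to the coverage history.

    coverage: (beam_size x batch_size, t + 1, N)
    attn:     (beam_size x batch_size, 1, >= N)
    """
    src_len = coverage.size(-1)                       # N, the source length
    current_attn = attn[select_indices, :, :src_len]  # surviving beams, first N positions
    # re-select the coverage rows too, so they stay aligned with the surviving beams
    return torch.cat([coverage[select_indices], current_attn], dim=1)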

cov_penalty = self.global_scorer.cov_penalty(
    self._coverage, beta=self.global_scorer.beta
)
self.topk_scores -= cov_penalty.view(_B, self.beam_size).float()
@l-k-11235 (Contributor Author)

The coverage penalty is then computed. As it is negative by construction, it is added to the hypothesis scores.

@l-k-11235 (Contributor Author), Sep 27, 2024

The best hypotheses will then be re-ranked with the penalty taken into account.

@@ -65,7 +65,7 @@ def coverage_wu(self, cov, beta=0.0):
         then the ``seq_len`` axis probably sums to (almost) 1.
         """

-        penalty = -torch.min(cov, cov.clone().fill_(1.0)).log().sum(-1)
+        penalty = torch.min(cov.sum(1), cov.clone().sum(1).fill_(1.0)).log().sum(1)
@l-k-11235 (Contributor Author)

First, the attentions granted by the target tokens to each source token are summed with cov.sum(1); the sums are compared to 1, and the log of the minimum is taken. This gives a tensor of size (beam_size x current_batch_size, N).
Then the logs are summed over the source tokens with sum(1), giving a 1-dim tensor of size (beam_size x current_batch_size).
Finally it is multiplied by beta to obtain the penalty of each hypothesis.
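
Written out as a standalone function, the reworked penalty would look roughly like this (a sketch of the formula described above, not the exact eole implementation; the beta handling is an assumption):

import torch

def coverage_wu(cov, beta=0.0):
    """cov: attention history of shape (beam_size x batch_size, T + 1, N)."""
    summed = cov.sum(1)                                   # total attention per source token, (B, N)
    capped = torch.min(summed, torch.ones_like(summed))   # cap at 1.0, as in Wu et al.
    penalty = capped.log().sum(1)                         # sum the logs over source tokens, (B,)
    return beta * penalty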

@@ -357,30 +356,34 @@ def advance(self, log_probs, attn):
        self.maybe_update_forbidden_tokens()

        if self.return_attention or self._cov_pen:
            current_attn = attn[self.select_indices]
@l-k-11235 (Contributor Author)

current_attn is no longer used for the coverage computation.

if step == 1:
    self.alive_attn = current_attn
@l-k-11235 (Contributor Author)

This leads to an error, so it is removed. alive_attn is only used on the return_attention path, so it is handled separately from the coverage path.

@francoishernandez (Contributor)

That seems weird. Isn't it needed in update_finished and remove_finished_batches?

-            self._coverage + attn, self.global_scorer.beta
-        ).view(_B, self.beam_size)
+        cov_penalty = self.global_scorer.cov_penalty(attn, self.global_scorer.beta)
+        self.topk_log_probs -= cov_penalty.view(_B, self.beam_size)
@l-k-11235 (Contributor Author)

I haven't clearly understood this part. What do _stepwise_cov_pen and _prev_penalty mean?

@francoishernandez (Contributor)

Here is what I gathered from a quick look at the code:

  • _stepwise_cov_pen is roughly the stepwise_penalty condition, which applies the penalty at every step:

    stepwise_penalty: bool = Field(
        default=False,
        description="Apply coverage penalty at every decoding step. Helpful for summary penalty.",
    )

  • in that context, _prev_penalty is just the state at the previous step, so that we accumulate along the way (see the sketch below)
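
For reference, a rough sketch of how those two pieces fit together on the stepwise path, reconstructed from the context above (not verbatim eole code):

if self._stepwise_cov_pen and self._prev_penalty is not None:
    # undo the penalty applied at the previous step...
    self.topk_log_probs += self._prev_penalty
    # ...and re-apply it, recomputed on the coverage accumulated so far
    self._prev_penalty = self.global_scorer.cov_penalty(
        self._coverage + attn, self.global_scorer.beta
    ).view(_B, self.beam_size)
    self.topk_log_probs -= self._prev_penalty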

@francoishernandez (Contributor) left a comment

I'm not sure I understand the underlying issue here, so the fixes aren't trivial to grasp either.
Can you elaborate? (Error traces, weird behaviours encountered, etc.)
Maybe in the top PR comment, to clarify the context, the issue faced and how this PR intends to fix it.


@l-k-11235 (Contributor Author) commented Oct 4, 2024

Thanks for your review!
I hadn't understood the use of accumulated coverage, which keeps a step-by-step running table of each hypothesis's summed attention over the source tokens, and which changes the penalty formula from the one in the paper. In this fix I use the table of each hypothesis token's attentions over the source and recompute the coverage "from scratch" at each step (using the paper's formula), which is not optimal. So I'm opening another PR.
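
Roughly, the difference between the two approaches looks like this (toy sketch, not eole code):

import torch

def update_accumulated(coverage_acc, attn):
    # accumulated design: keep one running sum of attention per source token, shape (B, N)
    return coverage_acc + attn.squeeze(1)

def update_history(coverage_hist, attn):
    # this PR: keep the full attention history, shape (B, T + 1, N), and recompute
    # the Wu et al. penalty on it from scratch at every step
    return torch.cat([coverage_hist, attn], dim=1)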
