Do we need a config to change padding_side='left' before the evaluation? #31672
Comments
Hey! Compute metrics with generation for decoder-only models does not work currently. See #26474 and the linked issues requesting the feature. I am planning to work on it next week :)
@zucchini-nlp Thank you! For now, I implement it by:
For now, it seems to work. I also solve the padding_side problem by modifying the … I am not sure this is the correct way to implement generation-based evaluation, but it seems to work.
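The core issue behind the padding_side switch can be shown with a plain-Python sketch (no transformers involved; `PAD_ID` and `pad_batch` are hypothetical names for illustration): for generation, pad tokens must go on the left so the final position of every row in the batch is a real prompt token the model can continue from.

```python
PAD_ID = 0  # hypothetical pad token id, for illustration only

def pad_batch(sequences, side="left"):
    """Pad variable-length token-id lists to a common length on one side."""
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        pad = [PAD_ID] * (max_len - len(seq))
        # left padding keeps the prompt flush against the generation start;
        # right padding would put pad ids where new tokens should be appended
        padded.append(pad + seq if side == "left" else seq + pad)
    return padded

batch = [[5, 6, 7], [8, 9]]
left = pad_batch(batch, side="left")    # shorter row becomes [PAD, 8, 9]
right = pad_batch(batch, side="right")  # shorter row becomes [8, 9, PAD]
```

With right padding, the model would be asked to continue from a pad token, which is why `generate` expects left-padded batches for decoder-only models.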
@zucchini-nlp Oh, but my implementation has a new problem: because the input sequences have been manually truncated, the eval_loss does not make sense.
Same problem. I don't know if it is correct to pad both the training data and the eval data on the left universally, for any model and any dataset, with pad_token = unk_token (i.e., not eos_token). I see many recent works doing this.
The padding side during training should be the right side. Shorter texts will be padded on the right with pad tokens, and their labels masked out so that loss is not calculated over the pad tokens. Regarding evaluation with Seq2seqTrainer, that is indeed tricky and would not make sense when used to track both the loss and a custom metric based on generated text. I would suggest training without generation for now, unless you've found a workaround. I will work on adding the feature soon.
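The training-time convention described above can be sketched in plain Python (`PAD_ID` and the helper name are illustrative, not transformers APIs): pad on the right, and set the labels at pad positions to -100, the `ignore_index` that PyTorch's cross-entropy loss skips.

```python
PAD_ID = 0        # hypothetical pad token id
IGNORE_INDEX = -100  # positions with this label contribute no loss

def right_pad_with_labels(sequences):
    """Right-pad token-id lists and mask the labels over pad positions."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, labels = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        # labels mirror the inputs, but pad positions are masked out
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return input_ids, labels
```

This mirrors what transformers' label-masking collators do in spirit; the real implementations also handle attention masks and tensor batching.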
Feature request
I am trying to train a Llama model (a decoder-only model). I want to evaluate my model with not only the loss but also some generation-based metric. For example, an item in my eval dataset could be a string such as `1+2=`, and I use the Seq2seqTrainer, which provides a modified prediction step so that I can get the model's predictions in the `EvalPrediction`. Then I write my eval code in the `compute_metrics` function and provide it to the Seq2seqTrainer.

The problem is the padding_side of the tokenizer. Because I need to train the model, the tokenizer should pad on the right for the training dataset (that is the default setting for Llama). However, when I evaluate the model, the tokenizer should be changed to left padding because I need the model to generate. I do not see an easy way to do this without changing the source code of the trainer (for example, the `get_eval_dataloader` method of the Trainer).

My questions are:
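One minimally invasive workaround (a sketch, not an official transformers API) is to flip `tokenizer.padding_side` only for the duration of evaluation with a small context manager, instead of editing the Trainer source. This assumes the data collator pads lazily at batch time, so the flipped setting is picked up when the eval dataloader is iterated.

```python
from contextlib import contextmanager

@contextmanager
def padding_side(tokenizer, side):
    """Temporarily set tokenizer.padding_side, restoring the old value
    even if evaluation raises."""
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        tokenizer.padding_side = previous
```

Usage would then look like: train with the default right padding, and wrap only the eval call, e.g. `with padding_side(tokenizer, "left"): metrics = trainer.evaluate()`.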
Motivation
Generation-based evaluation when training a decoder-only autoregressive model like Llama.
Your contribution
I do not know how I can help.