Do we need a config to change padding_side='left' before the evaluation? #31672
Comments
Hey! Compute metrics with generation for decoder-only models does not work currently. See #26474 and the linked issues requesting the feature. I am planning to work on it next week :)
@zucchini-nlp Thank you! For now, I implement it by:
For now, it seems to work. I also solve the padding_side problem by modifying the … I am not sure this is the correct way to implement generation-based evaluation, but it seems to work.
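The core issue behind the padding_side switch can be shown with a plain-Python sketch (no transformers involved; `PAD_ID` and `pad_batch` are hypothetical names for illustration): for generation, pad tokens must go on the left so the final position of every row in the batch is a real prompt token the model can continue from.

```python
PAD_ID = 0  # hypothetical pad token id, for illustration only

def pad_batch(sequences, side="left"):
    """Pad variable-length token-id lists to a common length on one side."""
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        pad = [PAD_ID] * (max_len - len(seq))
        # left padding keeps the prompt flush against the generation start;
        # right padding would put pad ids where new tokens should be appended
        padded.append(pad + seq if side == "left" else seq + pad)
    return padded

batch = [[5, 6, 7], [8, 9]]
left = pad_batch(batch, side="left")    # shorter row becomes [PAD, 8, 9]
right = pad_batch(batch, side="right")  # shorter row becomes [8, 9, PAD]
```

With right padding, the model would be asked to continue from a pad token, which is why `generate` expects left-padded batches for decoder-only models.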
@zucchini-nlp Oh, but my implementation has a new problem: because the input sequences have been manually truncated, the eval_loss does not make sense.
Same problem. I don't know if it is correct to pad both the training data and the eval data on the left universally, for any model and any dataset, with pad_token = unk_token (i.e., not eos_token). I see many recent works doing this.
The padding side during training should be the right side. Shorter texts will be padded on the right with pad tokens, and their labels masked out so that loss is not calculated over the pad tokens. Regarding evaluation with Seq2seqTrainer, that is indeed tricky and would not make sense when used to track both the loss and a custom metric based on generated text. I would suggest training without generation for now, unless you've found a workaround. I will work on adding the feature soon.
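The training-time convention described above can be sketched in plain Python (`PAD_ID` and the helper name are illustrative, not transformers APIs): pad on the right, and set the labels at pad positions to -100, the `ignore_index` that PyTorch's cross-entropy loss skips.

```python
PAD_ID = 0        # hypothetical pad token id
IGNORE_INDEX = -100  # positions with this label contribute no loss

def right_pad_with_labels(sequences):
    """Right-pad token-id lists and mask the labels over pad positions."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, labels = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        # labels mirror the inputs, but pad positions are masked out
        labels.append(seq + [IGNORE_INDEX] * n_pad)
    return input_ids, labels
```

This mirrors what transformers' label-masking collators do in spirit; the real implementations also handle attention masks and tensor batching.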
Feature request
I am trying to train a Llama model (a decoder-only model). I want to evaluate my model with not only the loss but also some generation-based metric. For example, an item in my eval dataset could be a string such as `1+2=`, and I use the Seq2seqTrainer, which provides a modified prediction step so that I can get the model's predictions in the `EvalPrediction`. Then I write my eval code in the `compute_metrics` function and provide it to the Seq2seqTrainer.

The problem is the padding_side of the tokenizer. Because I need to train the model, the tokenizer should pad on the right for the training dataset (that is the default setting for Llama). However, when I evaluate the model, the tokenizer should be changed to left padding because I need the model to generate. I do not see an easy way to do this without changing the source code of the trainer (for example, the `get_eval_dataloader` method of the Trainer).

My questions are:
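One minimally invasive workaround (a sketch, not an official transformers API) is to flip `tokenizer.padding_side` only for the duration of evaluation with a small context manager, instead of editing the Trainer source. This assumes the data collator pads lazily at batch time, so the flipped setting is picked up when the eval dataloader is iterated.

```python
from contextlib import contextmanager

@contextmanager
def padding_side(tokenizer, side):
    """Temporarily set tokenizer.padding_side, restoring the old value
    even if evaluation raises."""
    previous = tokenizer.padding_side
    tokenizer.padding_side = side
    try:
        yield tokenizer
    finally:
        tokenizer.padding_side = previous
```

Usage would then look like: train with the default right padding, and wrap only the eval call, e.g. `with padding_side(tokenizer, "left"): metrics = trainer.evaluate()`.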
Motivation
Generation-based evaluation when training a decoder-only autoregressive model like Llama.
Your contribution
I do not know how I can help.