Describe the bug
Evaluating transformers Seq2SeqTrainer with 'predict_with_generate=True' results in 'Invalidate trace cache' warnings.
The warnings appear inside Seq2SeqTrainer.prediction_step, twice per call:
Here: generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
and here: outputs = model(**inputs)
The error messages:
To Reproduce
I built a simple script to reproduce the error. A little bit of background first:
Seq2SeqTrainer.prediction_step has a small check at the beginning:
    if not self.args.predict_with_generate or prediction_loss_only:
        return super().prediction_step(
            model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
        )
This means that (1) predict_with_generate=True needs to be in the training args, and (2) prediction_loss_only needs to be None or False. Otherwise we wouldn't actually predict with generate. trainer.evaluate automatically sets prediction_loss_only to True when no compute_metrics function is given. That's why compute_metrics is included in the script. Note: We could also subclass Seq2SeqTrainer, override prediction_step, and pass prediction_loss_only=False on to the superclass for testing purposes.
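To make the two conditions concrete, here is a minimal sketch of the gate in plain Python. The helper names (resolve_prediction_loss_only, uses_generate) are made up for illustration and only paraphrase the trainer logic as I understand it in transformers 4.41; they are not part of the library.

```python
def resolve_prediction_loss_only(has_compute_metrics, args_prediction_loss_only=False):
    """Paraphrase of how evaluate() arrives at prediction_loss_only (hypothetical helper)."""
    # evaluate() requests True when no compute_metrics is set, otherwise None.
    requested = None if has_compute_metrics else True
    # A None request falls back to the value from the training arguments.
    return requested if requested is not None else args_prediction_loss_only


def uses_generate(predict_with_generate, prediction_loss_only):
    """Mirror of the check at the top of Seq2SeqTrainer.prediction_step (hypothetical helper)."""
    # The generate path is only taken when the early return is NOT triggered.
    return not (not predict_with_generate or prediction_loss_only)


# With compute_metrics, generate is used; without it, the trainer only computes the loss.
print(uses_generate(True, resolve_prediction_loss_only(has_compute_metrics=True)))
print(uses_generate(True, resolve_prediction_loss_only(has_compute_metrics=False)))
```

Under these assumptions, the first call takes the generate path and the second does not, which is why the repro script has to ship a compute_metrics function.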
I run the script like this:
deepspeed --include localhost:1 main_trainer_simple.py
main_trainer_simple.py:
import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
from datasets import load_dataset, load_metric


def main():
    # Defining training arguments
    training_args = Seq2SeqTrainingArguments(
        output_dir="/modelSave",
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        predict_with_generate=True,
        bf16=True,
        fp16=False,
        do_train=False,
        do_eval=True,
        logging_dir='/logging',
        learning_rate=3e-05,
        weight_decay=0.01,
        deepspeed="ds_stage3_simple.json",
        generation_max_length=128,
        generation_num_beams=1
    )

    # Defining the model:
    model = "bigscience/T0_3B"
    #model = "facebook/bart-large-cnn"

    # Initialize the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model)

    # Loading a dataset and creating an eval_dataset
    dataset = load_dataset("cnn_dailymail", "3.0.0")

    def preprocess_function(examples):
        inputs = ["summarize: " + doc for doc in examples["article"]]
        targets = examples["highlights"]
        model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
        labels = tokenizer(targets, max_length=128, truncation=True, padding="max_length").input_ids
        model_inputs["labels"] = labels
        return model_inputs

    dataset = dataset.map(preprocess_function, batched=True)
    dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
    eval_dataset = dataset["validation"].select(range(10))

    # compute_metrics generated by ChatGPT. They will be called after the warnings.
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        if isinstance(preds, tuple):
            preds = preds[0]
        # Ensure preds are within the valid range of token IDs
        # Remove any values that are -100 or less
        preds = np.where(preds < 0, tokenizer.pad_token_id, preds)
        preds = preds.clip(0, tokenizer.vocab_size - 1)
        # Decode predictions and labels
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        # Ensure labels have valid token IDs
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        # Compute ROUGE scores
        rouge = load_metric("rouge")
        result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
        # Extract the individual ROUGE scores
        result = {key: value.mid.fmeasure * 100 for key, value in result.items()}
        # Add mean generated length
        prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
        result["gen_len"] = np.mean(prediction_lens)
        return result

    # Initializing model and trainer
    model = AutoModelForSeq2SeqLM.from_pretrained(model)
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    # Evaluating eval_dataset
    results = trainer.evaluate()
    print("Printing results:")
    print(results)


if __name__ == "__main__":
    main()
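The ID clean-up inside compute_metrics can be checked on its own, without loading a model or tokenizer. The sketch below assumes a pad token ID of 0 and a vocabulary size of 32100 as stand-ins for tokenizer.pad_token_id and tokenizer.vocab_size; both values and the helper name sanitize_ids are illustrative assumptions, not taken from the run above.

```python
import numpy as np

PAD_ID = 0          # assumption: stand-in for tokenizer.pad_token_id
VOCAB_SIZE = 32100  # assumption: stand-in for tokenizer.vocab_size


def sanitize_ids(ids, pad_id=PAD_ID, vocab_size=VOCAB_SIZE):
    """Replace negative IDs (such as the -100 ignore index) with pad and clip to the vocab range."""
    ids = np.where(ids < 0, pad_id, ids)
    return ids.clip(0, vocab_size - 1)


# One toy label row: three real tokens followed by two ignored positions.
labels = np.array([[37, 1782, 1, -100, -100]])
clean = sanitize_ids(labels)

# Same length computation as in compute_metrics: count the non-pad positions per row.
gen_len = [int(np.count_nonzero(row != PAD_ID)) for row in clean]
print(clean.tolist(), gen_len)
```

Without this step, batch_decode would be handed -100 values, which are not valid token IDs.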
Call stack leading to the warnings:
MyTrainer.evaluate -> Trainer.evaluate -> Trainer.evaluation_loop -> Seq2SeqTrainer.prediction_step -> 'Invalidate Trace Cache'
ds_config3_simple.json:
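The contents of this config file are not reproduced here, and the script above passes deepspeed="ds_stage3_simple.json". For orientation only, the sketch below builds the kind of ZeRO stage 3 config commonly used with the Hugging Face integration and prints it as JSON. Every value is an assumption and this is not the file used for the run above.

```python
import json

# Assumed, illustrative ZeRO stage 3 settings; "auto" lets the Hugging Face
# integration fill in values from the training arguments.
ds_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1_000_000_000,
        "stage3_max_reuse_distance": 1_000_000_000,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}

# Serialize to the JSON text that would be saved as the config file.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```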
Expected behavior
I expected no warnings. The problem also slows down evaluation; it is currently faster to run without DeepSpeed.
ds_report output
[2024-06-14 16:46:59,976] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/deepspeed']
deepspeed info ................... 0.14.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.5
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.5
shared memory (/dev/shm) size .... 503.86 GB
Screenshots
System info (please complete the following information):
OS: Ubuntu 22.04.4 LTS
GPU: 3x RTX A6000 (no difference between single or multi-gpu)
Python version: 3.12.2 | packaged by conda-forge
Transformers version: 4.41.2
Datasets version: 2.20.0
Numpy version: 1.26.4
DeepSpeed version: 0.14.3
Torch version: 2.3.1+cu121
-> All packages are installed within a conda env
Additional context
I also read about DeepSpeed-FastGen/MII, but there is no support yet for T5, the model I'm currently using.