[BUG] 'Invalidate trace cache' with Seq2SeqTrainer+predict_with_generate+Zero3 #5662

Osterlohe opened this issue Jun 14, 2024 · 0 comments

Describe the bug
Evaluating transformers Seq2SeqTrainer with 'predict_with_generate=True' results in 'Invalidate trace cache' warnings.
The warnings appear inside Seq2SeqTrainer.prediction_step, twice during each prediction step:
Here: generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
and here: outputs = model(**inputs)
The warning messages:

Invalidate trace cache @ step 0: expected module 2, but got module 0
Invalidate trace cache @ step 1: expected module 464, but got module 2

Call Stack:
MyTrainer.evaluate->Trainer.evaluate->Trainer.evaluation_loop->Seq2SeqTrainer.prediction_step->'Invalidate Trace Cache'

To Reproduce
I built a simple script to reproduce the error. A little bit of background first:
Seq2SeqTrainer.prediction_step has a small check at the beginning:

if not self.args.predict_with_generate or prediction_loss_only:
    return super().prediction_step(
        model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
    )

This means that (1) predict_with_generate=True needs to be in the training args, but also (2) prediction_loss_only needs to be None or False. Otherwise we wouldn't actually predict with generate. prediction_loss_only will automatically be set to True by trainer.evaluate when there is no compute_metrics function. That's why compute_metrics is included in the script. Note: for testing purposes we could also subclass Seq2SeqTrainer and have prediction_step pass prediction_loss_only=False on to the superclass (see the sketch below).
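For illustration only, here is a minimal sketch of that subclassing approach. It is not part of the reproduction script; the class name MyTrainer and the hard-coded prediction_loss_only=False are my own additions, just to force the generate path even without a compute_metrics function:

from transformers import Seq2SeqTrainer

class MyTrainer(Seq2SeqTrainer):
    # Testing helper: always take the generate branch of prediction_step,
    # even when trainer.evaluate() would set prediction_loss_only=True
    # because no compute_metrics was provided.
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None, **gen_kwargs):
        return super().prediction_step(
            model, inputs, prediction_loss_only=False, ignore_keys=ignore_keys, **gen_kwargs
        )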

I run the script like this:
deepspeed --include localhost:1 main_trainer_simple.py

main_trainer_simple.py:

import numpy as np
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer
from datasets import load_dataset, load_metric

def main():
    # Defining training arguments
    training_args = Seq2SeqTrainingArguments(
        output_dir="/modelSave",
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        predict_with_generate=True,
        bf16=True,
        fp16=False,
        do_train = False,
        do_eval = True,
        logging_dir='/logging',
        learning_rate=3e-05,
        weight_decay=0.01,
        deepspeed="ds_stage3_simple.json",
        generation_max_length=128,
        generation_num_beams=1
    )
    # Defining the model:
    model = "bigscience/T0_3B"
    #model = "facebook/bart-large-cnn"

    # Initialize the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model)

    #Loading a dataset and creating an eval_dataset
    dataset = load_dataset("cnn_dailymail", "3.0.0")
    def preprocess_function(examples):
        inputs = ["summarize: " + doc for doc in examples["article"]]
        targets = examples["highlights"]
        model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
        labels = tokenizer(targets, max_length=128, truncation=True, padding="max_length").input_ids
        model_inputs["labels"] = labels
        return model_inputs
    dataset = dataset.map(preprocess_function, batched=True)
    dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
    eval_dataset = dataset["validation"].select(range(10))

    # compute_metrics was generated by ChatGPT. It is called after the warnings appear.
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        if isinstance(preds, tuple):
            preds = preds[0]

        # Ensure preds are within the valid range of token IDs
        # Remove any values that are -100 or less
        preds = np.where(preds < 0, tokenizer.pad_token_id, preds)
        preds = preds.clip(0, tokenizer.vocab_size - 1)

        # Decode predictions and labels
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)

        # Ensure labels have valid token IDs
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

        # Compute ROUGE scores
        rouge = load_metric("rouge")
        result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

        # Extract the individual ROUGE scores
        result = {key: value.mid.fmeasure * 100 for key, value in result.items()}

        # Add mean generated length
        prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
        result["gen_len"] = np.mean(prediction_lens)

        return result


    # Initializing model and trainer
    model = AutoModelForSeq2SeqLM.from_pretrained(model)
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics
    )

    # Evaluating eval_dataset
    results = trainer.evaluate()
    print("Printing results:")
    print(results)

if __name__ == "__main__":
    main()
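One caveat about the script above: datasets.load_metric is deprecated in recent datasets releases (on the 2.20.0 version listed below it still works, but emits a deprecation warning). If it is unavailable in your environment, a rough, untested equivalent using the separate evaluate package (with rouge_score installed) could look like this; compute_rouge here is a hypothetical helper, not part of my script:

import evaluate

def compute_rouge(decoded_preds, decoded_labels):
    # evaluate's rouge returns plain floats, so no `.mid.fmeasure` access is needed
    rouge = evaluate.load("rouge")
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return {key: value * 100 for key, value in result.items()}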

ds_stage3_simple.json (referenced by the deepspeed argument in the training args above):

{
    "bf16":{
        "enabled":"auto"
    },
    "fp16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto"
}

Expected behavior
I expected no warnings. The problem also slows down execution; it is currently faster not to use DeepSpeed at all.

ds_report output

[2024-06-14 16:46:59,976] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/deepspeed']
deepspeed info ................... 0.14.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.5
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.5
shared memory (/dev/shm) size .... 503.86 GB

Screenshots
[Screenshot: 'Invalidate trace cache' warnings printed during evaluation]

System info (please complete the following information):
OS: Ubuntu 22.04.4 LTS
GPU: 3x RTX A6000 (no difference between single or multi-gpu)
Python version: 3.12.2 | packaged by conda-forge
Transformers version: 4.41.2
Datasets version: 2.20.0
Numpy version: 1.26.4
DeepSpeed version: 0.14.3
Torch version: 2.3.1+cu121
-> All packages are installed within a conda env

Additional context
I also read about DeepSpeed-FastGen/MII, but there is no support yet for T5, the model I'm currently using.
