Replies: 1 comment 1 reply
Hi @phalexo, you are right, it is not an issue, but there is a lack of explanation in the notebook; I will try to improve it and maybe add some different samples. Since I'm using the peft library, some of the work happens inside the libraries. In this case the DataCollatorForLanguageModeling class helps prepare the data to train the model. You also need to take into consideration that with a Transformer architecture doing text generation, the output is the next word generated: for each word generated, the input is the sequence of words preceding it (read "tokens" wherever I say "words"). I hope this helps clarify how it works; if not, don't hesitate to ask again.
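To illustrate, here is a minimal sketch of what the collator does with the single merged text column, assuming a Hugging Face tokenizer and DataCollatorForLanguageModeling with mlm=False; the checkpoint name and the example strings are placeholders, not the notebook's actual values:

```python
# Minimal sketch (not the notebook's exact code): how DataCollatorForLanguageModeling
# turns a merged text column into causal-LM training batches.
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")  # assumed checkpoint

# "act" and "prompt" have already been concatenated into one text field per row.
texts = [
    "Linux Terminal. I want you to act as a linux terminal ...",
    "English Translator. I want you to act as an English translator ...",
]
features = [tokenizer(t) for t in texts]

# mlm=False -> causal language modeling: labels are a copy of input_ids
# (padded positions become -100 so the loss ignores them), and the model
# shifts them internally so each position predicts the next token.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
batch = collator(features)

print(batch["input_ids"].shape)  # (2, max_seq_len_in_batch)
print(batch["labels"])           # same ids as input_ids, except -100 at padded positions
```

So there is no separate "labels" column to provide: the labels are derived from the same token sequence, and the model learns to continue the "act" part with the "prompt" part because they appear one after the other in that sequence.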
I am not totally certain if it is an issue or not.
The training data from the CSV file contains two columns, "act" and "prompt", but before this data gets tokenized and passed in for training, the columns are merged into a single column. Why?
I thought one should have an "inputs" column, i.e. the initial short prompt ("act as a trainer"), and a separate "labels" column containing the detailed output prompt.
Could you explain the rationale for merging the two columns, like you see below? How is the training happening if you have a SINGLE input?
Thanks.