Cannot keep extra fields in the generated generated_predictions.jsonl, and the <image> token is lost #6070

Open
1 task done
enerai opened this issue Nov 19, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

enerai commented Nov 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

...

Reproduction

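Problem description

When running the batch prediction command, how can I carry the extrainfo1 and extrainfo2 fields from the input data over into the generated_predictions.jsonl file? In addition, I noticed that the prompt field in the output does not contain the <image> token that is present in the input.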

Command used

torchrun ${DISTRIBUTED_ARGS} src/train.py \
    --stage sft \
    --do_predict \
    --predict_with_generate \
    --use_fast_tokenizer \
    --flash_attn auto \
    --model_name_or_path ${MODEL_NAME_OR_PATH} \
    --eval_dataset ${eval_dataset} \
    --output_dir $OUTPUT_PATH \
    --template qwen2_vl \
    --finetuning_type full \
    --do_sample False \
    --max_new_tokens 4 \
    --repetition_penalty 1 \
    --length_penalty 1 \
    --num_beams 1 \
    --overwrite_cache \
    --overwrite_output_dir \
    --per_device_eval_batch_size 2 \
    --ddp_timeout 9000 \
    --logging_steps 1 \
    --cutoff_len 4096 \
    --bf16

Input data format

Each line of the dataset looks like this:

{
  "messages": [
    {"content": "...", "role": "user"},
    {"content": "...", "role": "assistant"}
  ],
  "images": [],
  "extrainfo1": "...",
  "extrainfo2": "..."
}

Expected output

I would like generated_predictions.jsonl to keep the extrainfo1 and extrainfo2 fields, so that each generated record contains:

{
  "prompt": "...",
  "label": "...",
  "predict": "...",
  "extrainfo1": "...",
  "extrainfo2": "..."
}
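Until the question is answered, one possible workaround is to merge the extra fields back in after prediction. The sketch below is not part of the project's API: the helper names `read_jsonl`, `write_jsonl`, and `merge_extra_fields` are hypothetical, and it assumes that generated_predictions.jsonl has exactly one record per input record, in the same order as the evaluation file. That assumption should be verified for a given run, especially a distributed one.

```python
import json

# Field names taken from the input format shown in this issue.
EXTRA_KEYS = ("extrainfo1", "extrainfo2")


def read_jsonl(path):
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def write_jsonl(path, rows):
    """Write dicts as JSON Lines, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def merge_extra_fields(inputs, predictions, keys=EXTRA_KEYS):
    """Copy `keys` from each input record onto the prediction at the same index.

    ASSUMPTION: both sequences are aligned one-to-one by position.
    """
    inputs, predictions = list(inputs), list(predictions)
    if len(inputs) != len(predictions):
        raise ValueError(
            f"record count mismatch: {len(inputs)} inputs vs {len(predictions)} predictions"
        )
    merged = []
    for src, pred in zip(inputs, predictions):
        row = dict(pred)  # do not mutate the caller's prediction record
        for key in keys:
            if key in src:
                row[key] = src[key]
        merged.append(row)
    return merged
```

With hypothetical paths, usage would be `write_jsonl("merged_predictions.jsonl", merge_extra_fields(read_jsonl("eval.jsonl"), read_jsonl("generated_predictions.jsonl")))`. The length check fails loudly rather than silently pairing records that may not correspond.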

Current behavior

  • The extrainfo1 and extrainfo2 fields are missing from generated_predictions.jsonl.
  • The <image> token is missing from the prompt field.
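To make the two observations above easy to confirm on any output file, a small audit script can count the affected records. This is only an illustrative sketch: `audit_predictions` is a hypothetical helper, and the key names and the `<image>` marker are taken from this issue rather than from any documented output schema.

```python
import json


def audit_predictions(lines, required_keys=("extrainfo1", "extrainfo2"), marker="<image>"):
    """Count records that lack any required key or whose prompt lacks the marker."""
    report = {"total": 0, "missing_keys": 0, "prompt_without_marker": 0}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        report["total"] += 1
        if any(key not in record for key in required_keys):
            report["missing_keys"] += 1
        if marker not in record.get("prompt", ""):
            report["prompt_without_marker"] += 1
    return report
```

It accepts any iterable of JSON strings, so an open file handle for generated_predictions.jsonl can be passed directly.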

Related questions

  • Should the <image> token appear in the prompt field?
  • Is there another way to keep the extra fields?

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Nov 19, 2024