Qwen/Qwen2.5-VL-7B-Instruct PPO training error #7159

Open
ulovecode opened this issue Mar 5, 2025 · 3 comments
Labels: bug (Something isn't working), pending (This problem is yet to be addressed)

Comments


ulovecode commented Mar 5, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

[rank0]: Traceback (most recent call last):
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py", line 23, in <module>
[rank0]:     launch()
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py", line 19, in launch
[rank0]:     run_exp()
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/tuner.py", line 93, in run_exp
[rank0]:     _training_function(config={"args": args, "callbacks": callbacks})
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/tuner.py", line 71, in _training_function
[rank0]:     run_ppo(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/ppo/workflow.py", line 72, in run_ppo
[rank0]:     ppo_trainer.ppo_train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/train/ppo/trainer.py", line 240, in ppo_train
[rank0]:     batch = next(dataiter)
[rank0]:   File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/accelerate/data_loader.py", line 552, in __iter__
[rank0]:     current_batch = next(dataloader_iter)
[rank0]:   File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__
[rank0]:     data = self._next_data()
[rank0]:   File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 764, in _next_data
[rank0]:     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
[rank0]:   File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
[rank0]:     return self.collate_fn(data)
[rank0]:   File "/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/data/collator.py", line 150, in __call__
[rank0]:     features[0]["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + features[0]["labels"]
[rank0]: KeyError: 'labels'
[rank1], [rank2], and [rank3] fail with the identical traceback, each ending in KeyError: 'labels' at the same line of src/llamafactory/data/collator.py.
[rank0]:[W305 11:06:18.525879493 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0305 11:06:19.600000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855708 closing signal SIGTERM
W0305 11:06:19.601000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855709 closing signal SIGTERM
W0305 11:06:19.601000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 1855711 closing signal SIGTERM
E0305 11:06:20.066000 1855643 site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 2 (pid: 1855710) of binary: /root/anaconda3/envs/LLaMA-Factory-0.9.1/bin/python
Traceback (most recent call last):
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
    run(args)
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
    elastic_launch(
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/LLaMA-Factory-0.9.1/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/data/zhubowen/LLaMA-Factory-0.9.1/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-03-05_11:06:19
  host      : wxhs-10.30.100.202
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 1855710)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
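For context on where this blows up: the last frame of every rank's traceback shows the multimodal collator unconditionally prepending IGNORE_INDEX entries to features[0]["labels"], but a PPO batch only carries prompt-side keys (input_ids, attention_mask, and the image features), so "labels" is absent. A minimal sketch of the failing pattern (simplified and hypothetical; pad_mm_labels is my own name, not the actual function in llamafactory/data/collator.py):

```python
# Simplified sketch of the pattern at collator.py line 150; `pad_mm_labels`
# is a hypothetical helper name used only for illustration.
IGNORE_INDEX = -100

def pad_mm_labels(features: list[dict], fake_input_ids: list[int]) -> list[dict]:
    # The released code assumes "labels" always exists:
    #   features[0]["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + features[0]["labels"]
    # PPO batches contain only prompt keys, so that access raises KeyError: 'labels'.
    # Guarding the access avoids the crash (whether PPO then trains correctly
    # downstream is a separate question):
    if "labels" in features[0]:
        features[0]["labels"] = [IGNORE_INDEX] * len(fake_input_ids) + features[0]["labels"]
    return features
```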

Reproduction

Put your message here.

Others

Launch command:

llamafactory-cli train \
    --stage ppo \
    --do_train True \
    --model_name_or_path Qwen/Qwen2.5-VL-7B-Instruct \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template qwen2_vl \
    --flash_attn auto \
    --dataset_dir data \
    --dataset post_score_train_data_v2 \
    --cutoff_len 2048 \
    --learning_rate 5e-05 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 1000 \
    --warmup_steps 0 \
    --packing False \
    --report_to none \
    --output_dir saves/Qwen2.5-VL-7B-Instruct/lora/train_2025-03-05-11-02-07 \
    --bf16 True \
    --plot_loss True \
    --trust_remote_code True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --optim adamw_torch \
    --lora_rank 128 \
    --lora_alpha 256 \
    --lora_dropout 0 \
    --lora_target all \
    --reward_model saves/Qwen2.5-VL-7B-Instruct/lora/train_2025-03-04-17-48-04 \
    --reward_model_type lora \
    --ppo_score_norm True \
    --ppo_whiten_rewards True \
    --top_k 0 \
    --top_p 0.9

The dataset format is:

{
    "instruction": "### **Image Quality Assessment and Classification**\n\n#### **Task Description**\nBased on the following scoring criteria and judgment factors, comprehensively assess the image quality and classify it into **levels 1 to 5**.\n\n---\n\n### **Scoring Criteria**\n\n- **Level 1 (Very poor)**: Extremely poor quality: distorted colors, chaotic composition, blurry; no viewing value.\n- **Level 2 (Poor)**: Poor quality, usually a casual snapshot or phone screenshot, with obvious flaws (cluttered frame, inharmonious colors, insufficient sharpness, or weak composition); low viewing value.\n- **Level 3 (Average)**: Average quality; overall mediocre, with some highlights but obvious defects; limited viewing value.\n- **Level 4 (Good)**: Good quality; balanced in all aspects, with clear details and reasonable composition; high viewing value.\n- **Level 5 (Excellent)**: Outstanding quality; excellent color, composition, sharpness, and creativity; extremely high viewing value.\n\n---\n\n### **Judgment Factors**\n\nWhen assessing an image, consider the following 5 factors and describe its performance on each:\n\n1. **Color**: Are the colors vivid and harmonious, and do they draw the viewer's eye?\n2. **Composition**: Does the subject stand out, is the composition reasonable, and is the background clean?\n3. **Sharpness**: Is the image clear and detailed, with no blur or focus problems?\n4. **Creativity**: Does the image show a unique perspective or creative idea that leaves a lasting impression?\n5. **Emotional expression**: Does the image convey emotion or tell a story that resonates with the viewer?\n\n---\n\n### **Assessment Steps**\n\n1. **Evaluate each factor separately**, either with a 1-5 score or a textual description of its strengths and weaknesses.\n2. **Combine the factors to judge the overall quality** and give a final rating (level 1-5).\n   - A simple averaging approach may be used, but a well-reasoned holistic judgment matters more.\n3. **Explain the basis for the assessment in detail**, describing the image's performance on each factor and the reasons for the final rating.\n\n---\n\n### **Output Format**\n\nOutput the result in the following **JSON structure**:\n\n```json\n{\n  \"Thoughts\": \"<detailed explanation of the image's performance and the basis for the final rating>\",\n  \"Category\": \"<1 / 2 / 3 / 4 / 5>\"\n}\n```\n\n---\n\n### **Example**\n\nSuppose an image is assessed as follows:\n- **Color**: Fairly vivid and harmonious, but lacking highlights.\n- **Composition**: Reasonable, but the subject is slightly blurry and the background somewhat cluttered.\n- **Sharpness**: Generally clear, with slight blur in some areas.\n- **Creativity**: Ordinary, with no particularly distinctive perspective.\n- **Emotional expression**: The emotion conveyed is not strong enough and lacks impact.\n\nConsidering all aspects, the image's overall quality is **level 3 (average)**.\n\n**Example output**:\n\n```json\n{\n  \"Thoughts\": \"The colors are fairly harmonious but the image lacks highlights overall; the composition is acceptable but the subject is not prominent and the background is slightly cluttered; sharpness is average with local blur; creativity and emotional expression are mediocre and fail to evoke strong resonance.\",\n  \"Category\": \"3\"\n}\n```\n\n---\n\nPlease assess and classify the image according to the instructions above.",
    "input": "<image>",
    "output": "2",
    "images": [
        "images/post_score_image/Epk44nchif_resize.webp?imginfo=w3904,h2928"
    ]
}
  "post_score_train_data_v2":{
          "file_name": "post_score_image_v2/train_data.json",
          "columns": {
                  "prompt": "instruction",
                  "query": "input",
                  "response": "output",
                  "images": "images"
          }
  },
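As a quick sanity check that the dataset itself is well-formed (my own throwaway script, not part of LLaMA-Factory; the path and the assumption that the file is a JSON array come from the config above):

```python
import json

# Column mapping copied from the dataset_info.json entry above.
columns = {"prompt": "instruction", "query": "input",
           "response": "output", "images": "images"}

# Path assumed from "file_name" above, relative to the data directory.
with open("data/post_score_image_v2/train_data.json", encoding="utf-8") as f:
    samples = json.load(f)

# Report any sample that is missing one of the mapped source columns.
for i, sample in enumerate(samples):
    missing = [src for src in columns.values() if src not in sample]
    if missing:
        print(f"sample {i} is missing columns: {missing}")
```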
ulovecode added the bug and pending labels on Mar 5, 2025

jhealy1 commented Mar 6, 2025

I am also able to reproduce with qwen2.5-vl-7b, using vllm==0.7.3


hiyouga (Owner) commented Mar 6, 2025

We recommend using EasyR1 for RL; the RL implementation in LlamaFactory is temporarily bugged: https://github.com/hiyouga/EasyR1

amoyplane commented

> We recommend using EasyR1 for RL; the RL implementation in LlamaFactory is temporarily bugged: https://github.com/hiyouga/EasyR1

Will this issue still be followed up on?

We ran into the same problem when trying PPO on Qwen2-VL, using a reward model we trained ourselves.
We had previously tried DPO training directly, and it worked without problems.

It is rather a headache that EasyR1 does not support LoRA for now.
