This is my config_demo.yaml:

model_name_or_path: model/Qwen2.5-1.5B-Instruct
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: open-r1/OpenR1-Math-220k
dataset_configs:
dataset_num_proc: 48

# SFT trainer config
bf16: true
do_eval: false
eval_strategy: 'no'
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: Qwen2.5-1.5B-Open-R1-Distill
hub_strategy: every_save
learning_rate: 5.0e-05
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr_rate: 0.1
packing: true
max_seq_length: 16384
max_steps: -1
num_train_epochs: 1
output_dir: /model/Qwen2.5-1.5B-Open-R1-math220k-Distill-useliger_8gpu
overwrite_output_dir: true
per_device_eval_batch_size: 16
per_device_train_batch_size: 16
push_to_hub: false
report_to:
save_strategy: "steps"
save_steps: 100
save_total_limit: 1
seed: 42
use_liger: true
warmup_ratio: 0.05

And this is my ddp.yaml (accelerate config):

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: "0,1,2,3,4,5,6,7"
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
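The two files are combined with `accelerate launch`. A minimal sketch of that launch command, assuming the open-r1 repo layout (the `src/open_r1/sft.py` entry point) and that both YAML files sit in the working directory:

```shell
# Sketch only: the script path assumes the open-r1 repo layout and that both
# YAML files are in the working directory; adjust paths to match your checkout.
ACCELERATE_LOG_LEVEL=info accelerate launch \
    --config_file ddp.yaml \
    src/open_r1/sft.py \
    --config config_demo.yaml
```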
The training loss decreases to about 0.61, but the result on MATH-500 is lower than the original model's.
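For context, open-r1 scores MATH-500 with lighteval's vLLM backend and the repo's custom tasks. A rough sketch of that kind of evaluation command (the model args, task spec, and flags here are assumptions based on the repo's README and may differ between versions, so check the README for the exact invocation):

```shell
# Sketch only: model args and task spec are assumptions; consult the open-r1
# README for the current evaluation command and parameters.
MODEL=/model/Qwen2.5-1.5B-Open-R1-math220k-Distill-useliger_8gpu
MODEL_ARGS="pretrained=$MODEL,dtype=bfloat16,max_model_length=32768,gpu_memory_utilization=0.8"
lighteval vllm "$MODEL_ARGS" "custom|math_500|0|0" \
    --custom-tasks src/open_r1/evaluate.py \
    --use-chat-template \
    --output-dir data/evals/$MODEL
```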
Hi @HwangYej1, I am running into a similar issue. How did you fix it?
I don't know why, but I still haven't managed to reproduce it either; the test results after SFT are very poor.