Questions about Fine-tuning with Specialized Model Variants for Open-R1 #522

doublelei opened this issue Mar 19, 2025

Background

I'm working on fine-tuning language models for a specialized classification task. I noticed the project reports results for distilling DeepSeek-R1 into Qwen-2.5 base models, but not into specialized variants such as Qwen-2.5-coder-instruct.

Questions

1. Specialized model variants for task-specific fine-tuning

For my task, which involves code analysis, I'm considering using Qwen-2.5-coder-instruct instead of the standard Qwen-2.5-instruct; a sketch of the setup I have in mind follows the questions below.

  • Has anyone tested distillation or fine-tuning with specialized models like Qwen-2.5-coder?
  • Do specialized base models (like coder variants) show better performance for domain-specific tasks after distillation?
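To make the question concrete, here is roughly what I have in mind. This is only a sketch of a plain TRL SFT run, and the model and dataset IDs are placeholders I picked, not an Open-R1 recipe:

```python
# Sketch only: swap the standard instruct model for the coder variant in a
# plain TRL SFT run. Model and dataset IDs below are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Specialized variant instead of Qwen/Qwen2.5-7B-Instruct.
model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"

# Placeholder for whatever R1-distilled reasoning traces you train on.
dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")

trainer = SFTTrainer(
    model=model_id,
    args=SFTConfig(
        output_dir="qwen2.5-coder-r1-distill",
        packing=True,   # pack short samples together for throughput
        bf16=True,
    ),
    train_dataset=dataset,
)
trainer.train()
```

If the base checkpoint is the only variable, comparing this run against the same run on Qwen-2.5-instruct should isolate the effect of the coder pretraining.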

2. LoRA vs full fine-tuning

I see the project uses full fine-tuning for the distilled models (a LoRA sketch for comparison follows these questions):

  • Has anyone compared LoRA approaches against full fine-tuning in this context?
  • What performance differences might we expect?
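For reference, here is the LoRA variant I would compare against. The hyperparameters are common defaults I chose, not values validated for R1 distillation:

```python
# Sketch only: the same SFT run, but training low-rank adapters via PEFT.
# Rank/alpha/target modules are generic defaults, not tuned for R1 traces.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")  # same placeholder data

peft_config = LoraConfig(
    r=16,                      # adapter rank; capacity vs. VRAM trade-off
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    args=SFTConfig(output_dir="qwen2.5-coder-r1-lora", bf16=True),
    train_dataset=dataset,
    peft_config=peft_config,   # the only change relative to the full fine-tuning run
)
trainer.train()
```

Since `peft_config` is the only difference from the full fine-tuning setup, an A/B comparison between the two runs should be clean.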

3. System prompts in different training strategies

I've noticed differences in how system prompts are handled between the training stages; a toy illustration follows the questions below:

  • Why aren't system prompts needed for SFT distillation?
  • Is this a general principle or specific to this implementation?
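My current understanding, which I'd like to confirm: in SFT distillation the teacher's reasoning format is already present in the assistant completions, so there is nothing for a system prompt to elicit, whereas on-policy training has to request the format up front. A toy illustration (the messages and the `<think>` format are made up for the example):

```python
# Toy illustration of why the system prompt matters in one case and not the other.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# SFT distillation sample: the reasoning format is baked into the target
# completion, so the loss teaches it directly with no system prompt.
sft_messages = [
    {"role": "user", "content": "What is 12 * 13?"},
    {"role": "assistant", "content": "<think>12 * 13 = 156</think>\n156"},
]
print(tok.apply_chat_template(sft_messages, tokenize=False))

# On-policy rollout (e.g., GRPO): the model generates the completion itself,
# so the desired format has to be requested via the system turn.
rl_messages = [
    {"role": "system", "content": "Reason inside <think>...</think>, then answer."},
    {"role": "user", "content": "What is 12 * 13?"},
]
print(tok.apply_chat_template(rl_messages, tokenize=False, add_generation_prompt=True))
```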