Questions about Fine-tuning with Specialized Model Variants for Open-R1 #522

doublelei opened this issue Mar 19, 2025

Background

I'm working on fine-tuning language models for a specialized classification task. I noticed the project reports results for distilling DeepSeek-R1 into Qwen-2.5 base models, but not into specialized variants such as Qwen-2.5-coder-instruct.

Questions

1. Specialized model variants for task-specific fine-tuning

For my task, which involves code analysis, I'm considering using Qwen-2.5-coder-instruct instead of the standard Qwen-2.5-instruct; a sketch of the setup I have in mind follows the questions below.

  • Has anyone tested distillation or fine-tuning with specialized models like Qwen-2.5-coder?
  • Do specialized base models (like coder variants) show better performance for domain-specific tasks after distillation?
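To make the question concrete, here is roughly what I have in mind. This is only a sketch of a plain TRL SFT run, and the model and dataset IDs are placeholders I picked, not an Open-R1 recipe:

```python
# Sketch only: swap the standard instruct model for the coder variant in a
# plain TRL SFT run. Model and dataset IDs below are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Specialized variant instead of Qwen/Qwen2.5-7B-Instruct.
model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"

# Placeholder for whatever R1-distilled reasoning traces you train on.
dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")

trainer = SFTTrainer(
    model=model_id,
    args=SFTConfig(
        output_dir="qwen2.5-coder-r1-distill",
        packing=True,   # pack short samples together for throughput
        bf16=True,
    ),
    train_dataset=dataset,
)
trainer.train()
```

If the base checkpoint is the only variable, comparing this run against the same run on Qwen-2.5-instruct should isolate the effect of the coder pretraining.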

2. LoRA vs full fine-tuning

I see the project uses full fine-tuning for the distilled models (a LoRA sketch for comparison follows these questions):

  • Has anyone compared LoRA approaches against full fine-tuning in this context?
  • What performance differences might we expect?
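For reference, here is the LoRA variant I would compare against. The hyperparameters are common defaults I chose, not values validated for R1 distillation:

```python
# Sketch only: the same SFT run, but training low-rank adapters via PEFT.
# Rank/alpha/target modules are generic defaults, not tuned for R1 traces.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-r1/OpenR1-Math-220k", split="train")  # same placeholder data

peft_config = LoraConfig(
    r=16,                      # adapter rank; capacity vs. VRAM trade-off
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B-Instruct",
    args=SFTConfig(output_dir="qwen2.5-coder-r1-lora", bf16=True),
    train_dataset=dataset,
    peft_config=peft_config,   # the only change relative to the full fine-tuning run
)
trainer.train()
```

Since `peft_config` is the only difference from the full fine-tuning setup, an A/B comparison between the two runs should be clean.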

3. System prompts in different training strategies

I've noticed differences in how system prompts are handled between the training stages; a toy illustration follows the questions below:

  • Why aren't system prompts needed for SFT distillation?
  • Is this a general principle or specific to this implementation?
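My current understanding, which I'd like to confirm: in SFT distillation the teacher's reasoning format is already present in the assistant completions, so there is nothing for a system prompt to elicit, whereas on-policy training has to request the format up front. A toy illustration (the messages and the `<think>` format are made up for the example):

```python
# Toy illustration of why the system prompt matters in one case and not the other.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# SFT distillation sample: the reasoning format is baked into the target
# completion, so the loss teaches it directly with no system prompt.
sft_messages = [
    {"role": "user", "content": "What is 12 * 13?"},
    {"role": "assistant", "content": "<think>12 * 13 = 156</think>\n156"},
]
print(tok.apply_chat_template(sft_messages, tokenize=False))

# On-policy rollout (e.g., GRPO): the model generates the completion itself,
# so the desired format has to be requested via the system turn.
rl_messages = [
    {"role": "system", "content": "Reason inside <think>...</think>, then answer."},
    {"role": "user", "content": "What is 12 * 13?"},
]
print(tok.apply_chat_template(rl_messages, tokenize=False, add_generation_prompt=True))
```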