Can Qwen2.5-vl or gemma3 be supported? #3

thesby · 2025-03-12T16:23:23Z

Great work! Could Qwen2.5-vl or gemma3 be supported? Could you advise what changes would be needed to support these models?

nnnth · 2025-03-13T04:11:37Z

Thanks for your appreciation! Existing MLLMs share similar structures, so our framework can be compatible with them after some modifications.

As UFO supports both InternVL2 and LLaVA-1.5, you can determine which parts need to be modified by comparing the relevant files of the two models, such as ufo_llava.py and ufo_internvl.py.

Key differences are in three files. Taking InternVL2 as an example:

ufo_internvl.py: The main class of the model, responsible for initialization, wrapping LoRA, and the forward pass.
ufo_internvl_det_head.py: A parameter-free task head for pre- and post-processing.
internvl2_8b_instruction_12w.py: The configuration for training and inference.

To ensure compatibility with Qwen2.5-vl or gemma3, you need to create the equivalents of these three files for the new model. Models mainly differ in the following four areas, each of which needs to be adapted:

Module Names: For example, in LLaVA-1.5, the text embedding is called embed_tokens, whereas in InternVL2 it is called tok_embeddings.
Special Tokens: Each model has a unique tokenizer with different end tokens and mask token IDs.
Conversation Template: System prompts are hardcoded in each task head, and VQA tasks require preprocessing of multi-turn dialogues.
Forward Parameters: Different LLMs may have varying formats for KV cache and position_ids.
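
As a rough illustration of the module-name point above, here is a minimal sketch of looking up the text embedding by name instead of hardcoding it. Only the names embed_tokens and tok_embeddings come from this thread; the helper find_module_name and the example module paths are hypothetical and are not part of the UFO code. A new backbone may use a different name, so inspect its modules first.

```python
from typing import Iterable, Optional

# Names mentioned in this thread: LLaVA-1.5 uses embed_tokens, InternVL2 uses tok_embeddings.
# Add the name used by a new backbone after inspecting its modules.
EMBEDDING_NAME_CANDIDATES = ("embed_tokens", "tok_embeddings")


def find_module_name(module_names: Iterable[str],
                     candidates: Iterable[str] = EMBEDDING_NAME_CANDIDATES) -> Optional[str]:
    """Return the first dotted module path whose last component is a candidate name."""
    wanted = set(candidates)
    for name in module_names:
        if name.rsplit(".", 1)[-1] in wanted:
            return name
    return None  # nothing matched: the model uses a name not yet in the candidates


# Hypothetical dotted paths for illustration only. With a PyTorch model they could be
# collected as: [name for name, _ in model.named_modules()]
llava_like = ["model", "model.embed_tokens", "model.layers.0", "lm_head"]
internvl_like = ["model", "model.tok_embeddings", "model.layers.0", "output"]

print(find_module_name(llava_like))
print(find_module_name(internvl_like))
```

If the lookup returns None for a new model, that is a sign the module name differs and the candidate list (or the model class itself) needs updating.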

Since our code is mainly based on mmdetection, we have not encapsulated the model details very well. We apologize for this. If you encounter any issues during the implementation, feel free to ask at any time 😊.

nnnth closed this as completed Mar 20, 2025
