Can Qwen2.5-vl or gemma3 be supported? #3

Closed
thesby opened this issue Mar 12, 2025 · 1 comment

thesby commented Mar 12, 2025

Great work! Would it be possible to support Qwen2.5-vl or gemma3? Could you give some guidance on what modifications would be needed to support these models?

nnnth (Owner) commented Mar 13, 2025

Thanks for your appreciation! Existing MLLMs share similar structures, so our framework can be compatible with them after some modifications.

As UFO supports both InternVL2 and LLaVA-1.5, you can determine which parts need to be modified by comparing the relevant files of the two models, such as ufo_llava.py and ufo_internvl.py.
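One way to do this comparison programmatically is a small diff helper. The following is only a sketch: the function name `diff_sources` is made up for illustration, and the locations of `ufo_llava.py` and `ufo_internvl.py` inside the repository are not stated in this thread, so any path you pass in is your own assumption.

```python
import difflib
from pathlib import Path
from typing import List


def diff_sources(text_a: str, text_b: str,
                 name_a: str = "a", name_b: str = "b") -> List[str]:
    """Return a unified diff of two source texts, showing changed lines only.

    `diff_sources` is a hypothetical helper, not part of UFO.
    """
    a_lines = text_a.splitlines()
    b_lines = text_b.splitlines()
    # n=0 drops unchanged context lines so only the differences remain.
    return list(difflib.unified_diff(
        a_lines, b_lines, fromfile=name_a, tofile=name_b, lineterm="", n=0))


def diff_files(path_a: str, path_b: str) -> List[str]:
    """Read two files from disk and diff them (paths are supplied by the caller)."""
    text_a = Path(path_a).read_text(encoding="utf-8")
    text_b = Path(path_b).read_text(encoding="utf-8")
    return diff_sources(text_a, text_b, path_a, path_b)
```

Calling `diff_files` on the two model files and printing the result line by line gives a list of candidate places to adapt for a new model.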

Image

Key differences are in three files. Taking InternVL2 as an example:

To support qwen2.5-vl or gemma3, you need to create the corresponding three files for the new model. Models differ mainly in the following four areas, each of which needs to be adapted:

  1. Module Names: For example, in LLaVA-1.5, the text embedding is called embed_tokens, whereas in InternVL2 it is called tok_embeddings.
  2. Special Tokens: Each model has a unique tokenizer with different end tokens and mask token IDs.
  3. Conversation Template: System prompts are hardcoded in each task head, and VQA tasks require preprocessing of multi-turn dialogues.
  4. Forward Parameters: Different LLMs may have varying formats for KV cache and position_ids.
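The four areas above could be collected in one per-model record, so the differences live in a single place. This is a minimal sketch and not how UFO is actually organized: `ModelSpec`, `SPECS`, and `get_text_embedding` are hypothetical names, and every value except the two embedding attribute names mentioned above (`embed_tokens`, `tok_embeddings`) is a placeholder to be filled in from the real tokenizer and model code.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record of the per-model details (not part of UFO)."""
    text_embedding_attr: str            # 1. module name of the text embedding
    end_token: str                      # 2. special tokens (placeholder)
    mask_token_id: Optional[int]        # 2. special tokens (placeholder)
    system_prompt: str                  # 3. conversation template (placeholder)
    forward_kwarg_names: Tuple[str, ...] = ()  # 4. forward parameters (placeholder)


# Only the two attribute names come from the discussion above;
# all other values are placeholders, not real tokens or prompts.
SPECS = {
    "llava-1.5": ModelSpec("embed_tokens", "<END_TOKEN_PLACEHOLDER>",
                           None, "<SYSTEM_PROMPT_PLACEHOLDER>"),
    "internvl2": ModelSpec("tok_embeddings", "<END_TOKEN_PLACEHOLDER>",
                           None, "<SYSTEM_PROMPT_PLACEHOLDER>"),
}


def get_text_embedding(module: Any, spec: ModelSpec) -> Any:
    """Fetch the text embedding from `module` under the model-specific name."""
    try:
        return getattr(module, spec.text_embedding_attr)
    except AttributeError as exc:
        raise AttributeError(
            f"expected attribute '{spec.text_embedding_attr}' on "
            f"{type(module).__name__}") from exc


# Demo with a stand-in object instead of a real language model.
fake_internvl = SimpleNamespace(tok_embeddings="text-embedding-module")
print(get_text_embedding(fake_internvl, SPECS["internvl2"]))
```

Adding a new model would then mean adding one entry to `SPECS` and checking each field against that model's tokenizer and forward signature. In a real model the embedding may sit on a nested submodule, so `module` here is assumed to be whichever object directly holds it.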

Since our code is mainly based on mmdetection, we have not encapsulated the model details very well. We apologize for this. If you encounter any issues during the implementation, feel free to ask at any time 😊.

nnnth closed this as completed Mar 20, 2025