Thanks for your appreciation! Existing MLLMs share similar structures, so our framework can be compatible with them after some modifications.
As UFO supports both InternVL2 and LLaVA-1.5, you can determine which parts need to be modified by comparing the relevant files of the two models, such as ufo_llava.py and ufo_internvl.py.
Key differences are in three files. Taking InternVL2 as an example:
ufo_internvl.py: The main model class, responsible for initialization, LoRA wrapping, and the forward pass.
To support Qwen2.5-VL or Gemma 3, you need to create the corresponding three files for the new model. Models mainly differ in the following four areas, which need to be modified:
Module Names: For example, in LLaVA-1.5, the text embedding is called embed_tokens, whereas in InternVL2 it is called tok_embeddings.
Forward Parameters: Different LLMs may have varying formats for KV cache and position_ids.
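As a rough illustration of the two points above, here is a minimal sketch of how differing module names and forward parameters could be handled in one place. This is not code from UFO: the helper names `find_text_embedding` and `filter_forward_kwargs`, the stand-in objects, and the example forward signature are all hypothetical. Only the attribute names `embed_tokens` and `tok_embeddings` come from this thread.

```python
import inspect
from types import SimpleNamespace

# Candidate attribute names for the text embedding layer.
# Only these two are mentioned in the thread; other models may use different names.
EMBEDDING_NAMES = ("embed_tokens", "tok_embeddings")


def find_text_embedding(llm, candidates=EMBEDDING_NAMES):
    """Return (name, module) for the first candidate attribute present on `llm`."""
    for name in candidates:
        module = getattr(llm, name, None)
        if module is not None:
            return name, module
    raise AttributeError(f"none of {candidates} found on {type(llm).__name__}")


def filter_forward_kwargs(forward_fn, kwargs):
    """Keep only the keyword arguments that `forward_fn` actually accepts."""
    params = inspect.signature(forward_fn).parameters
    # If the function takes **kwargs, everything can be passed through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-ins for two language models with different attribute names.
llava_like = SimpleNamespace(embed_tokens="llava-embedding")
internvl_like = SimpleNamespace(tok_embeddings="internvl-embedding")


# Hypothetical forward signature that does not accept `cache_position`.
def example_forward(input_ids, position_ids=None, past_key_values=None):
    return input_ids


name_a, _ = find_text_embedding(llava_like)
name_b, _ = find_text_embedding(internvl_like)
accepted = filter_forward_kwargs(
    example_forward, {"input_ids": [1, 2], "cache_position": 0}
)
print(name_a, name_b, sorted(accepted))
```

In a real port, the KV cache format and the layout of position_ids would still need to be checked against the target model's own implementation. Filtering keyword arguments only avoids passing parameters the model does not accept.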
Since our code is mainly based on mmdetection, we have not encapsulated the model details very well. We apologize for this. If you encounter any issues during the implementation, feel free to ask at any time 😊.
Great work! Would it be possible to support Qwen2.5-VL or Gemma3? Could you give some guidance on what modifications would be needed to support these models?