
Feature Request: Add support for Phi-3.5 MoE and Vision Instruct #9119

Closed
4 tasks done
YorkieDev opened this issue Aug 21, 2024 · 24 comments
Labels
enhancement (New feature or request), model (Model specific), stale

Comments

@YorkieDev

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Microsoft has recently released two new models in the Phi family.

3.5 MoE: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
3.5 Vision: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

It would be nice to see support added to llama.cpp for these two models.

Motivation

Supporting all model releases so the wider community can enjoy these great free models.

Possible Implementation

No response

@YorkieDev YorkieDev added the enhancement New feature or request label Aug 21, 2024
@curvedinf

MoE looks promising. Any word on how complex it would be to add support?

@JackCloudman

Is someone working on it? 🙏

@simsi-andy

Especially vision would be worth it. But I lack the knowledge to do something like this.

@mounta11n
Contributor

Yes, the vision model is surprisingly good. In GGUF format under llama.cpp, it would open up undreamt-of possibilities.

(screenshot attached: Bildschirmfoto_20240821_213502)

@foldl
Contributor

foldl commented Aug 28, 2024

ChatLLM.cpp supports Phi-3.5 MoE model now.

For developers: the MoE sparse MLP is a little different from the one used in Mixtral.
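
To illustrate the rough shape of such a layer, here is a minimal top-2 routing sketch in plain NumPy. All names and dimensions are illustrative, and the router details are exactly where Phi-3.5-MoE reportedly deviates from the Mixtral recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def make_expert(d):
    # stand-in for a real gated FFN expert: a tiny ReLU MLP
    W1 = rng.normal(size=(d, 4 * d)) * 0.02
    W2 = rng.normal(size=(4 * d, d)) * 0.02
    return lambda h: np.maximum(h @ W1, 0) @ W2

def moe_mlp(x, gate_w, experts, top_k=2):
    """x: (n_tokens, d_model); gate_w: (d_model, n_experts)."""
    logits = x @ gate_w                                # router scores per token
    chosen = np.argsort(-logits, axis=-1)[:, :top_k]   # top-k expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = chosen[t]
        weights = softmax(logits[t, sel])              # renormalize over the chosen k
        for w, e in zip(weights, sel):
            out[t] += w * experts[e](x[t])             # only k experts run per token
    return out

d_model, n_experts = 64, 16                            # Phi-3.5-MoE uses 16 experts, 2 active
experts = [make_expert(d_model) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts)) * 0.02
tokens = rng.normal(size=(3, d_model))
print(moe_mlp(tokens, gate_w, experts).shape)          # -> (3, 64)
```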

@ayttop

ayttop commented Aug 28, 2024

https://github.com/foldl/chatllm.cpp

| Supported Models | Download Quantized Models |

What's New:

2024-08-28: Phi-3.5 Mini & MoE

Inference of a bunch of models from less than 1B to more than 300B, for real-time chatting with RAG on your computer (CPU), pure C++ implementation based on @ggerganov's ggml.


@ayttop

ayttop commented Aug 28, 2024

https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/4

A discussion about converting microsoft/Phi-3.5-MoE-instruct to GGUF.

@Dampfinchen

Pretty sad to see no support for Phi 3.5 MoE in llama.cpp. Sure, it might have dry writing and is very censored, but in assistant tasks it's much better than all the smaller models combined. It truly has 70B quality in just 6.6B active parameters, so it's much easier to run than even Gemma 2 27B (which it beats according to benchmarks).

@sourceholder

@Dampfinchen, have you found any way to run Phi 3.5 MoE locally? I'm open to try out alternatives to llama.cpp.

@arnesund

arnesund commented Sep 1, 2024

Also eager to get Phi 3.5-Vision support. Most accurate photo and screenshot descriptions I've seen so far.

@EricLBuehler

@Dampfinchen @sourceholder @arnesund if you are interested in running Phi 3.5 MoE or Phi 3.5 vision with alternatives to llama.cpp, perhaps you could check out mistral.rs.

Just a quick description:

We have support for Phi 3.5 MoE (docs & example: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3.5MOE.md) and Phi 3.5 vision (docs & examples: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md).

All models can be run with CUDA, Metal, or CPU SIMD acceleration. We have Flash Attention and PagedAttention support for increased inference performance, and support in-situ quantization in GGUF and HQQ formats.

If you are using the OpenAI API, you can use the provided OpenAI-compatible (superset, we have things like min-p, DRY, etc) HTTP server. There is also a Python package. For Phi 3.5 MoE and other text models, there is also an interactive chat mode.
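
For example, a minimal sketch with the official OpenAI Python client; the port, API key, and model name below are placeholder assumptions, not mistral.rs defaults:

```python
from openai import OpenAI  # pip install openai

# Placeholder base URL and model id: substitute whatever your server
# actually reports; the API key is unused by a local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi-3.5-moe",
    messages=[{"role": "user", "content": "Explain what a sparse MoE layer does."}],
    extra_body={"min_p": 0.05},  # superset samplers like min-p go via extra_body
)
print(resp.choices[0].message.content)
```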

@Dampfinchen

Thank you, but I and many others would rather wait for official support.

I wonder what's the holdup? Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture, with two active experts per token?

@Thellton

Thellton commented Sep 3, 2024

> Thank you, but I and many others would rather wait for official support.
>
> I wonder what's the holdup? Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture, with two active experts per token?

No one has taken the task up yet, sadly. There's presently work being done on Phi-3.5 Vision Instruct though, which is something to look forward to considering the reported vision understanding of the model.

@ayttop

ayttop commented Sep 4, 2024

Any news on a phi-3.5-moe-instruct GGUF in llama.cpp?

@bunnyfu

bunnyfu commented Sep 12, 2024

Bumping up thread. :)

@vaibhav1618

vaibhav1618 commented Sep 16, 2024

Strange that no one is looking into this. The MoE seems to be the best model currently available that can run on a consumer-grade CPU.

@sourceholder

@vaibhav1618, FYI - DeepSeek V2 Lite (16B) is another good MoE model, with 2.4B activated params.

@ThiloteE
Contributor

Phi-3.5 MoE seems to be based on https://huggingface.co/microsoft/GRIN-MoE/tree/main. Maybe their technical report at https://arxiv.org/abs/2409.12136 can help in identifying differences from other MoE architectures, which should ease adoption in llama.cpp.

@yueshen-intel

> There's presently work being done on Phi-3.5 Vision Instruct though, which is something to look forward to considering the reported vision understanding of the model.

I'm wondering where the work on Phi-3.5 Vision Instruct is being done? Many thanks!

@limingchina

limingchina commented Oct 20, 2024

> @Dampfinchen @sourceholder @arnesund if you are interested in running Phi 3.5 MoE or Phi 3.5 vision with alternatives to llama.cpp, perhaps you could check out mistral.rs. [...]

@EricLBuehler , Can you recommend some frontend app to use mistral.rs?

@ThiloteE
Contributor

ThiloteE commented Oct 25, 2024

The PR in the transformers repo to support Phi-3.5 MoE has been merged and is featured in release v4.46.0, so maybe llama.cpp can finally add this model architecture?

Oh, and by the way, I just found the documentation for how to add a new model to llama.cpp, after having followed this repo for months now, lol. Here are the docs: https://github.com/ggerganov/llama.cpp/blob/master/docs/development/HOWTO-add-model.md
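
For anyone tempted to pick this up: per those docs, the first step is usually teaching the converter about the architecture. Below is a rough sketch of the registration pattern, written as it would appear inside convert_hf_to_gguf.py (where the Model base class lives); the class name, enum value, and hparam keys are illustrative guesses, not the final implementation:

```python
# Sketch only -- this mirrors the decorator pattern convert_hf_to_gguf.py
# uses for every architecture; names below are illustrative placeholders.

@Model.register("PhiMoEForCausalLM")        # architecture string from config.json
class PhiMoeModel(Model):
    model_arch = gguf.MODEL_ARCH.PHIMOE     # a new enum entry in gguf-py would be needed

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # MoE metadata the runtime needs to rebuild the sparse MLP
        self.gguf_writer.add_expert_count(self.hparams["num_local_experts"])
        self.gguf_writer.add_expert_used_count(self.hparams["num_experts_per_tok"])
```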

@sinand99

+1 for MoE support.

@skylake5200

> Also eager to get Phi 3.5-Vision support. Most accurate photo and screenshot descriptions I've seen so far.

+1 for Phi 3.5-Vision support.

phymbert added commits that referenced this issue Dec 28, 2024
@phymbert phymbert added the model Model specific label Dec 28, 2024
@github-actions github-actions bot added the stale label Jan 28, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
