
Feature Request: Add support for Phi-3.5 MoE and Vision Instruct #9119

Closed
4 tasks done
YorkieDev opened this issue Aug 21, 2024 · 24 comments
Labels
enhancement (New feature or request), model (Model specific), stale

Comments

@YorkieDev

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Microsoft has recently released two new models in the Phi family.

3.5 MoE: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
3.5 Vision: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

It would be nice to see support added to llama.cpp for these two models.

Motivation

Supporting all model releases so the wider community can enjoy these great free models.

Possible Implementation

No response

@YorkieDev YorkieDev added the enhancement New feature or request label Aug 21, 2024
@curvedinf

MoE looks promising. Any word on how complex it would be to add support?

@JackCloudman

Is someone working on it? 🙏

@simsi-andy

Especially vision would be worth it. But I lack the knowledge to do something like this.

@mounta11n
Contributor

Yes, the vision model is surprisingly good. In GGUF format under llama.cpp, it would open up undreamt-of possibilities.

(screenshot attached: Bildschirmfoto_20240821_213502)

@foldl
Contributor

foldl commented Aug 28, 2024

ChatLLM.cpp supports Phi-3.5 MoE model now.

For developers: the MoE sparse MLP is a little different from the one used in Mixtral.
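
To illustrate the rough shape of such a layer, here is a minimal top-2 routing sketch in plain NumPy. All names and dimensions are illustrative, and the router details are exactly where Phi-3.5-MoE reportedly deviates from the Mixtral recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def make_expert(d):
    # stand-in for a real gated FFN expert: a tiny ReLU MLP
    W1 = rng.normal(size=(d, 4 * d)) * 0.02
    W2 = rng.normal(size=(4 * d, d)) * 0.02
    return lambda h: np.maximum(h @ W1, 0) @ W2

def moe_mlp(x, gate_w, experts, top_k=2):
    """x: (n_tokens, d_model); gate_w: (d_model, n_experts)."""
    logits = x @ gate_w                                # router scores per token
    chosen = np.argsort(-logits, axis=-1)[:, :top_k]   # top-k expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = chosen[t]
        weights = softmax(logits[t, sel])              # renormalize over the chosen k
        for w, e in zip(weights, sel):
            out[t] += w * experts[e](x[t])             # only k experts run per token
    return out

d_model, n_experts = 64, 16                            # Phi-3.5-MoE uses 16 experts, 2 active
experts = [make_expert(d_model) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts)) * 0.02
tokens = rng.normal(size=(3, d_model))
print(moe_mlp(tokens, gate_w, experts).shape)          # -> (3, 64)
```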

@ayttop

ayttop commented Aug 28, 2024

https://github.com/foldl/chatllm.cpp

| Supported Models | Download Quantized Models |

What's New:

2024-08-28: Phi-3.5 Mini & MoE

Inference of a bunch of models from less than 1B to more than 300B, for real-time chatting with RAG on your computer (CPU), pure C++ implementation based on @ggerganov's ggml.


@ayttop

ayttop commented Aug 28, 2024

https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/4

A discussion about converting microsoft/Phi-3.5-MoE-instruct to GGUF.

@Dampfinchen

Pretty sad to see no support for Phi 3.5 MoE in llama.cpp. Sure, it might have dry writing and is very censored, but in assistant tasks it's much better than all the smaller models combined. It truly has 70B quality in just 6.6B active parameters, so it's much easier to run than even Gemma 2 27B (which it beats according to benchmarks).

@sourceholder

@Dampfinchen, have you found any way to run Phi 3.5 MoE locally? I'm open to try out alternatives to llama.cpp.

@arnesund

arnesund commented Sep 1, 2024

Also eager to get Phi 3.5-Vision support. Most accurate photo and screenshot descriptions I've seen so far.

@EricLBuehler

@Dampfinchen @sourceholder @arnesund if you are interested in running Phi 3.5 MoE or Phi 3.5 vision with alternatives to llama.cpp, perhaps you could check out mistral.rs.

Just a quick description:

We have support for Phi 3.5 MoE (docs & example: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3.5MOE.md) and Phi 3.5 vision (docs & examples: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md).

All models can be run with CUDA, Metal, or CPU SIMD acceleration. We have Flash Attention and PagedAttention support for increased inference performance, and support in-situ quantization in GGUF and HQQ formats.

If you are using the OpenAI API, you can use the provided OpenAI-compatible (superset, we have things like min-p, DRY, etc) HTTP server. There is also a Python package. For Phi 3.5 MoE and other text models, there is also an interactive chat mode.
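
For example, a minimal sketch with the official OpenAI Python client; the port, API key, and model name below are placeholder assumptions, not mistral.rs defaults:

```python
from openai import OpenAI  # pip install openai

# Placeholder base URL and model id: substitute whatever your server
# actually reports; the API key is unused by a local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="phi-3.5-moe",
    messages=[{"role": "user", "content": "Explain what a sparse MoE layer does."}],
    extra_body={"min_p": 0.05},  # superset samplers like min-p go via extra_body
)
print(resp.choices[0].message.content)
```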

@Dampfinchen

Thank you, but I and many others would rather wait for official support.

I wonder what's the holdup? Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture, with two active experts per token?

@Thellton

Thellton commented Sep 3, 2024

> Thank you, but I and many others would rather wait for official support.
>
> I wonder what's the holdup? Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture, with two active experts per token?

No one has taken the task up yet, sadly. There's presently work being done on Phi-3.5 Vision Instruct though, which is something to look forward to considering the reported vision understanding of the model.

@ayttop

ayttop commented Sep 4, 2024

Any news on a phi-3.5-moe-instruct GGUF in llama.cpp?

@bunnyfu

bunnyfu commented Sep 12, 2024

Bumping up thread. :)

@vaibhav1618

vaibhav1618 commented Sep 16, 2024

Strange that no one is looking into this. The MoE seems to be the best model currently available that can run on a consumer-grade CPU.

@sourceholder

@vaibhav1618, FYI - DeepSeek V2 Lite (16B) is another good MoE model, with 2.4B activated params.

@ThiloteE
Contributor

Phi-3.5 MoE seems to be based on https://huggingface.co/microsoft/GRIN-MoE/tree/main. Maybe their technical report at https://arxiv.org/abs/2409.12136 can help in identifying differences from other MoE architectures, which should ease adoption in llama.cpp.

@yueshen-intel

> There's presently work being done on Phi-3.5 Vision Instruct though, which is something to look forward to considering the reported vision understanding of the model.

I'm wondering where the work on Phi-3.5 Vision Instruct is being done? Many thanks!

@limingchina

limingchina commented Oct 20, 2024

> @Dampfinchen @sourceholder @arnesund if you are interested in running Phi 3.5 MoE or Phi 3.5 vision with alternatives to llama.cpp, perhaps you could check out mistral.rs. [...]

@EricLBuehler , Can you recommend some frontend app to use mistral.rs?

@ThiloteE
Contributor

ThiloteE commented Oct 25, 2024

The PR in the transformers repo to support Phi-3.5 MoE has been merged and is featured in release v4.46.0, so maybe llama.cpp can finally add this model architecture?

Oh, and by the way, I just found the documentation for how to add a new model to llama.cpp, after having followed this repo for months now, lol. Here are the docs: https://github.com/ggerganov/llama.cpp/blob/master/docs/development/HOWTO-add-model.md
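
For anyone tempted to pick this up: per those docs, the first step is usually teaching the converter about the architecture. Below is a rough sketch of the registration pattern, written as it would appear inside convert_hf_to_gguf.py (where the Model base class lives); the class name, enum value, and hparam keys are illustrative guesses, not the final implementation:

```python
# Sketch only -- this mirrors the decorator pattern convert_hf_to_gguf.py
# uses for every architecture; names below are illustrative placeholders.

@Model.register("PhiMoEForCausalLM")        # architecture string from config.json
class PhiMoeModel(Model):
    model_arch = gguf.MODEL_ARCH.PHIMOE     # a new enum entry in gguf-py would be needed

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # MoE metadata the runtime needs to rebuild the sparse MLP
        self.gguf_writer.add_expert_count(self.hparams["num_local_experts"])
        self.gguf_writer.add_expert_used_count(self.hparams["num_experts_per_tok"])
```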

@sinand99

+1 for MoE support.

@skylake5200

> Also eager to get Phi 3.5-Vision support. Most accurate photo and screenshot descriptions I've seen so far.

+1 for Phi 3.5-Vision support.

phymbert added commits that referenced this issue Dec 28, 2024
@phymbert phymbert added the model Model specific label Dec 28, 2024
@github-actions github-actions bot added the stale label Jan 28, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
