
[feature request] rewrite the model name passed to upstream #69

Closed
matteoserva opened this issue Mar 14, 2025 · 4 comments

Comments

@matteoserva

I would like to configure the model name passed to the upstream server.
From my understanding, right now the proxy sends the model name configured in llama-swap's YAML file.

Use case:

- upstream ollama server running "gemma:27b"
- llama-swap configured with a model called "gemma_27b"

Current behavior:
the operation fails because llama-swap interprets "gemma" as a profile name.

Desired behavior:
gemma_27b in llama-swap becomes gemma:27b in ollama.

Function to modify:
I think a good place to insert the change is proxyOAIHandler() in proxymanager.go (a rough sketch follows the example config below).

Example config file:

  "gemma_27b":
    # environment variables to pass to the command
    env:
      ...

    cmd: ollama serve ...
    proxy: http://127.0.0.1:8080

    upstream_name: "gemma:27b"
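
A rough sketch of the idea, just to illustrate (the function and field names here are mine, not llama-swap's actual code):

package main

import (
    "encoding/json"
    "fmt"
)

// rewriteModelName replaces the "model" field of an OpenAI-style request body
// with the configured upstream name before the request is forwarded.
// Illustrative sketch only, not llama-swap's actual implementation.
func rewriteModelName(body []byte, upstreamName string) ([]byte, error) {
    var req map[string]interface{}
    if err := json.Unmarshal(body, &req); err != nil {
        return nil, err
    }
    if upstreamName != "" {
        req["model"] = upstreamName // e.g. "gemma_27b" -> "gemma:27b"
    }
    return json.Marshal(req)
}

func main() {
    in := []byte(`{"model":"gemma_27b","messages":[{"role":"user","content":"hi"}]}`)
    out, err := rewriteModelName(in, "gemma:27b")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(out)) // model is now "gemma:27b"
}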
@mostlygeek (Owner)

I've been considering this feature and with the recent changes it would be fairly easy to implement.

I am curious, what is your use case for llama-swap+ollama vs llama-swap+llama-server?

@matteoserva (Author)

Thanks for your quick answer.

I am curious, what is your use case for llama-swap+ollama vs llama-swap+llama-server?

Why not both?

I'm using llama-swap not only to swap models but also to swap inference engines, depending on the constraints I have (RAM, VRAM, time, inputs...).

For example:

  • ollama supports multimodal inputs but cannot run the biggest models.
  • llama.cpp can run big models thanks to its quantization options and CPU offloading, but it has no tensor parallelism.
  • vLLM is very fast thanks to tensor parallelism and supports multimodal inputs, but it is limited by VRAM.

I'm using llama-swap to select the best upstream backend depending on the chosen model.

@mostlygeek (Owner)

I use it for that too! Makes it a lot easier to swap between engines for capabilities.

In this case, the “:” used for profiles conflicts with ollama’s naming conventions. And “upstream_name” is an override, so the model name sent upstream can be set to anything.

mostlygeek added a commit that referenced this issue Mar 15, 2025
* add test for splitRequestedModel()
* Add `useModelName` parameter to model configuration
* add docs to README
@mostlygeek (Owner)

Fixed in #71 and released in v95!

Example of usage:

models:
  "qwq":
    proxy: http://127.0.0.1:11434
    cmd: my-server

    # use this new configuration parameter to override what's in the request
    useModelName: "qwen:qwq"
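
For reference, a client request like the one below (assuming llama-swap itself listens on port 8080; adjust to your listen address) is matched against the "qwq" entry and forwarded to the upstream at 127.0.0.1:11434 with the model field rewritten to "qwen:qwq":

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwq", "messages": [{"role": "user", "content": "hello"}]}'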
