
Add /audio/transcriptions Endpoint for OpenWebUI #41

Closed
zenabius opened this issue Jan 31, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@zenabius

OpenWebUI requires an /audio/transcriptions endpoint to handle audio-to-text processing. This feature will allow users to transcribe audio input via the API.

Expected Behavior:
The endpoint should accept an audio file (e.g., MP3, WAV).
It should return the transcribed text in JSON format.

@mostlygeek
Owner

OpenAI docs: https://platform.openai.com/docs/guides/speech-to-text#transcriptions

note: this API uses multipart forms but does support a model parameter.

@zenabius do you have a backend config that llama-swap could run for audio transcriptions? You can test it with http://server/upstream/{model}/v1/audio/transcriptions. The upstream/ path takes a model on the path and proxies everything after transparently to the upstream.
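For reference, an OpenAI-style multipart request through that upstream path would presumably look like this (host, model name, and sample file are placeholders; replace {model} with the model name from your llama-swap config):

curl http://server/upstream/{model}/v1/audio/transcriptions \
  -F "file=@sample.mp3" \
  -F model="{model}"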

@zenabius
Author

zenabius commented Jan 31, 2025

🚀 It WORKS!

🔗 API Endpoint

Open WebUI API path /admin/settings/audio:
STT Settings: http://server/upstream/whisper-audio
STT Model: whisper-large-v3-turbo

⚙️ Configuration

Below is the configuration for my /audio/transcriptions endpoint:

whisper-audio:
  cmd: >
    docker run --rm 
    --gpus '"device=2"'
    --init 
    -p 9797:5000 
    -v /mnt/llm/whisper.cpp/HF:/root/.cache/
    -v /mnt/llm/whisper.cpp/HF/data1:/data
    local/whisper.cpp:audio-cuda

  proxy: "http://127.0.0.1:9797"
  ttl: 0
  unlisted: false
  checkEndpoint: /health
  aliases:
    - audio
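Open WebUI appends /audio/transcriptions to the STT base URL above, so a manual test of the same route presumably looks like this (host and sample file are placeholders):

curl http://server/upstream/whisper-audio/audio/transcriptions \
  -F "file=@sample.mp3"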

@mostlygeek
Owner

I would like to eventually support /v1/audio/transcriptions, but if this works I'll leave it here for people to use under /upstream/.

Could you share a bit more about how to get it working?

I think it would be fairly easy to add the endpoint.

@zenabius
Author

zenabius commented Jan 31, 2025

Here’s an LLM-generated summary breaking down how to get everything working:

1. Building the Docker Container

The Dockerfile provided ensures that all necessary dependencies are installed. To build the container, run:

docker build -t local/whisper.cpp:audio-cuda -f Dockerfile .

This will create a Docker image named local/whisper.cpp:audio-cuda with CUDA support.

2. Running the Container

Once built, you can run the container using:

docker run --gpus all -p 5000:5000 --rm -v /path/to/files:/root/.cache/ local/whisper.cpp:audio-cuda

This will:

  • Expose the API on port 5000
  • Ensure GPU acceleration is available
  • Remove the container after stopping it

3. Downloading the Model

The model is specified as "openai/whisper-large-v3-turbo" and will be automatically downloaded from Hugging Face when the container runs. If you prefer to download it manually, you can run:

huggingface-cli download openai/whisper-large-v3-turbo

Alternatively, if using whisper.cpp, you can get models from:
https://huggingface.co/ggerganov/whisper.cpp/tree/main.

For whisper.cpp, you’d typically download and move the .bin files into a directory where your application can access them.
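For example (a sketch only; the exact model file and destination directory depend on your setup):

curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q8_0.bin
mv ggml-large-v3-turbo-q8_0.bin /path/to/models/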

4. Testing the API

Once the container is running, you can test the transcription API with:

curl -X POST http://localhost:5000/audio/transcriptions \
     -F "[email protected]"

This will return a JSON response with the transcription.
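The body follows the OpenAI response shape, roughly (the text value here is just illustrative):

{"text": "the transcribed speech as plain text"}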

5. Adding a /v1/audio/transcriptions Endpoint

If you want to support /v1/audio/transcriptions, modify backend.py (see the sketch after this list):

  1. Change:

    @app.post('/audio/transcriptions', tags=[transcription_tag])

    to:

    @app.post('/v1/audio/transcriptions', tags=[transcription_tag])
  2. Restart the container! 🚀
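A minimal sketch of that change, assuming backend.py is a FastAPI app (the handler name and signature below are hypothetical; only the stacked decorators matter):

# Sketch only: serve both the old and the new path with one handler.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
transcription_tag = "transcription"

@app.post('/audio/transcriptions', tags=[transcription_tag])     # existing path
@app.post('/v1/audio/transcriptions', tags=[transcription_tag])  # OpenAI-style path
async def transcriptions(file: UploadFile = File(...)):
    # Route decorators return the function unchanged, so stacking them
    # registers the same handler under both paths.
    ...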

Remove .txt from backend.py.txt and Dockerfile.txt
backend.py.txt
Dockerfile.txt
requirements.txt

@mostlygeek
Owner

Excellent! Thank you for the detailed write-up!

mostlygeek added the enhancement label on Feb 28, 2025
@matiashegoburu

Please add this to the main branch; having a fully OpenAI-compatible set of API endpoints under the same base URL is very handy.

@mostlygeek
Owner

mostlygeek commented Mar 13, 2025

@zenabius which inference server are you using? I was looking at your docs and I can't figure out which Python server you're dockerizing.

Edit:

Actually NM :). I got whisper.cpp's server running with:

$ CUDA_VISIBLE_DEVICES=1 ./whisper-server-5bb1d58 \
  --host 0.0.0.0 --port 9233 \
  -m /mnt/nvme/models/whisper/ggml-large-v3-turbo-q8_0.bin \
  --request-path /audio/transcriptions --inference-path ""

@mostlygeek
Owner

Using /v1/audio/transcriptions and whisper.cpp:

Configuration example:

models:
  "whisper":
    proxy: "http://127.0.0.1:9233"
    checkEndpoint: /v1/audio/transcriptions/
    cmd: >
      path/to/whisper-server
        --host 127.0.0.1 --port 9233
        -m path/to/ggml-large-v3-turbo-q8_0.bin
        --request-path /v1/audio/transcriptions --inference-path ""

Important for OpenAI API compatibility:

  1. Set flags: --request-path /v1/audio/transcriptions --inference-path ""
  2. Set checkEndpoint: /v1/audio/transcriptions/

Testing with curl:

Using samples from whisper.cpp:

curl 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@mm1.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper"

Results with 3090:

$ time curl 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@mm1.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper" | jq -r .text

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5150k  100   991  100 5149k   1340  6967k --:--:-- --:--:-- --:--:-- 6960k

 This is the Micro Machine Man presenting the most midget miniature motorcade of Micro Machines.
 Each one has dramatic details, terrific trim precision paint jobs, plus incredible Micro Machine pocket play sets.
 There's a police station, fire station, restaurant, service station and more.
 Perfect pocket portables to take any place.
 And there are many miniature play sets to play with and each one comes with its own special edition Micro Machine vehicle
 and fun fantastic features that miraculously move.
 Raise the boat lift at the airport marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge.
 And these play sets fit together to form a Micro Machine world.
 Micro Machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all.
 Micro Machines are Micro Machine pocket play sets sold separately from Galoov.
 The smaller they are, the better they are.


real    0m0.752s
user    0m0.010s
sys     0m0.012s

CUDA Installation

$ curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q8_0.bin

$ git clone https://github.com/ggerganov/whisper.cpp.git
$ cd whisper.cpp

# fetch samples 
$ make samples

# build 
$ CUDACXX=/usr/local/cuda-12.6/bin/nvcc cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=1
$ cmake --build build --config Release -j 16

# test it
$ ./build/bin/whisper-cli -m ggml-large-v3-turbo-q8_0.bin samples/jfk.wav

mostlygeek added a commit that referenced this issue Mar 13, 2025
* add support for /v1/audio/transcriptions
@mostlygeek
Owner

Fixed in #67
