
Add /audio/transcriptions Endpoint for OpenWebUI #41

Closed
zenabius opened this issue Jan 31, 2025 · 9 comments
Labels
enhancement New feature or request

Comments

@zenabius

OpenWebUI requires an /audio/transcriptions endpoint to handle audio-to-text processing. This feature will allow users to transcribe audio input via the API.

Expected Behavior:
The endpoint should accept an audio file (e.g., MP3, WAV).
It should return the transcribed text in JSON format.

@mostlygeek
Owner

OpenAI docs: https://platform.openai.com/docs/guides/speech-to-text#transcriptions

note: this API uses multipart forms but does support a model parameter.

@zenabius do you have a backend config that llama-swap could run for audio transcriptions? You can test it with http://server/upstream/{model}/v1/audio/transcriptions. The upstream/ path takes a model on the path and proxies everything after transparently to the upstream.
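For reference, an OpenAI-style multipart request through that upstream path would presumably look like this (host, model name, and sample file are placeholders; replace {model} with the model name from your llama-swap config):

curl http://server/upstream/{model}/v1/audio/transcriptions \
  -F "file=@sample.mp3" \
  -F model="{model}"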

@zenabius
Author

zenabius commented Jan 31, 2025

🚀 It WORKS!

🔗 API Endpoint

Open WebUI API path /admin/settings/audio:
STT Settings: http://server/upstream/whisper-audio
STT Model: whisper-large-v3-turbo

⚙️ Configuration

Below is the configuration for my /audio/transcriptions endpoint:

whisper-audio:
  cmd: >
    docker run --rm 
    --gpus '"device=2"'
    --init 
    -p 9797:5000 
    -v /mnt/llm/whisper.cpp/HF:/root/.cache/
    -v /mnt/llm/whisper.cpp/HF/data1:/data
    local/whisper.cpp:audio-cuda

  proxy: "http://127.0.0.1:9797"
  ttl: 0
  unlisted: false
  checkEndpoint: /health
  aliases:
    - audio
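Open WebUI appends /audio/transcriptions to the STT base URL above, so a manual test of the same route presumably looks like this (host and sample file are placeholders):

curl http://server/upstream/whisper-audio/audio/transcriptions \
  -F "file=@sample.mp3"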

@mostlygeek
Owner

I would like to eventually support /v1/audio/transcriptions, but if this works I'll leave it here for people to use under /upstream/.

Could you share a bit more about how to get it working?

I think it would be fairly easy to add the endpoint.

@zenabius
Author

zenabius commented Jan 31, 2025

Here’s an LLM-generated summary breaking down how to get everything working:

1. Building the Docker Container

The Dockerfile provided ensures that all necessary dependencies are installed. To build the container, run:

docker build -t local/whisper.cpp:audio-cuda -f Dockerfile .

This will create a Docker image named local/whisper.cpp:audio-cuda with CUDA support.

2. Running the Container

Once built, you can run the container using:

docker run --gpus all -p 5000:5000 --rm -v /path/to/files:/root/.cache/ local/whisper.cpp:audio-cuda

This will:

  • Expose the API on port 5000
  • Ensure GPU acceleration is available
  • Remove the container after stopping it

3. Downloading the Model

The model is specified as "openai/whisper-large-v3-turbo" and will be automatically downloaded from Hugging Face when the container runs. If you prefer to download it manually, you can run:

huggingface-cli download openai/whisper-large-v3-turbo

Alternatively, if using whisper.cpp, you can get models from:
https://huggingface.co/ggerganov/whisper.cpp/tree/main.

For whisper.cpp, you’d typically download and move the .bin files into a directory where your application can access them.
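For example (a sketch only; the exact model file and destination directory depend on your setup):

curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q8_0.bin
mv ggml-large-v3-turbo-q8_0.bin /path/to/models/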

4. Testing the API

Once the container is running, you can test the transcription API with:

curl -X POST http://localhost:5000/audio/transcriptions \
     -F "[email protected]"

This will return a JSON response with the transcription.
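The body follows the OpenAI response shape, roughly (the text value here is just illustrative):

{"text": "the transcribed speech as plain text"}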

5. Adding a /v1/audio/transcriptions Endpoint

If you want to support /v1/audio/transcriptions, modify backend.py (see the sketch after this list):

  1. Change:

    @app.post('/audio/transcriptions', tags=[transcription_tag])

    to:

    @app.post('/v1/audio/transcriptions', tags=[transcription_tag])
  2. Restart the container! 🚀
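A minimal sketch of that change, assuming backend.py is a FastAPI app (the handler name and signature below are hypothetical; only the stacked decorators matter):

# Sketch only: serve both the old and the new path with one handler.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
transcription_tag = "transcription"

@app.post('/audio/transcriptions', tags=[transcription_tag])     # existing path
@app.post('/v1/audio/transcriptions', tags=[transcription_tag])  # OpenAI-style path
async def transcriptions(file: UploadFile = File(...)):
    # Route decorators return the function unchanged, so stacking them
    # registers the same handler under both paths.
    ...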

Remove .txt from backend.py.txt and Dockerfile.txt
backend.py.txt
Dockerfile.txt
requirements.txt

@mostlygeek
Owner

Excellent! Thank you for the detailed write-up!

mostlygeek added the enhancement label on Feb 28, 2025
@matiashegoburu

Please add this to the main branch; having a fully OpenAI-compatible set of API endpoints under the same base URL is very handy.

@mostlygeek
Owner

mostlygeek commented Mar 13, 2025

@zenabius which inference server are you using? I was looking at your docs and I can't figure out which Python server you're dockerizing.

Edit:

Actually NM :). I got whisper.cpp's server running with:

$ CUDA_VISIBLE_DEVICES=1 ./whisper-server-5bb1d58 \
  --host 0.0.0.0 --port 9233 \
  -m /mnt/nvme/models/whisper/ggml-large-v3-turbo-q8_0.bin \
  --request-path /audio/transcriptions --inference-path ""

@mostlygeek
Owner

Using /v1/audio/transcriptions and whisper.cpp:

Configuration example:

models:
  "whisper":
    proxy: "http://127.0.0.1:9233"
    checkEndpoint: /v1/audio/transcriptions/
    cmd: >
      path/to/whisper-server
        --host 127.0.0.1 --port 9233
        -m path/to/ggml-large-v3-turbo-q8_0.bin
        --request-path /v1/audio/transcriptions --inference-path ""

Important for OpenAI API compatibility:

  1. Set flags: --request-path /v1/audio/transcriptions --inference-path ""
  2. Set checkEndpoint: /v1/audio/transcriptions/

Testing with curl:

Using samples from whisper.cpp:

curl 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@mm1.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper"

Results with 3090:

$ time curl 10.0.1.50:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@mm1.wav" \
    -F temperature="0.0" \
    -F temperature_inc="0.2" \
    -F response_format="json" \
    -F model="whisper" | jq -r .text

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 5150k  100   991  100 5149k   1340  6967k --:--:-- --:--:-- --:--:-- 6960k

 This is the Micro Machine Man presenting the most midget miniature motorcade of Micro Machines.
 Each one has dramatic details, terrific trim precision paint jobs, plus incredible Micro Machine pocket play sets.
 There's a police station, fire station, restaurant, service station and more.
 Perfect pocket portables to take any place.
 And there are many miniature play sets to play with and each one comes with its own special edition Micro Machine vehicle
 and fun fantastic features that miraculously move.
 Raise the boat lift at the airport marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge.
 And these play sets fit together to form a Micro Machine world.
 Micro Machine pocket play sets so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all.
 Micro Machines are Micro Machine pocket play sets sold separately from Galoov.
 The smaller they are, the better they are.


real    0m0.752s
user    0m0.010s
sys     0m0.012s

CUDA Installation

$ curl -LO https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3-turbo-q8_0.bin

$ git clone https://github.com/ggerganov/whisper.cpp.git
$ cd whisper.cpp

# fetch samples 
$ make samples

# build 
$ CUDACXX=/usr/local/cuda-12.6/bin/nvcc cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=1
$ cmake --build build --config Release -j 16

# test it
$ ./build/bin/whisper-cli -m ggml-large-v3-turbo-q8_0.bin samples/jfk.wav

mostlygeek added a commit that referenced this issue Mar 13, 2025
* add support for /v1/audio/transcriptions
@mostlygeek
Owner

Fixed in #67
