Add /audio/transcriptions Endpoint for OpenWebUI #41
OpenAI docs: https://platform.openai.com/docs/guides/speech-to-text#transcriptions. Note: this API uses multipart forms but does support a `model` parameter.

@zenabius do you have a backend config that llama-swap could run for audio transcriptions? You can test it with http://server/upstream/{model}/v1/audio/transcriptions. The upstream/ path takes a model on the path and proxies everything after it transparently to the upstream.
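For illustration, a test through that upstream path might look like the sketch below; the host, the model name `whisper-audio`, and the audio file are placeholders rather than values from this thread:

```sh
# Hypothetical test of the upstream passthrough path.
# "whisper-audio" must match a model defined in your llama-swap config.
curl -X POST http://server/upstream/whisper-audio/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=whisper-audio"
```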
🚀 It WORKS!

**🔗 API Endpoint**

Open WebUI API path: /admin/settings/audio

**⚙️ Configuration**

Below is the configuration for my whisper-audio model:
```yaml
cmd: >
  docker run --rm
  --gpus '"device=2"'
  --init
  -p 9797:5000
  -v /mnt/llm/whisper.cpp/HF:/root/.cache/
  -v /mnt/llm/whisper.cpp/HF/data1:/data
  local/whisper.cpp:audio-cuda
proxy: "http://127.0.0.1:9797"
ttl: 0
unlisted: false
checkEndpoint: /health
aliases:
  - audio
```
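With the container up, a quick sanity check could look like this (a sketch: the /health path comes from checkEndpoint above, while the /audio/transcriptions route and the sample file name are assumptions about the container's server):

```sh
# Health check llama-swap uses to decide the backend is ready.
curl http://127.0.0.1:9797/health

# Send a transcription request straight to the published port.
curl -X POST http://127.0.0.1:9797/audio/transcriptions \
  -F "file=@sample.wav"
```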
I would like to eventually support the /v1/audio/transcriptions endpoint. Could you share a bit more about how to get it working?
I think it would be fairly easy to add the endpoint.
Here's a breakdown (LLM summary) of how to get everything working:

**1. Building the Docker Container**

```sh
docker build -t local/whisper.cpp:audio-cuda -f Dockerfile .
```

This will create a Docker image named `local/whisper.cpp:audio-cuda`.

**2. Running the Container**

Once built, you can run the container using:

```sh
docker run --gpus all -p 5000:5000 --rm -v /path/to/files:/root/.cache/ local/whisper.cpp:audio-cuda
```

This will:

- run the container with all GPUs available (`--gpus all`)
- publish the server on port 5000 (`-p 5000:5000`)
- remove the container when it exits (`--rm`)
- mount `/path/to/files` as the model cache directory

**3. Downloading the Model**

The model is `openai/whisper-large-v3-turbo`:

```sh
huggingface-cli download openai/whisper-large-v3-turbo
```

**4. Testing the API**

Once the container is running, you can test the transcription API with:

```sh
curl -X POST http://localhost:5000/audio/transcriptions \
  -F "file=@<your-audio-file>"
```

This will return a JSON response with the transcription.

**5. Adding a llama-swap configuration**

See the config shared earlier in this thread.
Excellent! Thank you for the detailed write-up!
Please add this to the main branch. Having a fully compatible OpenAI API set of endpoints under the same base URL is very handy.
@zenabius which inference server are you using? I was looking at your docs and I can't figure out which Python server you're dockerizing. Edit: Actually NM :). I got whisper.cpp's server running with:
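(The exact command isn't preserved in this thread; the following is only a sketch assuming a recent whisper.cpp build where the server binary is `whisper-server`. Flag spellings vary between versions, so treat every option below as an assumption.)

```sh
# Sketch: run whisper.cpp's HTTP server on a local port.
# Model path, host, and port are placeholders; --convert (ffmpeg-based
# transcoding of non-WAV uploads) may not exist in every build.
./build/bin/whisper-server \
  -m models/ggml-large-v3-turbo.bin \
  --host 127.0.0.1 \
  --port 9797 \
  --convert
```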
Using whisper.cpp's server. Configuration example (a sketch follows below):
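The original example wasn't captured here, so this is only a sketch of what a llama-swap entry for whisper-server could look like; the model name, binary path, model file, port, and the `whisper-1` alias are all assumptions to adapt to your setup:

```yaml
# Sketch only: every path, port, and name below is a placeholder.
models:
  "whisper-large-v3-turbo":
    cmd: >
      /path/to/whisper.cpp/build/bin/whisper-server
      -m /path/to/models/ggml-large-v3-turbo.bin
      --host 127.0.0.1 --port 9797
    proxy: "http://127.0.0.1:9797"
    ttl: 300
    aliases:
      - whisper-1
```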
Important for OpenAI API compatibility:
Testing with curl:

Using samples from whisper.cpp:
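The curl command itself wasn't preserved; a sketch, assuming whisper-server is listening on the port from the config sketch above and exposes its default /inference route (jfk.wav ships in whisper.cpp's samples/ directory; the response_format field is an assumption about the server build):

```sh
# Sketch: transcribe one of whisper.cpp's bundled samples directly.
curl -X POST http://127.0.0.1:9797/inference \
  -F "file=@samples/jfk.wav" \
  -F "response_format=json"
```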
Results with 3090:
CUDA Installation
* add support for /v1/audio/transcriptions
Fixed in #67
OpenWebUI requires an /audio/transcriptions endpoint to handle audio-to-text processing. This feature will allow users to transcribe audio input via the API.
Expected Behavior (a request/response sketch follows the list):

- The endpoint should accept an audio file (e.g., MP3, WAV).
- It should return the transcribed text in JSON format.
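A sketch of what that could look like once the endpoint exists; the port, file name, and model value are placeholders rather than anything specified in this issue:

```sh
# Hypothetical request against llama-swap's OpenAI-compatible endpoint.
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -F "file=@meeting.wav" \
  -F "model=whisper-audio"

# Expected response shape:
# {"text": "transcribed speech goes here"}
```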