The best AI speech recognition, translation, and multilingual dubbing solution 🚀
Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.
- 🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
- 🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
- 📢 Multilingual text-to-speech: Edge-TTS, kokoro
- 🎥 YouTube processing & audio extraction: yt-dlp
- 🌍 Instant translation for 100+ languages: Deep-Translator
A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.
- Upgrading from v2.x to v3.x: Not possible. We recommend deleting the installer_files folder and running the latest version of start.bat.
- Upgrading from v3.x to v3.x: Possible. After downloading the latest code, run update.bat.
- First-time users: Please refer to the installation instructions below.
- Troubleshooting: In most cases, issues can be resolved by deleting the installer_files folder and then running configure.bat followed by start.bat.
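The troubleshooting step above (delete installer_files, then run configure.bat followed by start.bat) can be sketched in Python. This is a minimal illustration, not part of Voice-Pro: the function name `reset_install` is hypothetical, and the scripts are only listed, not executed.

```python
import shutil
from pathlib import Path


def reset_install(repo_dir: str) -> list[str]:
    """Delete installer_files (if present) and return the scripts to run next."""
    installer_dir = Path(repo_dir) / "installer_files"
    if installer_dir.is_dir():
        # Removes the folder and everything inside it.
        shutil.rmtree(installer_dir)
    # The scripts are not launched here; run them in this order from the repo folder.
    return ["configure.bat", "start.bat"]
```

Calling `reset_install(".")` from the cloned folder would clear the old setup and remind you which scripts to run next.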
version 3.0
- 🔥 Removed the AI Cover feature.
- 🚀 Added support for m-bain/whisperX.
version 2.0
- 🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
- 🆓 Free trial supports media up to 60 seconds in length.
- 🔥 Added the AI Cover feature.
- 🎤 Introduced support for CosyVoice and kokoro.
- ⏳ Initial run downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on network speed.
- 🎧 Voice samples for cloning will be continuously updated.
- 📝 Added spaCy for natural sentence-by-sentence translation and TTS.
- ☁️ Subscription version includes Microsoft Azure Translator and TTS.
- 🏪 Subscription offers unlimited usage (no 60-second limit) during the subscription period, available via Shopify.
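The sentence-by-sentence translation mentioned in the version 2.0 notes can be illustrated with a small sketch. It is not Voice-Pro's implementation: the regex splitter is a naive stand-in for spaCy's sentence segmentation, and `fake_translate` is a hypothetical placeholder for a real translation backend.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for a sentence segmenter: split after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_by_sentence(text: str, translate: Callable[[str], str]) -> List[str]:
    """Translate each sentence separately so every output maps to one input sentence."""
    return [translate(sentence) for sentence in split_sentences(text)]


# Hypothetical translator used only for demonstration.
def fake_translate(sentence: str) -> str:
    return f"[ko] {sentence}"


result = translate_by_sentence("Hello there. How are you? Fine!", fake_translate)
```

Keeping one translated string per source sentence makes it straightforward to hand each sentence to a TTS engine in order.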
demo-short001.mp4
Studio tab workflow demo: a one-stop media pipeline from YouTube video download through AI-based voice separation, automatic Whisper subtitles, and multilingual translation to dubbing with F5-TTS.
f5-tts-demo-elon-zuckerberg-1115-3.mp4
F5-TTS voice cloning demo: voice conversion that mimics the voices of Mark Zuckerberg and Elon Musk to create entirely new content.
voice-pro-demo-v1.5.7-h264-1080p-live.mp4
Real-time multilingual translation demo: captures BBC news content, generates subtitles in real time, and immediately translates them into other languages.
- YouTube video downloads & audio extraction
- Voice separation with Demucs
- Supports 100+ languages for speech recognition & translation
- Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
- Text-to-Speech:
  - Edge-TTS: 100+ languages, 400+ voices
  - E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
  - kokoro: Ranked #2 in HuggingFace TTS Arena
- Instant speech recognition
- Multilingual translation on the fly
- Customizable audio inputs
- All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
- Supports all ffmpeg-compatible formats
- Output options: WAV, FLAC, MP3
- Subtitles & recognition for 100+ languages
- TTS with speed, volume, & pitch controls
- Subtitle-focused: 90+ languages
- Video-integrated subtitle display
- Word-level highlighting & denoise options
- Translation for 100+ languages
- Supports subtitle files (ASS, SSA, SRT, etc.)
- Real-time voice recognition & translation
- Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
- Celeb voice podcasts & multilingual support
- To request a new voice, please open a request on the Issues page.
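The feature lists above mention SRT subtitle files. As a rough illustration of that format, the sketch below renders timed segments as SRT text. The segment shape (`start`, `end` in seconds, plus `text`) is an assumption modeled on what Whisper-style transcribers return, and the function names are hypothetical rather than taken from Voice-Pro.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments with 'start', 'end' and 'text' keys as SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Each cue is a running index, a time range, and the subtitle text, which is why word-level or sentence-level timestamps from the recognizer matter for subtitle quality.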
English
Chinese
- 迪丽热巴 (Dílì Rèbā)
- 蔡依林 (Cài Yīlín)
- 吴亦凡 (Wú Yìfán)
- 李易峰 (Lǐ Yìfēng)
- 杨幂 (Yáng Mì)
- 赵丽颖 (Zhào Lìyǐng)
- OS: Windows 10/11 (64-bit); Linux and macOS are not supported
- GPU: NVIDIA with CUDA 12.4 (recommended)
- VRAM: 4GB+ (8GB+ preferred)
- RAM: 4GB+
- Storage: 20GB+ free space
- Internet: Required
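Two of the requirements above (Windows, 20GB+ free space) can be checked before installing with a short script. This is a sketch using only the Python standard library; the function name `preflight` is hypothetical, and it does not check the GPU, VRAM, or RAM requirements.

```python
import platform
import shutil

MIN_FREE_GB = 20  # storage requirement listed above


def preflight(path: str = ".") -> dict:
    """Report whether the OS and free disk space match the listed requirements."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "os_ok": platform.system() == "Windows",
        "free_gb": round(free_gb, 1),
        "storage_ok": free_gb >= MIN_FREE_GB,
    }
```

Running `preflight()` in the folder where you plan to clone the repository gives a quick yes/no on the OS and the free space of that drive.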
Install Voice-Pro with ease using configure.bat and start.bat.
git clone https://github.com/abus-aikorea/voice-pro.git
- 🚀 configure.bat
  - Sets up git, ffmpeg, and CUDA (if an NVIDIA GPU is present)
  - Run once; requires internet and takes 1+ hour
  - Don’t close the command window
- 🚀 start.bat
  - Launches the Voice-Pro WebUI
  - First run installs dependencies (1+ hour)
  - If issues arise, delete installer_files and retry
- 🚀 update.bat: Refreshes the Python environment (faster than reinstalling)
- Run uninstall.bat or delete the folder (portable install)
- Close the Windows Command window and run start.bat again.
- Open a browser manually and enter the address displayed in the Windows Command window (e.g. http://127.0.0.1:7870) in the address bar.
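If the browser shows nothing at that address, it can help to confirm that something is actually listening there. The sketch below is a generic TCP check, not a Voice-Pro utility; the function name `webui_is_up` is hypothetical, and port 7870 is only the example from above, so use whatever address the Command window displays.

```python
import socket


def webui_is_up(host: str = "127.0.0.1", port: int = 7870, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False
```

A `False` result suggests start.bat has not finished launching or has exited, which points back to the Command window output rather than the browser.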
- Check the GPU memory status in Windows Task Manager - Performance tab.
- Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
- Set Compute Type to an int type. Float types give better quality but require more GPU memory.
- Subtitle quality tends to improve with larger Whisper models (large > medium > small > base > tiny), though a larger model is not always better.
- Among compute types, float types give the best accuracy. Int types use model quantization to reduce GPU memory usage and increase speed, at the cost of some accuracy.
- Increasing the denoise level removes more background sound, so that only the remaining voice is used for speech recognition. This does not always give better results.
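The memory guidance above can be condensed into a small helper. This is a sketch under stated assumptions: the 8GB threshold for denoise level 2 comes from the tips above, while applying the same threshold to the compute type, the labels `float16` and `int8`, and the function name `suggest_settings` are illustrative choices, not Voice-Pro settings.

```python
def suggest_settings(vram_gb: float) -> dict:
    """Map available GPU memory to a maximum denoise level and a compute type."""
    if vram_gb >= 8:
        # Denoise level 2 is stated to need at least 8GB of GPU memory.
        return {"max_denoise_level": 2, "compute_type": "float16"}
    # With less memory, stay at level 0 or 1 and prefer a quantized (int) type.
    return {"max_denoise_level": 1, "compute_type": "int8"}
```

The denoise value is an upper bound rather than a recommendation, since a higher level does not always improve recognition.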
- This repository offers a free trial of Voice-Pro.
- The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
- The subscription version supports Microsoft Azure TTS and Translator. Purchase it on Shopify.
| | Trial Version | ☕Contributor Version | Subscription Version |
|---|---|---|---|
| Media Length Limit | 60 seconds | Unlimited | Unlimited |
| Translation Service | Google Translate (Open Source) | Google Translate (Open Source) | Azure Translate (Microsoft) |
| Text-to-Speech Service | Edge TTS (Open Source) | Edge TTS (Open Source) | Azure TTS (Microsoft) |
Hello, I'm David from the Voice-Pro team. Our team discovers the best AI technologies in the industry and provides them for anyone to use easily and conveniently. We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.
Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.
Thank you, ABUS Customer Service
- If you want to participate in this project and help us, feel free to open an issue.
- If you find something wrong, please submit a pull request to improve this project.
- Any type of contribution is welcome.
- For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email ([email protected]).
- If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
- You can support Voice-Pro with a donation here:
- Email: [email protected]
- Homepage (Korean): https://abuskorea.imweb.me
- Naver (Korean): 30-day subscription
- Shopify (Global): 30-day subscription
- Demucs: https://github.com/facebookresearch/demucs
- yt-dlp: https://github.com/yt-dlp/yt-dlp
- gradio: https://github.com/gradio-app/gradio
- edge-TTS: https://github.com/rany2/edge-tts
- F5-TTS: https://github.com/SWivid/F5-TTS.git
- openai-whisper: https://github.com/openai/whisper
- faster-whisper: https://github.com/SYSTRAN/faster-whisper
- whisper-timestamped: https://github.com/linto-ai/whisper-timestamped
- whisperX: https://github.com/m-bain/whisperX
- CosyVoice: https://github.com/FunAudioLLM/CosyVoice
- kokoro: https://github.com/hexgrad/kokoro
- Deep-Translator: https://github.com/nidhaloff/deep-translator
- spaCy: https://github.com/explosion/spaCy
by ABUS