Voice-Pro

The best AI speech recognition, translation, and multilingual dubbing solution 🚀

🎙️ An AI-powered web application for speech recognition, translation, and dubbing

한국어 ∙ English ∙ 中文简体 ∙ 中文繁體 ∙ 日本語 ∙ Deutsch ∙ Español ∙ Português

Voice-Pro is a state-of-the-art web app that transforms multimedia content creation. It integrates YouTube video downloading, voice separation, speech recognition, translation, and text-to-speech into a single, powerful tool for creators, researchers, and multilingual professionals.

🔊 Top-tier speech recognition: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
🎤 Zero-shot voice cloning: F5-TTS, E2-TTS, CosyVoice
📢 Multilingual text-to-speech: Edge-TTS, kokoro
🎥 YouTube processing & audio extraction: yt-dlp
🌍 Instant translation for 100+ languages: Deep-Translator

A robust alternative to ElevenLabs, Voice-Pro empowers podcasters, developers, and creators with advanced voice solutions.

⚠️ Please Note

Upgrading from v2.x to v3.x: Not supported. We recommend deleting the installer_files folder and running the latest version of start.bat.
Upgrading within v3.x: Supported. After downloading the latest code, run update.bat.
First-time users: Please refer to the installation instructions below.
Troubleshooting: In most cases, issues can be resolved by deleting the installer_files folder and then running configure.bat followed by start.bat.
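
The reset described in the troubleshooting note can be scripted. The following is a minimal sketch, assuming you pass in the folder where Voice-Pro was cloned; `reset_install` is a hypothetical helper and not part of Voice-Pro. It only deletes `installer_files` and returns the scripts to run next, in order. It does not launch them.

```python
import shutil
from pathlib import Path


def reset_install(repo_dir: str) -> list[str]:
    """Delete installer_files (if present) and return the scripts to run next.

    Hypothetical helper, not part of Voice-Pro. It does not launch anything.
    """
    installer_dir = Path(repo_dir) / "installer_files"
    if installer_dir.is_dir():
        shutil.rmtree(installer_dir)
    # Per the troubleshooting note: configure.bat first, then start.bat.
    return ["configure.bat", "start.bat"]
```

Calling `reset_install(r"C:\voice-pro")` (an example path) would remove the folder if it exists, after which the two returned scripts can be run from the command window.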

📰 News & History

version 3.0

🔥 Removed the AI Cover feature.
🚀 Added support for m-bain/whisperX.

version 2.0

🐍 Built with Python 3.10.15, Torch 2.5.1+cu124, and Gradio 5.14.0.
🆓 Free trial supports media up to 60 seconds in length.
🔥 Added the AI Cover feature.
🎤 Introduced support for CosyVoice and kokoro.
⏳ Initial run downloads CosyVoice2-0.5B (9GB), which may take over an hour depending on network speed.
🎧 Voice samples for cloning will be continuously updated.
📝 Added spaCy for natural sentence-by-sentence translation and TTS.
☁️ Subscription version includes Microsoft Azure Translator and TTS.
🏪 Subscription offers unlimited usage (no 60-second limit) during the subscription period, available via Shopify.

▶️ Demos

`Dubbing Studio` Tab: Transcription, Translation & TTS

demo-short001.mp4

Studio Tab workflow demo: a one-stop media pipeline, from YouTube video download through AI-based voice separation, automatic Whisper subtitles, and multilingual translation to dubbing with F5-TTS.

`F5-TTS-Multi` Tab: Podcast Creation

f5-tts-demo-elon-zuckerberg-1115-3.mp4

F5-TTS voice cloning demo: cloned voices that mimic Mark Zuckerberg and Elon Musk are used to create entirely new content.

`Live Translation` Tab: Real-Time Recognition & Translation

voice-pro-demo-v1.5.7-h264-1080p-live.mp4

Real-time multilingual translation demo: BBC news audio is captured live, subtitled in real time, and immediately translated into other languages.

⭐ Key Features

1. Dubbing Studio

YouTube video downloads & audio extraction
Voice separation with Demucs
Supports 100+ languages for speech recognition & translation

2. Speech Technologies

Speech-to-Text: Whisper, Faster-Whisper, Whisper-Timestamped, WhisperX
Text-to-Speech:
- Edge-TTS: 100+ languages, 400+ voices
- E2-TTS, F5-TTS, CosyVoice: Zero-shot cloning
- kokoro: Ranked #2 in HuggingFace TTS Arena

3. Real-Time Translation

Instant speech recognition
Multilingual translation on the fly
Customizable audio inputs

🤖 WebUI

`Dubbing Studio` Tab

All-in-one hub: YouTube downloads, noise removal, subtitles, translation, & TTS
Supports all ffmpeg-compatible formats
Output options: WAV, FLAC, MP3
Subtitles & recognition for 100+ languages
TTS with speed, volume, & pitch controls
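
To illustrate the audio extraction step and the three output formats listed above, here is a minimal sketch that builds an ffmpeg command line. It is not how Voice-Pro invokes ffmpeg internally (the README does not show that); the function name and the encoder choice per format are assumptions made for this example.

```python
from pathlib import Path

# Audio encoders for the three output formats named above (assumed choices).
CODECS = {"wav": "pcm_s16le", "flac": "flac", "mp3": "libmp3lame"}


def ffmpeg_extract_cmd(src: str, fmt: str = "wav") -> list[str]:
    """Build an ffmpeg command that extracts the audio track of `src`."""
    fmt = fmt.lower()
    if fmt not in CODECS:
        raise ValueError(f"unsupported output format: {fmt}")
    dst = str(Path(src).with_suffix(f".{fmt}"))
    if dst == src:
        raise ValueError("output path would overwrite the input file")
    # -y: overwrite output, -vn: drop video, -c:a: choose the audio encoder.
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", CODECS[fmt], dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is on the PATH.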

`Whisper Caption` Tab

Subtitle-focused: 90+ languages
Video-integrated subtitle display
Word-level highlighting & denoise options

`Translate` Tab

Translation for 100+ languages
Supports subtitle files (ASS, SSA, SRT, etc.)
Real-time voice recognition & translation
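
Subtitle translation can be sketched as follows: only the cue text is translated, while cue numbers and timestamps are copied unchanged. This is a minimal illustration for SRT files only (ASS and SSA use a different layout and are not handled), and it is not Voice-Pro's own implementation. `translate_srt` and its `translate` parameter are hypothetical names.

```python
import re
from typing import Callable

# One SRT cue: index line, timing line, then one or more text lines.
CUE_RE = re.compile(r"(\d+)\n([^\n]*-->[^\n]*)\n(.*?)(?=\n\s*\n|\Z)", re.DOTALL)


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; indices and timing lines are copied unchanged."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    cues = []
    for index, timing, text in CUE_RE.findall(normalized):
        cues.append(f"{index}\n{timing}\n{translate(text.strip())}")
    return "\n\n".join(cues) + "\n"
```

Any callable that maps a string to a string works as `translate`. With Deep-Translator installed, one option would be `GoogleTranslator(source="auto", target="ko").translate`.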

`Speech Generation` Tab

Options: Edge-TTS, F5-TTS, CosyVoice, kokoro
Celeb voice podcasts & multilingual support

🎤✨ Reference Voice

Please request the voice you want to add on the Issues page.

English

Andrew Bustamante	Andrew Huberman	Avi Loeb	Ben Shapiro	Brett Johnson	Brian Keating
Coffeezilla	Dan Carlin	David Buss	David Fravor	David Kipping	Dennis Whyte
Donald Hoffman	Donald Trump	Douglas Murray	Duncan Trussell	Elon Musk	Garry Nolan
Jack Barsky	James Sexton	Jeff Bezos	Joe Rogan	John Mearsheimer	Jordan Peterson
Kanye 'Ye' West	Mark Zuckerberg	Michael Levin	Michael Saylor	Michio Kaku	MrBeast
Nick Lane	Paul Rosolie	Ryan Graves	Sam Altman	Sam Harris	Stephen Wolfram
Tucker Carlson	Vitalik Buterin	Yuval Harari

Chinese

迪丽热巴 (Dílì Rèbā)

蔡依林 (Cài Yīlín)

吴亦凡 (Wú Yìfán)

李易峰 (Lǐ Yìfēng)

杨幂 (Yáng Mì)

赵丽颖 (Zhào Lìyǐng)

Korean

BTS 진 (Jin)

BTS RM

IU (아이유)

이병헌

이정재

유재석

Japanese

綾瀬はるか (Ayase Haruka)

💻 System Requirements

OS: Windows 10/11 (64-bit) ※ Linux/Mac unsupported
GPU: NVIDIA with CUDA 12.4 (recommended)
VRAM: 4GB+ (8GB+ preferred)
RAM: 4GB+
Storage: 20GB+ free space
Internet: Required

📀 Installation

Install Voice-Pro with ease using configure.bat and start.bat.

1. Get the Package

Clone the repository, or download the latest release (Source code (zip)):

git clone https://github.com/abus-aikorea/voice-pro.git

2. Install & Run

🚀 configure.bat
- Sets up git, ffmpeg, and CUDA (if NVIDIA GPU)
- Run once; takes 1+ hour with internet
- Don’t close the command window
🚀 start.bat
- Launches Voice-Pro WebUI
- First run installs dependencies (1+ hour)
- Retry after deleting installer_files if issues arise

3. Update

🚀 update.bat: Refreshes Python environment (faster than reinstall)

4. Uninstall

Run uninstall.bat or delete the folder (portable install)

❓Tips & Tricks

If the browser does not open automatically

Close the Windows command window and run start.bat again.
Alternatively, open the browser yourself and enter the address displayed in the command window (e.g. http://127.0.0.1:7870) in the address bar.
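
To check whether the WebUI is actually running before opening it by hand, a small script can probe the port and then launch the default browser. This is a sketch using only the Python standard library. The host and port defaults come from the example address above and may differ on your machine, and both function names are hypothetical.

```python
import socket
import webbrowser


def is_listening(host: str = "127.0.0.1", port: int = 7870, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def open_webui(host: str = "127.0.0.1", port: int = 7870) -> bool:
    """Open the WebUI in the default browser if the server is reachable."""
    if not is_listening(host, port):
        return False
    return webbrowser.open(f"http://{host}:{port}")
```

If `open_webui()` returns `False`, nothing is listening at that address yet; use the address printed in the command window instead of the defaults.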

If a CUDA Out-Of-Memory error occurs

Check the GPU memory status in Windows Task Manager - Performance tab.
Set the Denoise level to 0 or 1. Denoise level 2 requires at least 8GB of GPU memory.
Set Compute Type to an int type. The float types give better quality but require more GPU memory.
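
The memory tips above can be condensed into a small decision helper. This is a sketch under stated assumptions: the 8GB threshold for Denoise level 2 comes from the tip above, while also switching to a float compute type at 8GB is an assumption of this example. The labels "int" and "float" mirror the wording of the tips, not exact option names in the WebUI, and `suggest_settings` is a hypothetical name.

```python
def suggest_settings(free_vram_gb: float) -> dict:
    """Map free GPU memory (GB) to conservative Denoise / Compute Type choices."""
    if free_vram_gb < 0:
        raise ValueError("free_vram_gb must be non-negative")
    if free_vram_gb >= 8:
        # Denoise level 2 needs at least 8GB; float here is an assumed choice.
        return {"max_denoise_level": 2, "compute_type": "float"}
    # Below 8GB: cap Denoise at 1 and prefer the lighter int compute type.
    return {"max_denoise_level": 1, "compute_type": "int"}
```

The free memory figure can be read from the Task Manager Performance tab. If PyTorch is available, `torch.cuda.mem_get_info()` reports free and total bytes for the current device.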

How to improve the quality of subtitles?

Larger Whisper models tend to produce better subtitles, although this is not guaranteed: large > medium > small > base > tiny.
Among compute types, the float types give better accuracy. The int types use model quantization to reduce GPU memory usage and increase speed, at the cost of some accuracy.
Raising the denoise level removes more background sound, so that only the remaining voice is passed to speech recognition. This does not always improve results.
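
One way to act on the model-size ordering is to try the largest model first and step down when loading fails for lack of memory. The sketch below shows that fallback pattern with a pluggable loader. It is an illustration rather than Voice-Pro's behavior, and `load_largest_that_fits` is a hypothetical name.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Order from the tip above, largest (usually best subtitles) first.
MODEL_SIZES = ("large", "medium", "small", "base", "tiny")


def load_largest_that_fits(load: Callable[[str], T], start: str = "large") -> Tuple[str, T]:
    """Try model sizes from `start` downward until `load` succeeds."""
    last_error = None
    for size in MODEL_SIZES[MODEL_SIZES.index(start):]:
        try:
            return size, load(size)
        except (MemoryError, RuntimeError) as err:
            # PyTorch reports CUDA out-of-memory as a RuntimeError subclass.
            last_error = err
    raise RuntimeError("no Whisper model size could be loaded") from last_error
```

With openai-whisper installed, `load` could be `whisper.load_model`. Note that catching every `RuntimeError` may also hide failures unrelated to memory, so a stricter check may be preferable in practice.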

🚨 Notice

This repository offers a free trial of Voice-Pro.
The free trial version of Voice-Pro allows you to process up to 60 seconds of media.
The subscription version supports Microsoft Azure TTS and Translator. Purchase it on Shopify.

| Feature | Trial Version | ☕ Contributor Version | Subscription Version |
| --- | --- | --- | --- |
| Media Length Limit | 60 seconds | Unlimited | Unlimited |
| Translation Service | Google Translate (Open Source) | Google Translate (Open Source) | Azure Translate (Microsoft) |
| Text-to-Speech Service | Edge TTS (Open Source) | Edge TTS (Open Source) | Azure TTS (Microsoft) |

☕ Contributions

Hello, I'm David from the Voice-Pro team. Our team discovers the best AI technologies in the industry and provides them for anyone to use easily and conveniently. We are a small startup in Korea that has only been around for a year. We are working hard to help you and other creators produce great content.

Your ⭐⭐⭐⭐⭐ review would be greatly appreciated as it helps our business grow with you. Please help support our small team.

Thank you, ABUS Customer Service

If you want to participate in and help us with this project, feel free to create an issue.
If you find something wrong, please submit a pull request to improve this project.
Any type of contribution is welcome.
For inquiries related to purchases, business partnerships, technical tuning, investments, and other matters, please contact us by email ([email protected]).
If you like this project, please star this repository. We would greatly appreciate it. ⭐⭐⭐
You can support Voice-Pro with a donation here:

📬 Contact

Email: [email protected]
Homepage (Korean): https://abuskorea.imweb.me
Naver (Korean): 30-day subscription
Shopify (Global): 30-day subscription

👍 YouTube

PlayList
Karaoke: Pop | K-Pop | J-Pop

🙏 Credits

Demucs: https://github.com/facebookresearch/demucs
yt-dlp: https://github.com/yt-dlp/yt-dlp
gradio: https://github.com/gradio-app/gradio
edge-TTS: https://github.com/rany2/edge-tts
F5-TTS: https://github.com/SWivid/F5-TTS.git
openai-whisper: https://github.com/openai/whisper
faster-whisper: https://github.com/SYSTRAN/faster-whisper
whisper-timestamped: https://github.com/linto-ai/whisper-timestamped
whisperX: https://github.com/m-bain/whisperX
CosyVoice: https://github.com/FunAudioLLM/CosyVoice
kokoro: https://github.com/hexgrad/kokoro
Deep-Translator: https://github.com/nidhaloff/deep-translator
spaCy: https://github.com/explosion/spaCy

©️ Copyright

by ABUS
