Transcribe and Translate Subtitles
🚨 Important Note
- Every task runs locally without internet, ensuring maximum privacy.
- Visit GitHub
Updates
- 2025/7/5
- Added a noise reduction model: MossFormerGAN_SE_16K
- 2025/6/11
- Added HumAware-VAD, NVIDIA-NeMo-VAD, TEN-VAD
- 2025/6/3
- Added Dolphin ASR model to support Asian languages.
- 2025/5/13
- Added Float16/32 ASR models to support CUDA/DirectML GPU usage. These models can achieve >99% GPU operator deployment.
- 2025/5/9
- Added an option to not use VAD (Voice Activity Detection), offering greater flexibility.
- Added a noise reduction model: MelBandRoformer.
- Added three Japanese anime fine-tuned Whisper models.
- Added ASR model: CrisperWhisper.
- Added English fine-tuned ASR model: Whisper-Large-v3.5-Distil.
- Added ASR model supporting Chinese (including some dialects): FireRedASR-AED-L.
- Removed the IPEX-LLM framework to enhance overall performance.
- Removed the LLM quantization options, standardizing on the Q4F32 format.
- Improved accuracy of FSMN-VAD.
- Improved recognition accuracy of Paraformer.
- Improved recognition accuracy of SenseVoice.
- Improved inference speed of the Whisper series by over 10%.
- Supported the following large language models (LLMs) with ONNX Runtime 100% GPU operator deployment:
- Qwen3-4B/8B
- InternLM3-8B
- Phi-4-mini-Instruct
- Gemma3-4B/12B-it
- Expanded hardware support:
- Intel OpenVINO
- NVIDIA CUDA GPU
- Windows DirectML GPU (supports integrated and discrete GPUs)
✨ Features
This project is built on the ONNX Runtime framework.
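ONNX Runtime dispatches inference through "execution providers" (CPU, CUDA, DirectML, OpenVINO, etc.). As a hedged illustration of how a tool like this might pick a backend, the helper below walks a preference list; the ordering and function name are assumptions for illustration, not the project's actual logic.

```python
# Illustrative sketch of ONNX Runtime execution-provider selection.
# The preference order below is an assumption, not the project's actual code.

PREFERRED = [
    "CUDAExecutionProvider",      # NVIDIA GPU
    "DmlExecutionProvider",       # Windows DirectML (integrated/discrete GPUs)
    "OpenVINOExecutionProvider",  # Intel CPU/GPU/NPU
    "CPUExecutionProvider",       # universal fallback
]

def pick_provider(available):
    """Return the highest-priority provider that is actually available."""
    for name in PREFERRED:
        if name in available:
            return name
    return "CPUExecutionProvider"

# With onnxruntime installed, availability would be queried like:
#   import onnxruntime as ort
#   provider = pick_provider(ort.get_available_providers())
#   session = ort.InferenceSession("model.onnx", providers=[provider])
```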
Denoiser Support:
VAD Support:
- FSMN
- Faster_Whisper-Silero
- Official-Silero
- HumAware
- NVIDIA-NeMo-VAD-v2.0
- TEN-VAD
- Pyannote-Segmentation-3.0
- You need to accept Pyannote's terms of use and download the Pyannote pytorch_model.bin file, then place it in the VAD/pyannote_segmentation folder.
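The VAD models above emit speech timestamps that the ASR stage consumes. As an illustrative sketch (the `(start, end)` segment format and the 0.5 s gap threshold are assumptions, not this project's actual logic), nearby segments can be merged so the ASR model receives fewer, longer chunks:

```python
# Illustrative post-processing of VAD output: merge speech segments
# separated by short silences. The segment format and threshold are
# assumptions for the sketch, not taken from this project's code.

def merge_segments(segments, max_gap=0.5):
    """segments: list of (start_sec, end_sec), sorted by start time."""
    merged = []
    for start, end in segments:
        if merged and start - merged[-1][1] <= max_gap:
            # Silence gap is short: extend the previous segment.
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged
```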
ASR Support:
LLM Support:
📋 Setup Instructions
✅ Step 1: Install Dependencies
- Run the following commands in your terminal to install the latest required Python packages:
- For Apple Silicon M-series chips, avoid installing onnxruntime-openvino, as it will cause errors.
conda install ffmpeg
pip install -r requirements.txt
📥 Step 2: Download Necessary Models
- Download the required models from HuggingFace: Transcribe_and_Translate_Subtitles.
🖥️ Step 3: Download and Place run.py
- Download the run.py script from this repository.
- Place it in the Transcribe_and_Translate_Subtitles folder.
📁 Step 4: Place Target Videos in the Media Folder
- Place the videos you want to transcribe and translate in the following directory; the application will process them one by one:
Transcribe_and_Translate_Subtitles/Media
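As a minimal sketch of the batch behavior described above (the extension list and function name are illustrative assumptions; the project may accept other formats), a run could collect the videos like this:

```python
# Sketch of how a batch run might collect videos from the Media folder.
# The extension set is an assumption, not the project's actual filter.
from pathlib import Path

VIDEO_EXTS = {".mp4", ".mkv", ".avi", ".mov"}

def list_media(media_dir="Transcribe_and_Translate_Subtitles/Media"):
    """Return the video files in media_dir, sorted for one-by-one processing."""
    root = Path(media_dir)
    return sorted(p for p in root.glob("*") if p.suffix.lower() in VIDEO_EXTS)
```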
🚀 Step 5: Run the Application
- Open your preferred terminal (PyCharm, CMD, PowerShell, etc.).
- Execute the following command to start the application:
python run.py
🛠️ Step 6: Fix Error (if encountered)
- On the first run, you might encounter a Silero-VAD error. Simply restart the application, and it should be resolved.
- On the first run, you might encounter a libc++1.so error. Run the following commands in the terminal, and they should resolve the issue.
sudo apt update
sudo apt install libc++1
💻 Step 7: Device Support
- This project currently supports:
- Intel-OpenVINO-CPU-GPU-NPU
- Windows-AMD-GPU
- NVIDIA-GPU
- Apple-CPU
- AMD-CPU
🎉 Enjoy the Application!
- The generated subtitles are saved in the following folder:
Transcribe_and_Translate_Subtitles/Results/Subtitles
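Subtitle output follows the standard SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text). The sketch below illustrates that generic format; it is not this project's actual writer code.

```python
# Minimal SRT writer illustrating the standard subtitle format.
# Generic sketch of the SRT layout, not the project's own code.

def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(cues):
    """cues: list of (start_sec, end_sec, text) -> SRT document string."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```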
📌 To-Do List
- Beam Search for ASR models.
- Seed-X-PPO-7B with Beam Search
- Belle-Whisper-ZH
- Remove FSMN-VAD, Qwen, Gemma, Phi, InternLM. Only Gemma3-it-4B and Seed-X-PPO-7B are provided.
- Upscale the Resolution of Video
- Denoiser-MossFormer2-48K
- AMD-ROCm Support
- Real-Time Transcribe & Translate Video Player
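The first to-do item, beam search for ASR decoding, can be sketched with a toy decoder that keeps the top-k partial hypotheses at each step. This is only the beam bookkeeping on made-up per-step token log-probabilities; real ASR decoders also handle blanks, attention state, and length normalization.

```python
# Toy beam-search decoder over per-step token log-probabilities,
# sketching the "Beam Search for ASR models" to-do item.
# Input format and scoring are illustrative assumptions.

def beam_search(step_logprobs, beam_size=2):
    """step_logprobs: list of {token: logprob} dicts, one per time step.
    Returns (best_token_sequence, total_logprob)."""
    beams = [((), 0.0)]  # (partial sequence, cumulative log-prob)
    for dist in step_logprobs:
        # Expand every beam with every candidate token...
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        # ...then keep only the top beam_size hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams[0]
```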
Performance
OS | Backend | Denoiser | VAD | ASR | LLM | Real-Time Factor (test_video.mp4, 7602 s)
---|---|---|---|---|---|---
Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | 0.08
Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | 0.50
Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | 0.054
Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | 0.39
Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | 0.20
Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | 0.148
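The real-time factor (RTF) column is processing time divided by audio duration, so lower is faster: an RTF of 0.08 on the 7602-second test video means roughly 7602 × 0.08 ≈ 608 seconds of processing. A tiny helper makes the definition concrete:

```python
# Real-time factor (RTF) as reported in the table above:
# RTF = processing_time / audio_duration, so lower is faster.

def rtf(processing_seconds, audio_seconds):
    return processing_seconds / audio_seconds
```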