H5N1AIDS's picture
Update README.md
f6b4539 verified
## 🎬 视频字幕转录和翻译 / Transcribe and Translate Subtitles
**一个强大的、隐私优先的视频字幕转录和翻译工具**
</br>
**A powerful, privacy-first tool for transcribing and translating video subtitles**
[![Privacy First](https://img.shields.io/badge/Privacy-100%25%20Local-green.svg)](https://github.com/your-repo)
[![ONNX Runtime](https://img.shields.io/badge/Powered%20by-ONNX%20Runtime-blue.svg)](https://onnxruntime.ai/)
[![Multi-Platform](https://img.shields.io/badge/Platform-Windows%20%7C%20Linux%20%7C%20macOS-lightgrey.svg)](https://github.com/your-repo)
[Visit Github](https://github.com/DakeQQ/Transcribe-and-Translate-Subtitles)
---
## 🔒 隐私保证 / Privacy Guarantee
> **🚨 所有处理完全离线运行 / All processing runs completely offline**<br>
> - 无需互联网连接,确保最大程度的隐私和数据安全<br>
> - No internet connection required, ensuring maximum privacy and data security.
---
## 🚀 快速入门 / Quick Start
### 环境准备 / Prerequisites
```bash
# 安装 FFmpeg / Install FFmpeg
conda install ffmpeg
pip install -r requirements.txt
# 安装 Python 依赖 / Install Python dependencies
# 请根据您的硬件平台安装正确的包 / Please according to your hardware platform install the right package
# ----------------------------------------
# For CPU only
# onnxruntime>=1.22.0
# ----------------------------------------
# For Linux + AMD
# 请先按照 URL 设置 ROCm / Please follow the URL to set up the ROCm first before pip install onnxruntime-rocm
# https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/install-onnx.html
# https://onnxruntime.ai/docs/execution-providers/Vitis-AI-ExecutionProvider.html
# onnxruntime>=1.22.0
# onnxruntime-rocm>=1.22.0
# ----------------------------------------
# For Windows + (Intel or AMD)
# onnxruntime>=1.22.0
# onnxruntime-directml>=1.22.0
# ----------------------------------------
# For Intel OpenVINO CPU & GPU & NPU
# onnxruntime>=1.22.0
# onnxruntime-openvino>=1.22.0
# ----------------------------------------
# For NVIDIA-CUDA
# onnxruntime>=1.22.0
# onnxruntime-gpu>=1.22.0
# ----------------------------------------
```
### 设置
1. **下载模型**: 从 [HuggingFace](https://huggingface.co/H5N1AIDS/Transcribe_and_Translate_Subtitles) 获取所需模型
2. **下载脚本**: 将 `run.py` 放置在您的 `Transcribe_and_Translate_Subtitles` 文件夹中
3. **添加媒体**: 将您的音视频放置在 `Transcribe_and_Translate_Subtitles/Media/` 目录下
4. **运行**: 执行 `python run.py` 并打开 Web 界面
### Setup
1. **Download Models**: Get the required models from [HuggingFace](https://huggingface.co/H5N1AIDS/Transcribe_and_Translate_Subtitles)
2. **Download Script**: Place `run.py` in your `Transcribe_and_Translate_Subtitles` folder
3. **Add Media**: Place your audios/videos in `Transcribe_and_Translate_Subtitles/Media/`
4. **Run**: Execute `python run.py` and open the web interface
### 结果 / Results
在以下位置找到您处理后的字幕 / Find your processed subtitles in:
```
Transcribe_and_Translate_Subtitles/Results/Subtitles/
```
**准备好开始了吗?/ Ready to get started?** 🎉
---
## ✨ 功能特性 / Features
### 🔇 降噪模型 / Noise Reduction Models
- **[DFSMN](https://modelscope.cn/models/iic/speech_dfsmn_ans_psm_48k_causal)**
- **[GTCRN](https://github.com/Xiaobin-Rong/gtcrn)**
- **[ZipEnhancer](https://modelscope.cn/models/iic/speech_zipenhancer_ans_multiloss_16k_base)**
- **[Mel-Band-Roformer](https://github.com/KimberleyJensen/Mel-Band-Roformer-Vocal-Model)**
- **[MossFormerGAN_SE_16K](https://www.modelscope.cn/models/alibabasglab/MossFormerGAN_SE_16K)**
- **[MossFormer2_SE_48K](https://www.modelscope.cn/models/alibabasglab/MossFormer2_SE_48K)**
### 🎤 语音活动检测 (VAD) / Voice Activity Detection (VAD)
- **[Faster-Whisper-Silero](https://github.com/SYSTRAN/faster-whisper/blob/master/faster_whisper/vad.py)**
- **[Official-Silero-v6](https://github.com/snakers4/silero-vad)**
- **[HumAware](https://huggingface.co/CuriousMonkey7/HumAware-VAD)**
- **[NVIDIA-NeMo-VAD-v2.0](https://huggingface.co/nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0)**
- **[TEN-VAD](https://github.com/TEN-framework/ten-vad)**
- **[Pyannote-Segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)**
- *注意:您需要接受 Pyannote 的使用条款并下载 Pyannote 的 `pytorch_model.bin` 文件。将其放置在 `VAD/pyannote_segmentation` 文件夹中*
- *Note: You need to accept Pyannote's terms of use and download the Pyannote `pytorch_model.bin` file. Place it in the `VAD/pyannote_segmentation` folder.*
### 🗣️ 语音识别 (ASR) / Speech Recognition (ASR)
#### 多语言模型 / Multilingual Models
- **[SenseVoice-Small-Multilingual](https://modelscope.cn/models/iic/SenseVoiceSmall)**
- **[Dolphin-Small-Asian 亚洲语言](https://github.com/DataoceanAI/Dolphin)**
- **[Paraformer-Large-Chinese 中文](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch)**
- **[Paraformer-Large-English 英语](https://modelscope.cn/models/iic/speech_paraformer_asr-en-16k-vocab4199-pytorch)**
- **[FireRedASR-AED-L Chinese 中文](https://github.com/FireRedTeam/FireRedASR)**
- **[Official-Whisper-Large-v3-Multilingual](https://huggingface.co/openai/whisper-large-v3)**
- **[Official-Whisper-Large-v3-Turbo-Multilingual](https://huggingface.co/openai/whisper-large-v3-turbo)**
- **[阿拉伯语 / Arabic](https://huggingface.co/Byne/whisper-large-v3-arabic)**
- **[巴斯克语 / Basque](https://huggingface.co/xezpeleta/whisper-large-v3-eu)**
- **[粤语 / Cantonese-Yue](https://huggingface.co/JackyHoCL/whisper-large-v3-turbo-cantonese-yue-english)**
- **[中文 / Chinese](https://huggingface.co/BELLE-2/Belle-whisper-large-v3-zh-punct)**
- **[台湾客家话 / Chinese-Hakka](https://huggingface.co/formospeech/whisper-large-v3-taiwanese-hakka)**
- **[台湾闽南语 / Chinese-Minnan](https://huggingface.co/TSukiLen/whisper-medium-chinese-tw-minnan)**
- **[台湾华语 / Chinese-Taiwan](https://huggingface.co/JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW)**
- **[CrisperWhisper-Multilingual](https://github.com/nyrahealth/CrisperWhisper)**
- **[丹麦语 / Danish](https://huggingface.co/sam8000/whisper-large-v3-turbo-danish-denmark)**
- **[印度英语 / English-Indian](https://huggingface.co/Tejveer12/Indian-Accent-English-Whisper-Finetuned)**
- **[英语 v3.5 / Engish-v3.5](https://huggingface.co/distil-whisper/distil-large-v3.5)**
- **[法语 / French](https://huggingface.co/bofenghuang/whisper-large-v3-french-distil-dec16)**
- **[瑞士德语 / German-Swiss](https://huggingface.co/Flurin17/whisper-large-v3-turbo-swiss-german)**
- **[德语 / German](https://huggingface.co/primeline/whisper-large-v3-turbo-german)**
- **[希腊语 / Greek](https://huggingface.co/sam8000/whisper-large-v3-turbo-greek-greece)**
- **[意大利语 / Italian](https://huggingface.co/bofenghuang/whisper-large-v3-distil-it-v0.2)**
- **[日语-动漫 / Japanese-Anime](https://huggingface.co/efwkjn/whisper-ja-anime-v0.3)**
- **[日语 / Japanese](https://huggingface.co/hhim8826/whisper-large-v3-turbo-ja)**
- **[韩语 / Korean](https://huggingface.co/ghost613/whisper-large-v3-turbo-korean)**
- **[马来语 / Malaysian](https://huggingface.co/mesolitica/Malaysian-whisper-large-v3-turbo-v3)**
- **[波斯语 / Persian](https://huggingface.co/MohammadGholizadeh/whisper-large-v3-persian-common-voice-17)**
- **[波兰语 / Polish](https://huggingface.co/Aspik101/distil-whisper-large-v3-pl)**
- **[葡萄牙语 / Portuguese](https://huggingface.co/freds0/distil-whisper-large-v3-ptbr)**
- **[俄语 / Russian](https://huggingface.co/dvislobokov/whisper-large-v3-turbo-russian)**
- **[塞尔维亚语 / Serbian](https://huggingface.co/Sagicc/whisper-large-v3-sr-combined)**
- **[西班牙语 / Spanish](https://huggingface.co/Berly00/whisper-large-v3-spanish)**
- **[泰语 / Thai](https://huggingface.co/nectec/Pathumma-whisper-th-large-v3)**
- **[土耳其语 / Turkish](https://huggingface.co/selimc/whisper-large-v3-turbo-turkish)**
- **[乌尔都语 / Urdu](https://huggingface.co/urdu-asr/whisper-large-v3-ur)**
- **[越南语 / Vietnamese](https://huggingface.co/suzii/vi-whisper-large-v3-turbo-v1)**
### 🤖 翻译模型 (LLM) / Translation Models (LLM)
- **[Qwen-3-4B-Instruct-2507-Abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-4B-Instruct-2507-abliterated)**
- **[Qwen-3-8B-Abliterated](https://huggingface.co/huihui-ai/Huihui-Qwen3-8B-abliterated-v2)**
- **[Hunyuan-MT-7B-Abliterated](https://huggingface.co/huihui-ai/Huihui-Hunyuan-MT-7B-abliterated)**
- **[Seed-X-PRO-7B](https://www.modelscope.cn/models/ByteDance-Seed/Seed-X-PPO-7B)**
---
## 🖥️ 硬件支持 / Hardware Support
<table>
<tr>
<td align="center"><strong>💻 中央處理器 (CPU)</strong></td>
<td align="center"><strong>🎮 圖形處理器 (GPU)</strong></td>
<td align="center"><strong>🧠 神經網路處理單元 (NPU)</strong></td>
</tr>
<tr>
<td valign="top">
<ul>
<li>Apple Silicon</li>
<li>AMD</li>
<li>Intel</li>
</ul>
</td>
<td valign="top">
<ul>
<li>Apple CoreML</li>
<li>AMD ROCm</li>
<li>Intel OpenVINO</li>
<li>NVIDIA CUDA</li>
<li>Windows DirectML</li>
</ul>
</td>
<td valign="top">
<ul>
<li>Apple CoreML</li>
<li>AMD Ryzen-VitisAI</li>
<li>Intel OpenVINO</li>
</ul>
</td>
</tr>
</table>
---
## 📊 性能基准测试 / Performance Benchmarks
*测试条件 / Test Conditions: Ubuntu 24.04, Intel i3-12300, 7602 秒视频*
| 操作系统 (OS) | 后端 (Backend) | 降噪器 (Denoiser) | VAD | 语音识别 (ASR) | 大语言模型 (LLM) | 实时率<br>(Real-Time Factor) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| Ubuntu-24.04 | CPU i3-12300 | - | Silero | SenseVoiceSmall | - | **0.08** |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | SenseVoiceSmall | Qwen2.5-7B-Instruct | **0.50** |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | SenseVoiceSmall | - | **0.054** |
| Ubuntu-24.04 | CPU i3-12300 | ZipEnhancer | FSMN | SenseVoiceSmall | - | **0.39** |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | Silero | Whisper-Large-V3 | - | **0.20** |
| Ubuntu-24.04 | CPU i3-12300 | GTCRN | FSMN | Whisper-Large-V3-Turbo | - | **0.148** |
---
## 🛠️ 问题排查 / Troubleshooting
### 常见问题 / Common Issues
- **Silero VAD 错误 / Silero VAD Error**: 首次运行时只需重启应用程序 / Simply restart the application on first run
- **libc++ 错误 (Linux) / libc++ Error (Linux)**:
```bash
sudo apt update
sudo apt install libc++1
```
- **苹果芯片 / Apple Silicon**: 请避免安装 `onnxruntime-openvino`,因为它会导致错误 / Avoid installing `onnxruntime-openvino` as it will cause errors
---
## 📋 更新历史 / Update History
### 🆕 **2025/9/19** - 重大更新 / Major Release
-**新增 ASR / Added ASR**: 28个地区微调的 Whisper 模型 / 28 region fine-tuned Whisper models
-**新增降噪器 / Added Denoiser**: MossFormer2_SE_48K
-**新增 LLM 模型 / Added LLM Models**:
- Qwen3-4B-Instruct-2507-abliterated
- Qwen3-8B-abliterated-v2
- Hunyuan-MT-7B-abliterated
- Seed-X-PRO-7B
-**性能改进 / Performance Improvements**:
- 为类 Whisper 的 ASR 模型应用了束搜索(Beam Search)和重复惩罚(Repeat Penalty)/ Applied Beam Search & Repeat Penalty for Whisper-like ASR models
- 应用 ONNX Runtime IOBinding 实现最大加速(比常规 ort_session.run() 快 10%以上)/ Applied ONNX Runtime IOBinding for maximum speed up (10%+ faster than normal ort_session.run())
- 支持单次推理处理 20 秒的音频片段 / Support for 20 seconds audio segment per single run inference
- 改进了多线程性能 / Improved multi-threads performance
-**硬件支持扩展 / Hardware Support Expansion**:
- AMD-ROCm 执行提供程序 / Execution Provider
- AMD-MIGraphX 执行提供程序 / Execution Provider
- NVIDIA TensorRTX 执行提供程序 / Execution Provider
- *(必须先配置环境,否则无法工作 / Must config the env first or it will not work)*
-**准确性改进 / Accuracy Improvements**:
- SenseVoice
- Paraformer
- FireRedASR
- Dolphin
- ZipEnhancer
- MossFormerGAN_SE_16K
- NVIDIA-NeMo-VAD
-**速度改进 / Speed Improvements**:
- MelBandRoformer (通过转换为单声道提升速度 / speed boost by converting to mono channel)
-**移除的模型 / Removed Models**:
- FSMN-VAD
- Qwen3-4B-Official
- Qwen3-8B-Official
- Gemma3-4B-it
- Gemma3-12B-it
- InternLM3
- Phi-4-Instruct
### **2025/7/5** - 降噪增强 / Noise Reduction Enhancement
-**新增降噪模型 / Added noise reduction model**: MossFormerGAN_SE_16K
### **2025/6/11** - VAD 模型扩展 / VAD Models Expansion
-**新增 VAD 模型 / Added VAD Models**:
- HumAware-VAD
- NVIDIA-NeMo-VAD
- TEN-VAD
### **2025/6/3** - 亚洲语言支持 / Asian Language Support
-**新增 Dolphin ASR 模型以支持亚洲语言 / Added Dolphin ASR model** to support Asian languages
### **2025/5/13** - GPU 加速 / GPU Acceleration
-**新增 Float16/32 ASR 模型以支持 CUDA/DirectML GPU / Added Float16/32 ASR models** to support CUDA/DirectML GPU usage
-**GPU 性能 / GPU Performance**: 这些模型可以实现超过 99% 的 GPU 算子部署 / These models can achieve >99% GPU operator deployment
### **2025/5/9** - 主要功能发布 / Major Feature Release
-**灵活性改进 / Flexibility Improvements**:
- 新增不使用 VAD(语音活动检测)的选项 / Added option to **not use** VAD (Voice Activity Detection)
-**新增模型 / Added Models**:
- **降噪 / Noise reduction**: MelBandRoformer
- **ASR**: CrisperWhisper
- **ASR**: Whisper-Large-v3.5-Distil (英语微调 / English fine-tuned)
- **ASR**: FireRedASR-AED-L (支持中文及方言 / Chinese + dialects support)
- **三个日语动漫微调的 Whisper 模型 / Three Japanese anime fine-tuned Whisper models**
-**性能优化 / Performance Optimizations**:
- 移除 IPEX-LLM 框架以提升整体性能 / Removed IPEX-LLM framework to enhance overall performance
- 取消 LLM 量化选项,统一使用 **Q4F32** 格式 / Cancelled LLM quantization options, standardized on **Q4F32** format
- Whisper 系列推理速度提升 10% 以上 / Improved **Whisper** series inference speed by over 10%
-**准确性改进 / Accuracy Improvements**:
- 提升 **FSMN-VAD** 准确率 / Improved **FSMN-VAD** accuracy
- 提升 **Paraformer** 识别准确率 / Improved **Paraformer** recognition accuracy
- 提升 **SenseVoice** 识别准确率 / Improved **SenseVoice** recognition accuracy
-**LLM 支持 ONNX Runtime 100% GPU 算子部署 / LLM Support with ONNX Runtime 100% GPU operator deployment**:
- Qwen3-4B/8B
- InternLM3-8B
- Phi-4-mini-Instruct
- Gemma3-4B/12B-it
-**硬件支持扩展 / Hardware Support Expansion**:
- **Intel OpenVINO**
- **NVIDIA CUDA GPU**
- **Windows DirectML GPU** (支持集成显卡和独立显卡 / supports integrated and discrete GPUs)
---
## 🗺️ 路线图 / Roadmap
- [ ] **视频超分 / [Video Upscaling](https://github.com/sczhou/Upscale-A-Video)** - 提升分辨率 / Enhance resolution
- [ ] **实时播放器 / Real-time Player** - 实时转录和翻译 / Live transcription and translation