Update README.md
README.md
CHANGED
@@ -19,10 +19,10 @@ We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (
|
|
19 |
|
20 |
[GitHub](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
|
21 |
[Paper](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
|
22 |
-
[Python](https://www.python.org/)
|
24 |
[PyTorch](https://pytorch.org/)
|
25 |
-
[
|
61 |
|
62 |
# Load the AR TTS model; on first use it is downloaded automatically from Hugging Face
|
63 |
-
tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-
|
64 |
|
65 |
# Load the MGM TTS model; on first use it is downloaded automatically from Hugging Face
|
66 |
tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
|
@@ -120,13 +120,14 @@ sf.write("./use_examples/test_audio/trump_rec.wav", rec_audio, 24000)
|
|
120 |
import torch
|
121 |
import soundfile as sf
|
122 |
from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
|
|
|
123 |
|
124 |
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
125 |
|
126 |
# Create AR TTS pipeline
|
127 |
pipeline = TTSInferencePipeline.from_pretrained(
|
128 |
tadicodec_path="./ckpt/TaDiCodec",
|
129 |
-
llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-
|
130 |
device=device,
|
131 |
)
|
132 |
|
@@ -178,8 +179,6 @@ MaskGCT:
|
|
178 |
|
179 |
# Acknowledgments
|
180 |
|
181 |
-
- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
|
182 |
-
|
183 |
- **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
|
184 |
|
185 |
- **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
|
@@ -188,3 +187,4 @@ MaskGCT:
|
|
188 |
|
189 |
- **BSQ (Binary Spherical Quantization)** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
|
190 |
|
|
|
|
19 |
|
20 |
[GitHub](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
|
21 |
[Paper](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
|
22 |
+
[Project Page](https://tadicodec.github.io/)
|
23 |
[Python](https://www.python.org/)
|
24 |
[PyTorch](https://pytorch.org/)
|
25 |
+
[Hugging Face](https://huggingface.co/amphion/TaDiCodec)
|
26 |
|
27 |
# Pre-trained Models
|
28 |
|
|
|
34 |
|
35 |
| Model | 🤗 Hugging Face | Status |
|
36 |
|:-----:|:---------------:|:------:|
|
37 |
+
| **TaDiCodec** | [amphion/TaDiCodec](https://huggingface.co/amphion/TaDiCodec) | ✅ |
|
38 |
+
| **TaDiCodec-old** | [amphion/TaDiCodec-old](https://huggingface.co/amphion/TaDiCodec-old) | 🚧 |
|
39 |
|
40 |
*Note: TaDiCodec-old is the previous version of TaDiCodec; TaDiCodec-TTS-AR-Phi-3.5-4B is based on TaDiCodec-old.*
|
41 |
|
|
|
43 |
|
44 |
| Model | Type | LLM | 🤗 Hugging Face | Status |
|
45 |
|:-----:|:----:|:---:|:---------------:|:-------------:|
|
46 |
+
| **TaDiCodec-TTS-AR-Qwen2.5-0.5B** | AR | Qwen2.5-0.5B-Instruct | [amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B) | ✅ |
|
47 |
+
| **TaDiCodec-TTS-AR-Qwen2.5-3B** | AR | Qwen2.5-3B-Instruct | [amphion/TaDiCodec-TTS-AR-Qwen2.5-3B](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-3B) | ✅ |
|
48 |
+
| **TaDiCodec-TTS-AR-Phi-3.5-4B** | AR | Phi-3.5-mini-instruct | [amphion/TaDiCodec-TTS-AR-Phi-3.5-4B](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Phi-3.5-4B) | 🚧 |
|
49 |
+
| **TaDiCodec-TTS-MGM** | MGM | - | [amphion/TaDiCodec-TTS-MGM](https://huggingface.co/amphion/TaDiCodec-TTS-MGM) | ✅ |
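For scripting against the TTS model table, the rows can be mirrored in a small registry. This is only a sketch: the `TTSModel` class and `released_models` helper are hypothetical names, and reading the status column as "released" versus "in progress" is an assumption, not something the README states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTSModel:
    repo_id: str         # Hugging Face repo ID, taken from the table links
    kind: str            # "AR" or "MGM", from the Type column
    llm: Optional[str]   # backbone LLM; None where the table shows "-"
    released: bool       # assumption: check mark = released, construction sign = in progress


MODELS = [
    TTSModel("amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B", "AR", "Qwen2.5-0.5B-Instruct", True),
    TTSModel("amphion/TaDiCodec-TTS-AR-Qwen2.5-3B", "AR", "Qwen2.5-3B-Instruct", True),
    TTSModel("amphion/TaDiCodec-TTS-AR-Phi-3.5-4B", "AR", "Phi-3.5-mini-instruct", False),
    TTSModel("amphion/TaDiCodec-TTS-MGM", "MGM", None, True),
]


def released_models(kind: Optional[str] = None) -> List[TTSModel]:
    """Return models marked as released, optionally filtered by type."""
    return [m for m in MODELS if m.released and (kind is None or m.kind == kind)]
```

Calling `released_models("AR")` then yields only the two Qwen2.5-based entries under that assumption.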
|
50 |
|
51 |
## Quick Model Usage
|
52 |
|
|
|
60 |
tokenizer = TaDiCodecPipline.from_pretrained("amphion/TaDiCodec")
|
61 |
|
62 |
# Load the AR TTS model; on first use it is downloaded automatically from Hugging Face
|
63 |
+
tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-3B")
|
64 |
|
65 |
# Load the MGM TTS model; on first use it is downloaded automatically from Hugging Face
|
66 |
tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
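The snippet above relies on automatic download. If the checkpoints should instead be fetched ahead of time into `./ckpt/...` directories like those used later in the README, one possible approach is `huggingface_hub.snapshot_download`. This is a sketch, not the project's documented workflow: `local_dir_for` and `prefetch` are hypothetical helpers, and the assumption is that a local directory holding a repo snapshot is what the pipelines expect.

```python
from pathlib import Path


def local_dir_for(repo_id: str, root: str = "./ckpt") -> Path:
    """Map 'org/name' to '<root>/name' (hypothetical layout, mirroring ./ckpt/TaDiCodec)."""
    return Path(root) / repo_id.split("/", 1)[1]


def prefetch(repo_ids, root: str = "./ckpt"):
    """Download each repo snapshot into its local directory and return the paths."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in repo_ids:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths.append(target)
    return paths
```

For example, `prefetch(["amphion/TaDiCodec", "amphion/TaDiCodec-TTS-AR-Qwen2.5-3B"])` would populate the two directories that the AR pipeline example reads from.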
|
|
|
120 |
import torch
|
121 |
import soundfile as sf
|
122 |
from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
|
123 |
+
# from models.tts.llm_tts.inference_mgm_tts import MGMInferencePipeline
|
124 |
|
125 |
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
126 |
|
127 |
# Create AR TTS pipeline
|
128 |
pipeline = TTSInferencePipeline.from_pretrained(
|
129 |
tadicodec_path="./ckpt/TaDiCodec",
|
130 |
+
llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B",
|
131 |
device=device,
|
132 |
)
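Because this commit changes `llm_path`, a stale or missing checkpoint directory is an easy mistake to make. A minimal pre-flight check, assuming both arguments are plain local directories (`missing_checkpoints` is a hypothetical helper, not part of the repository):

```python
from pathlib import Path
from typing import List


def missing_checkpoints(*paths: str) -> List[str]:
    """Return the given paths that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


if __name__ == "__main__":
    # Paths as they appear in the AR TTS pipeline example.
    missing = missing_checkpoints(
        "./ckpt/TaDiCodec",
        "./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B",
    )
    if missing:
        print("Missing checkpoint directories:", ", ".join(missing))
```

Running this before `TTSInferencePipeline.from_pretrained(...)` surfaces a wrong path early, rather than partway through model loading.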
|
133 |
|
|
|
179 |
|
180 |
# Acknowledgments
|
181 |
|
|
|
|
|
182 |
- **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
|
183 |
|
184 |
- **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
|
|
|
187 |
|
188 |
- **BSQ (Binary Spherical Quantization)** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
|
189 |
|
190 |
+
- **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
|