Hecheng0625 committed
Commit 73dc3b7 · verified · 1 Parent(s): 9144d55

Update README.md

Files changed (1)
  1. README.md +12 -12
README.md CHANGED
@@ -19,10 +19,10 @@ We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (
19
 
20
  [![GitHub Stars](https://img.shields.io/github/stars/HeCheng0625/Diffusion-Speech-Tokenizer?style=social)](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
21
  [![arXiv](https://img.shields.io/badge/arXiv-2024.xxxxx-b31b1b.svg)](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
22
- [![Demo](https://img.shields.io/badge/🎬%20Demo-tadicodec-green?style=flat-square)](https://tadicodec.github.io/)
23
  [![Python](https://img.shields.io/badge/Python-3.8+-3776ab.svg)](https://www.python.org/)
24
  [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)
25
- [![Hugging Face](https://img.shields.io/badge/🤗%20HuggingFace-tadicodec-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec)
26
 
27
  # 🤗 Pre-trained Models
28
 
@@ -34,8 +34,8 @@ We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (
34
 
35
  | Model | 🤗 Hugging Face | 👷 Status |
36
  |:-----:|:---------------:|:------:|
37
- | **🚀 TaDiCodec** | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec) | ✅ |
38
- | **🚀 TaDiCodec-old** | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--old-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec-old) | 🚧 |
39
 
40
  *Note: TaDiCodec-old is the old version of TaDiCodec, the TaDiCodec-TTS-AR-Phi-3.5-4B is based on TaDiCodec-old.*
41
 
@@ -43,10 +43,10 @@ We introduce the **T**ext-**a**ware **Di**ffusion Transformer Speech **Codec** (
43
 
44
  | Model | Type | LLM | 🤗 Hugging Face | 👷 Status |
45
  |:-----:|:----:|:---:|:---------------:|:-------------:|
46
- | **🤖 TaDiCodec-TTS-AR-Qwen2.5-0.5B** | AR | Qwen2.5-0.5B-Instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--0.5B-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B) | ✅ |
47
- | **🤖 TaDiCodec-TTS-AR-Qwen2.5-3B** | AR | Qwen2.5-3B-Instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--3B-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-3B) | ✅ |
48
- | **🤖 TaDiCodec-TTS-AR-Phi-3.5-4B** | AR | Phi-3.5-mini-instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--4B-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Phi-3.5-4B) | 🚧 |
49
- | **🌊 TaDiCodec-TTS-MGM-0.6B** | MGM | - | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--MGM--0.6B-yellow?style=flat-square)](https://huggingface.co/amphion/TaDiCodec-TTS-MGM-0.6B) | ✅ |
50
 
51
  ## 🔧 Quick Model Usage
52
 
@@ -60,7 +60,7 @@ from models.tts.llm_tts.inference_mgm_tts import MGMInferencePipeline
60
  tokenizer = TaDiCodecPipline.from_pretrained("amphion/TaDiCodec")
61
 
62
  # Load AR TTS model, it will automatically download the model from Hugging Face for the first time
63
- tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B")
64
 
65
  # Load MGM TTS model, it will automatically download the model from Hugging Face for the first time
66
  tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
@@ -120,13 +120,14 @@ sf.write("./use_examples/test_audio/trump_rec.wav", rec_audio, 24000)
120
  import torch
121
  import soundfile as sf
122
  from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
 
123
 
124
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
125
 
126
  # Create AR TTS pipeline
127
  pipeline = TTSInferencePipeline.from_pretrained(
128
  tadicodec_path="./ckpt/TaDiCodec",
129
- llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-0.5B",
130
  device=device,
131
  )
132
 
@@ -178,8 +179,6 @@ MaskGCT:
178
 
179
  # 🙏 Acknowledgments
180
 
181
- - **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
182
-
183
  - **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
184
 
185
  - **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
@@ -188,3 +187,4 @@ MaskGCT:
188
 
189
  - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
190
 
 
 
19
 
20
  [![GitHub Stars](https://img.shields.io/github/stars/HeCheng0625/Diffusion-Speech-Tokenizer?style=social)](https://github.com/HeCheng0625/Diffusion-Speech-Tokenizer)
21
  [![arXiv](https://img.shields.io/badge/arXiv-2024.xxxxx-b31b1b.svg)](https://hecheng0625.github.io/assets/pdf/Arxiv_TaDiCodec.pdf)
22
+ [![Demo](https://img.shields.io/badge/🎬%20Demo-tadicodec-green)](https://tadicodec.github.io/)
23
  [![Python](https://img.shields.io/badge/Python-3.8+-3776ab.svg)](https://www.python.org/)
24
  [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)
25
+ [![Hugging Face](https://img.shields.io/badge/🤗%20HuggingFace-tadicodec-yellow)](https://huggingface.co/amphion/TaDiCodec)
26
 
27
  # 🤗 Pre-trained Models
28
 
 
34
 
35
  | Model | 🤗 Hugging Face | 👷 Status |
36
  |:-----:|:---------------:|:------:|
37
+ | **🚀 TaDiCodec** | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec-yellow)](https://huggingface.co/amphion/TaDiCodec) | ✅ |
38
+ | **🚀 TaDiCodec-old** | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--old-yellow)](https://huggingface.co/amphion/TaDiCodec-old) | 🚧 |
39
 
40
  *Note: TaDiCodec-old is the old version of TaDiCodec, the TaDiCodec-TTS-AR-Phi-3.5-4B is based on TaDiCodec-old.*
41
 
 
43
 
44
  | Model | Type | LLM | 🤗 Hugging Face | 👷 Status |
45
  |:-----:|:----:|:---:|:---------------:|:-------------:|
46
+ | **🤖 TaDiCodec-TTS-AR-Qwen2.5-0.5B** | AR | Qwen2.5-0.5B-Instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--0.5B-yellow)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-0.5B) | ✅ |
47
+ | **🤖 TaDiCodec-TTS-AR-Qwen2.5-3B** | AR | Qwen2.5-3B-Instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--3B-yellow)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Qwen2.5-3B) | ✅ |
48
+ | **🤖 TaDiCodec-TTS-AR-Phi-3.5-4B** | AR | Phi-3.5-mini-instruct | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--AR--4B-yellow)](https://huggingface.co/amphion/TaDiCodec-TTS-AR-Phi-3.5-4B) | 🚧 |
49
+ | **🌊 TaDiCodec-TTS-MGM** | MGM | - | [![HF](https://img.shields.io/badge/🤗%20HF-TaDiCodec--MGM-yellow)](https://huggingface.co/amphion/TaDiCodec-TTS-MGM) | ✅ |
50
 
51
  ## 🔧 Quick Model Usage
52
 
 
60
  tokenizer = TaDiCodecPipline.from_pretrained("amphion/TaDiCodec")
61
 
62
  # Load AR TTS model, it will automatically download the model from Hugging Face for the first time
63
+ tts_model = TTSInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-AR-Qwen2.5-3B")
64
 
65
  # Load MGM TTS model, it will automatically download the model from Hugging Face for the first time
66
  tts_model = MGMInferencePipeline.from_pretrained("amphion/TaDiCodec-TTS-MGM")
 
120
  import torch
121
  import soundfile as sf
122
  from models.tts.llm_tts.inference_llm_tts import TTSInferencePipeline
123
+ # from models.tts.llm_tts.inference_mgm_tts import MGMInferencePipeline
124
 
125
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
126
 
127
  # Create AR TTS pipeline
128
  pipeline = TTSInferencePipeline.from_pretrained(
129
  tadicodec_path="./ckpt/TaDiCodec",
130
+ llm_path="./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B",
131
  device=device,
132
  )
133
 
 
179
 
180
  # 🙏 Acknowledgments
181
 
 
 
182
  - **MGM-based TTS** is built upon [MaskGCT](https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct).
183
 
184
  - **Vocos vocoder** is built upon [Vocos](https://github.com/gemelo-ai/vocos).
 
187
 
188
  - **(Binary Spherical Quantization) BSQ** is built upon [vector-quantize-pytorch](https://github.com/lucidrains/vector-quantize-pytorch) and [bsq-vit](https://github.com/zhaoyue-zephyrus/bsq-vit).
189
 
190
+ - **Training codebase** is built upon [Amphion](https://github.com/open-mmlab/Amphion) and [accelerate](https://github.com/huggingface/accelerate).
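
The AR TTS example changed in this commit reads checkpoints from local paths (`./ckpt/TaDiCodec` and `./ckpt/TaDiCodec-TTS-AR-Qwen2.5-3B`), but the diff does not show how those directories get populated. One possible way, sketched here as an assumption rather than the repository's documented procedure, is to fetch the Hub repos with `huggingface_hub.snapshot_download`. The helper names `local_dir_for` and `download_checkpoints` are hypothetical, and the `./ckpt/<repo name>` layout is inferred from the paths in the example.

```python
from pathlib import Path

# Repo IDs as they appear in the README tables of this diff.
REPO_IDS = (
    "amphion/TaDiCodec",
    "amphion/TaDiCodec-TTS-AR-Qwen2.5-3B",
)


def local_dir_for(repo_id: str, root: str = "./ckpt") -> Path:
    """Map 'amphion/<name>' to '<root>/<name>' (assumed layout, hypothetical helper)."""
    return Path(root) / repo_id.split("/", 1)[1]


def download_checkpoints(repo_ids=REPO_IDS, root: str = "./ckpt") -> dict:
    """Download each Hub repo into its local directory and return the paths."""
    # Imported lazily so local_dir_for works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for repo_id in repo_ids:
        target = local_dir_for(repo_id, root)
        # local_dir places the repo files directly under the target directory.
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths[repo_id] = target
    return paths
```

Calling `download_checkpoints()` once (network access required) should leave the two directories in place, after which the `tadicodec_path` and `llm_path` arguments in the example can point at them.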