---
license: bsd-3-clause-clear
language:
- en
- zh
base_model:
- myshell-ai/MeloTTS-Chinese
pipeline_tag: text-to-speech
---
# melotts.axera
- MeloTTS demo on Axera AX650 and AX630C
- The model is currently split into two parts, an encoder and a decoder. The encoder has not yet been converted to axmodel and currently runs through onnxruntime.
- Github: https://github.com/ml-inory/melotts.axera
## Supported Platforms
- AX650
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - [爱芯派2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)
| Chip | Output audio length | Inference time | RTF (inference time / audio length) |
|--|--|--|--|
| AX650 | 12 s | 1.5 s | 0.125 |
| AX630C | 12 s | | |
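The RTF column is the real-time factor: the time spent generating the audio divided by the duration of that audio. The sketch below recomputes the AX650 row from the table. The function name `real_time_factor` is made up for illustration and is not part of this repository.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Return RTF = inference time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# AX650 row from the table: 12 s of audio generated in 1.5 s.
ax650_rtf = real_time_factor(1.5, 12.0)
print(f"AX650 RTF: {ax650_rtf:.3f}")  # prints "AX650 RTF: 0.125"
```

An RTF below 1 means the audio is synthesized faster than it takes to play back.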
## Requirements
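### Add Chinese input support
Run the following commands. Once the Chinese input support is installed correctly, log in to the terminal again.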
```
locale-gen C.utf8
update-locale LANG=C.utf8
```
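After logging in again, one way to confirm that the new locale is active is to check the encoding Python reports. This is only a minimal sketch using the standard library; the helper name `is_utf8_encoding` is hypothetical and the check is not something the repository itself performs.

```python
import codecs
import locale


def is_utf8_encoding(name: str) -> bool:
    """Return True if `name` is any alias of UTF-8 (e.g. "utf8", "UTF-8")."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


if __name__ == "__main__":
    encoding = locale.getpreferredencoding(False)
    status = "OK" if is_utf8_encoding(encoding) else "not UTF-8, Chinese input may fail"
    print(f"preferred encoding: {encoding} ({status})")
```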
### Python Requirements
#### Requirements
```
cp -rf nltk_data ~/
apt-get install libsndfile1-dev libmecab-dev
cd python
pip3 install -r requirements.txt
```
#### pyaxengine
pyaxengine is the Python API for the NPU. For detailed installation instructions, see
- https://github.com/AXERA-TECH/pyaxengine
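Before running the demo it can help to confirm that both runtimes are importable. The sketch below uses only the standard library. It assumes that pyaxengine installs a module named `axengine` and that the ONNX encoder needs `onnxruntime`; both module names are assumptions, so adjust them if your installation differs.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed module names: "axengine" (pyaxengine) and "onnxruntime".
    for name in ("axengine", "onnxruntime"):
        state = "found" if is_module_available(name) else "missing"
        print(f"{name}: {state}")
```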
## How to use
```
root@ax650:/mnt/qtang/melotts.axera/python# python3 melotts.py --help
[INFO] Available providers: ['AxEngineExecutionProvider']
usage: melotts [-h] [--sentence SENTENCE] [--wav WAV] [--encoder ENCODER] [--decoder DECODER] [--dec_len DEC_LEN] [--sample_rate SAMPLE_RATE] [--speed SPEED]
[--language {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}]
Run TTS on input sentence
options:
-h, --help show this help message and exit
--sentence SENTENCE, -s SENTENCE
--wav WAV, -w WAV
--encoder ENCODER, -e ENCODER
--decoder DECODER, -d DECODER
--dec_len DEC_LEN
--sample_rate SAMPLE_RATE, -sr SAMPLE_RATE
--speed SPEED
--language {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}, -l {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}
```
Example command:
```
python3 melotts.py -s "爱芯元智半导体股份有限公司,致力于打造世界领先的人工智能感知与边缘计算芯片。服务智慧城市、智能驾驶、机器人的海量普惠的应用" \
-e encoder-onnx/encoder-zh.onnx \
-d decoder-ax650/decoder-zh.axmodel
```
```
root@ax650:/mnt/qtang/melotts.axera/python# python3 melotts.py \
--wav output.wav \
--encoder ../models/encoder-onnx/encoder-zh.onnx \
--decoder ../models/ax650/decoder-zh.axmodel \
--language ZH \
--speed 0.9
[INFO] Available providers: ['AxEngineExecutionProvider']
sentence: 爱芯元智半导体股份有限公司,致力于打造世界领先的人工智能感知与边缘计算芯片。服务智慧城市、智能驾驶、机器人的海量普惠的应用
sample_rate: 44100
encoder: ../models/encoder-onnx/encoder-zh.onnx
decoder: ../models/ax650/decoder-zh.axmodel
language: ZH_MIX_EN
> Text split to sentences.
爱芯元智半导体股份有限公司,
致力于打造世界领先的人工智能感知与边缘计算芯片.
服务智慧城市、智能驾驶、机器人的海量普惠的应用
> ===========================
split_sentences_into_pieces take 3.1397342681884766ms
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.10.1s
[INFO] Model type: 0 (single core)
[INFO] Compiler version: 3.3 3251425d
load models take 7986.6042137146ms
Sentence[0]: 爱芯元智半导体股份有限公司,
Load language module take 33348.33884239197ms
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 3.227 seconds.
Prefix dict has been built successfully.
encoder run take 89.70ms
Decode slice[0]: decoder run take 108.08ms
Decode slice[1]: decoder run take 92.15ms
Decode slice[2]: decoder run take 92.17ms
Sentence[1]: 致力于打造世界领先的人工智能感知与边缘计算芯片.
Load language module take 0.042438507080078125ms
encoder run take 122.83ms
Decode slice[0]: decoder run take 92.24ms
Decode slice[1]: decoder run take 92.34ms
Decode slice[2]: decoder run take 92.16ms
Decode slice[3]: decoder run take 92.16ms
Decode slice[4]: decoder run take 92.22ms
Sentence[2]: 服务智慧城市、智能驾驶、机器人的海量普惠的应用
Load language module take 0.046253204345703125ms
encoder run take 112.59ms
Decode slice[0]: decoder run take 92.26ms
Decode slice[1]: decoder run take 92.16ms
Decode slice[2]: decoder run take 92.13ms
Decode slice[3]: decoder run take 92.13ms
Decode slice[4]: decoder run take 92.10ms
Save to output.wav
root@ax650:/mnt/qtang/melotts.axera/python#
```
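The log above reports one encoder run per sentence and one decoder run per slice. To see where the time goes, the per-stage timings can be totalled with a small script. This is a sketch that assumes the log lines keep the `encoder run take <N>ms` / `decoder run take <N>ms` wording shown above; the function name `summarise_run_times` is hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "encoder run take 89.70ms" and "decoder run take 92.15ms".
RUN_PATTERN = re.compile(r"(encoder|decoder) run take ([0-9.]+)ms")


def summarise_run_times(log_text: str) -> dict:
    """Return {stage: {"runs": count, "total_ms": sum}} for each stage found."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stage, milliseconds in RUN_PATTERN.findall(log_text):
        totals[stage] += float(milliseconds)
        counts[stage] += 1
    return {s: {"runs": counts[s], "total_ms": totals[s]} for s in totals}


# First sentence of the log above, copied verbatim.
sample_log = """\
encoder run take 89.70ms
Decode slice[0]: decoder run take 108.08ms
Decode slice[1]: decoder run take 92.15ms
Decode slice[2]: decoder run take 92.17ms
"""

summary = summarise_run_times(sample_log)
for stage, stats in summary.items():
    print(f"{stage}: {stats['runs']} run(s), {stats['total_ms']:.2f} ms total")
```

Lines such as `Load language module take ...` are deliberately not matched, so one-off start-up costs stay out of the totals.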
Output audio:
https://github.com/user-attachments/assets/eda5c10c-7d30-46e5-a56a-f6edcf7813af
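Detailed runtime parameters:
| Parameter | Description | Default |
| --- | --- | --- |
| -s/--sentence | Input sentence | |
| -w/--wav | Output audio path (WAV format) | output.wav |
| -e/--encoder | Path to the encoder model | ../models/encoder.onnx |
| -d/--decoder | Path to the decoder model | ../models/decoder.axmodel |
| -sr/--sample_rate | Sample rate (Hz) | 44100 |
| --speed | Speaking speed; larger values are faster | 0.8 |
| -l/--language | One of "ZH", "ZH_MIX_EN", "JP", "EN", "KR", "SP", "FR": Chinese, mixed Chinese and English, Japanese, English, Korean, Spanish, and French, respectively | ZH_MIX_EN |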
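When calling the demo from another Python program, the parameters above can be assembled into an argument list. The following is a minimal sketch, not part of the repository: the helper `build_melotts_command` is hypothetical, the flag names and language choices are taken from the `--help` output above, and the defaults are copied from the parameter table.

```python
import shlex

# Choices listed by `python3 melotts.py --help`.
LANGUAGES = ("ZH", "ZH_MIX_EN", "JP", "EN", "KR", "ES", "SP", "FR")


def build_melotts_command(
    sentence: str,
    wav: str = "output.wav",
    encoder: str = "../models/encoder.onnx",
    decoder: str = "../models/decoder.axmodel",
    sample_rate: int = 44100,
    speed: float = 0.8,
    language: str = "ZH_MIX_EN",
) -> list:
    """Build the argv list for melotts.py (hypothetical helper)."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return [
        "python3", "melotts.py",
        "--sentence", sentence,
        "--wav", wav,
        "--encoder", encoder,
        "--decoder", decoder,
        "--sample_rate", str(sample_rate),
        "--speed", str(speed),
        "--language", language,
    ]


cmd = build_melotts_command(
    "爱芯元智半导体股份有限公司",
    encoder="../models/encoder-onnx/encoder-zh.onnx",
    decoder="../models/ax650/decoder-zh.axmodel",
    language="ZH",
    speed=0.9,
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The list can then be passed to `subprocess.run(cmd, check=True)` from the `python` directory of the repository. Passing a list instead of a shell string avoids quoting problems with the sentence text.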