---
license: bsd-3-clause-clear
language:
- en
- zh
base_model:
- myshell-ai/MeloTTS-Chinese
pipeline_tag: text-to-speech
---

# melotts.axera

- MeloTTS demo on Axera AX650 and AX630C
- The model is currently split into two parts, an encoder and a decoder. The encoder has not yet been converted to axmodel and currently runs through onnxruntime.
- GitHub: https://github.com/ml-inory/melotts.axera


## Supported Platforms

- AX650
  - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - [AXera-Pi 2](https://axera-pi-2-docs-cn.readthedocs.io/zh-cn/latest/index.html)
  - [Module-LLM](https://docs.m5stack.com/zh_CN/module/Module-LLM)
  - [LLM630 Compute Kit](https://docs.m5stack.com/zh_CN/core/LLM630%20Compute%20Kit)

| Chip | Output audio length | Inference time | RTF |
|--|--|--|--|
| AX650 | 12s | 1.5s | 0.125 |
| AX630C | 12s |   | |
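
The RTF (real-time factor) column is the inference time divided by the length of the generated audio, so a value below 1 means synthesis runs faster than playback. A minimal sketch of that arithmetic follows; the function name `real_time_factor` is illustrative only and is not part of this repository.

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """Return inference time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# AX650 row of the table: 1.5 s of compute for 12 s of audio.
rtf = real_time_factor(1.5, 12.0)
print(rtf)  # 0.125
```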

## Requirements

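### Add Chinese input support

Run the following commands. Once Chinese input is set up correctly, log out of the terminal and log back in.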

```
locale-gen C.utf8
update-locale LANG=C.utf8
```
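
After logging back in, it may help to confirm that Python actually sees a UTF-8 locale before passing Chinese text on the command line. The check below is a small sketch using only the standard library; the helper name `is_utf8` is an assumption made for illustration and does not come from this repository.

```python
import codecs
import locale


def is_utf8(encoding_name: str) -> bool:
    """Return True if the given encoding name is an alias of UTF-8."""
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


# Encoding Python derives from the current locale settings.
encoding = locale.getpreferredencoding(False)
print(f"preferred encoding: {encoding}, UTF-8: {is_utf8(encoding)}")
```

If the last value printed is `False`, the locale change has probably not taken effect in the current session.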

### Python Requirements

#### Requirements

```
cp -rf nltk_data ~/
apt-get install libsndfile1-dev libmecab-dev
cd python
pip3 install -r requirements.txt
```

#### pyaxengine

pyaxengine is the Python API for the NPU. For detailed installation instructions, see:

- https://github.com/AXERA-TECH/pyaxengine 

## How to use

```
root@ax650:/mnt/qtang/melotts.axera/python# python3 melotts.py --help
[INFO] Available providers:  ['AxEngineExecutionProvider']
usage: melotts [-h] [--sentence SENTENCE] [--wav WAV] [--encoder ENCODER] [--decoder DECODER] [--dec_len DEC_LEN] [--sample_rate SAMPLE_RATE] [--speed SPEED]
               [--language {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}]

Run TTS on input sentence

options:
  -h, --help            show this help message and exit
  --sentence SENTENCE, -s SENTENCE
  --wav WAV, -w WAV
  --encoder ENCODER, -e ENCODER
  --decoder DECODER, -d DECODER
  --dec_len DEC_LEN
  --sample_rate SAMPLE_RATE, -sr SAMPLE_RATE
  --speed SPEED
  --language {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}, -l {ZH,ZH_MIX_EN,JP,EN,KR,ES,SP,FR}

```

Run the command:

```
python3 melotts.py -s 爱芯元智半导体股份有限公司,致力于打造世界领先的人工智能感知与边缘计算芯片。服务智慧城市、智能驾驶、机器人的海量普惠的应用 \
                   -e encoder-onnx/encoder-zh.onnx \
                   -d decoder-ax650/decoder-zh.axmodel
```

```
root@ax650:/mnt/qtang/melotts.axera/python# python3 melotts.py \
                      --wav output.wav \
                      --encoder ../models/encoder-onnx/encoder-zh.onnx \
                      --decoder ../models/ax650/decoder-zh.axmodel \
                      --language ZH \
                      --speed 0.9

[INFO] Available providers:  ['AxEngineExecutionProvider']
sentence: 爱芯元智半导体股份有限公司,致力于打造世界领先的人工智能感知与边缘计算芯片。服务智慧城市、智能驾驶、机器人的海量普惠的应用
sample_rate: 44100
encoder: ../models/encoder-onnx/encoder-zh.onnx
decoder: ../models/ax650/decoder-zh.axmodel
language: ZH_MIX_EN
 > Text split to sentences.
爱芯元智半导体股份有限公司,
致力于打造世界领先的人工智能感知与边缘计算芯片.
服务智慧城市、智能驾驶、机器人的海量普惠的应用
 > ===========================
split_sentences_into_pieces take 3.1397342681884766ms
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.10.1s
[INFO] Model type: 0 (single core)
[INFO] Compiler version: 3.3 3251425d
load models take 7986.6042137146ms

Sentence[0]: 爱芯元智半导体股份有限公司,
Load language module take 33348.33884239197ms
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 3.227 seconds.
Prefix dict has been built successfully.
encoder run take 89.70ms
Decode slice[0]: decoder run take 108.08ms
Decode slice[1]: decoder run take 92.15ms
Decode slice[2]: decoder run take 92.17ms

Sentence[1]: 致力于打造世界领先的人工智能感知与边缘计算芯片.
Load language module take 0.042438507080078125ms
encoder run take 122.83ms
Decode slice[0]: decoder run take 92.24ms
Decode slice[1]: decoder run take 92.34ms
Decode slice[2]: decoder run take 92.16ms
Decode slice[3]: decoder run take 92.16ms
Decode slice[4]: decoder run take 92.22ms

Sentence[2]: 服务智慧城市、智能驾驶、机器人的海量普惠的应用
Load language module take 0.046253204345703125ms
encoder run take 112.59ms
Decode slice[0]: decoder run take 92.26ms
Decode slice[1]: decoder run take 92.16ms
Decode slice[2]: decoder run take 92.13ms
Decode slice[3]: decoder run take 92.13ms
Decode slice[4]: decoder run take 92.10ms
Save to output.wav
root@ax650:/mnt/qtang/melotts.axera/python# 
```
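
To relate a log like the one above to the inference time in the benchmark table, the per-stage timings can be summed. The sketch below assumes the log keeps the `encoder run take …ms` / `decoder run take …ms` wording shown above; the helper name `sum_run_times` is hypothetical, and model loading and language-module loading lines are deliberately not counted.

```python
import re

# Matches e.g. "encoder run take 89.70ms" or "decoder run take 92.15ms".
RUN_TIME = re.compile(r"(encoder|decoder) run take ([0-9.]+)ms")


def sum_run_times(log_text: str) -> dict:
    """Sum the reported encoder and decoder run times, in milliseconds."""
    totals = {"encoder": 0.0, "decoder": 0.0}
    for stage, millis in RUN_TIME.findall(log_text):
        totals[stage] += float(millis)
    return totals


# A few lines copied from the log above.
sample = """\
encoder run take 89.70ms
Decode slice[0]: decoder run take 108.08ms
Decode slice[1]: decoder run take 92.15ms
"""
totals = sum_run_times(sample)
print(totals)
```

Applied to the full log above, the encoder and decoder lines sum to roughly 1.5 s, in line with the AX650 row of the benchmark table.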

Output audio

https://github.com/user-attachments/assets/eda5c10c-7d30-46e5-a56a-f6edcf7813af


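Detailed runtime parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| -s/--sentence | Input sentence | |
| -w/--wav | Output audio path (WAV format) | output.wav |
| -e/--encoder | Path to the encoder model | ../models/encoder.onnx |
| -d/--decoder | Path to the decoder model | ../models/decoder.axmodel |
| -sr/--sample_rate | Sample rate | 44100 |
| --speed | Speaking speed; larger values are faster | 0.8 |
| -l/--language | One of "ZH", "ZH_MIX_EN", "JP", "EN", "KR", "SP", "FR", corresponding to Chinese, mixed Chinese-English, Japanese, English, Korean, Spanish, and French | ZH_MIX_EN |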
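
When calling the demo from another Python program, the flags in the table can be assembled into an argument list for `subprocess`. This is only a sketch: the helper `build_melotts_command` is hypothetical, its defaults are copied from the table above, and it assumes the working directory is the repository's `python` folder.

```python
def build_melotts_command(sentence, wav="output.wav",
                          encoder="../models/encoder.onnx",
                          decoder="../models/decoder.axmodel",
                          sample_rate=44100, speed=0.8,
                          language="ZH_MIX_EN"):
    """Build the argv list for melotts.py from the documented flags."""
    return [
        "python3", "melotts.py",
        "--sentence", sentence,
        "--wav", wav,
        "--encoder", encoder,
        "--decoder", decoder,
        "--sample_rate", str(sample_rate),
        "--speed", str(speed),
        "--language", language,
    ]


cmd = build_melotts_command("你好,世界", language="ZH", speed=0.9)
print(cmd)

# On the board, the command could then be executed with:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with the sentence text.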