3D-Speaker
This version of 3D-Speaker has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 4.1-patch1
Convert tools links:
For those interested in model conversion, you can export the axmodel yourself with the Pulsar2 toolchain; the AXera Platform repo contains a detailed guide.
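As a rough sketch of the conversion step (not verified against this repo), the ONNX model is compiled with `pulsar2 build`, and the w8a16 quantization settings, calibration data, and target are described in a JSON config prepared according to the Pulsar2 guide. The flag names below follow the public Pulsar2 documentation and may differ between versions, so check them against the guide for 4.1-patch1:

```
# Hypothetical invocation; config.json is a placeholder that selects
# w8a16 quantization and points at calibration data, per the Pulsar2 guide.
pulsar2 build \
  --input ecapa-tdnn.onnx \
  --config config.json \
  --output_dir build_output \
  --target_hardware AX650
```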
Support Platform
| Chips | Model      | Inference time |
|-------|------------|----------------|
| AX650 | ERes2NetV2 | 5.09 ms        |
| AX650 | ECAPA-TDNN | 7.37 ms        |
How to use
Download all files from this repository to the device
```
root@ax650:~/3D-Speaker# tree
.
|-- ax650
|   |-- res2netv2.axmodel
|   `-- ecapa-tdnn.axmodel
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
|-- run_onnx_res2netv2.py
|-- run_axmodel_res2netv2.py
|-- run_onnx_ecapa_tdnn.py
|-- run_axmodel_ecapa_tdnn.py
|-- res2netv2.onnx
`-- ecapa-tdnn.onnx
```
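Before moving to the NPU, the bundled run_onnx_*.py scripts let you sanity-check the float ONNX models on any host. The sketch below mirrors that flow with onnxruntime; the 80-dim Kaldi-style Fbank front end, the [1, T, 80] input layout, and the mean normalization are assumptions and should be checked against run_onnx_ecapa_tdnn.py (waveform scaling conventions in particular may differ):

```python
# sanity_check_onnx.py -- hypothetical helper, not part of this repo.
import numpy as np
import onnxruntime as ort
import torchaudio

def extract_fbank(wav_path: str) -> np.ndarray:
    waveform, sr = torchaudio.load(wav_path)           # [1, num_samples], 16 kHz wavs
    feats = torchaudio.compliance.kaldi.fbank(
        waveform, num_mel_bins=80, sample_frequency=sr
    )                                                   # [T, 80]
    feats = feats - feats.mean(dim=0, keepdim=True)     # simple mean normalization (assumed)
    return feats.unsqueeze(0).numpy()                   # [1, T, 80], float32

def embed(sess: ort.InferenceSession, wav_path: str) -> np.ndarray:
    feats = extract_fbank(wav_path)
    input_name = sess.get_inputs()[0].name
    return sess.run(None, {input_name: feats})[0].reshape(-1)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    sess = ort.InferenceSession("ecapa-tdnn.onnx", providers=["CPUExecutionProvider"])
    emb1 = embed(sess, "wavs/speaker1_a_cn_16k.wav")
    emb2 = embed(sess, "wavs/speaker2_a_cn_16k.wav")
    print(f"similarity: {cosine(emb1, emb2):.4f}")
```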
Inference
Input Wavs:
```
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
```
Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro):
```
root@ax650:~/3d_speaker# python3 run_axmodel_ecapa_tdnn.py --wavs ./wavs/speaker1_a_cn_16k.wav ./wavs/speaker2_a_cn_16k.wav
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO]: Computing the similarity score...
[INFO]: The similarity score between two input wavs is 0.7166
```

Output: the similarity score between the two input wavs is 0.7166.
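On the board, run_axmodel_ecapa_tdnn.py drives the same pipeline through pyaxengine, whose Python API mirrors ONNX Runtime (hence the AxEngineExecutionProvider lines in the log above). A minimal sketch of the session setup, assuming pyaxengine is installed and reusing the Fbank features from the ONNX sketch; the input name and fixed compiled input shape should be read from the session rather than hard-coded:

```python
import numpy as np
import axengine as axe  # pyaxengine: ONNX Runtime-like API for .axmodel files

sess = axe.InferenceSession("ax650/ecapa-tdnn.axmodel")
inp = sess.get_inputs()[0]

# Placeholder input with the compiled shape; real use feeds [1, T, 80] float32
# Fbank features produced exactly as in run_axmodel_ecapa_tdnn.py.
feats = np.zeros(inp.shape, dtype=np.float32)
emb = sess.run(None, {inp.name: feats})[0].reshape(-1)
print(emb.shape)
```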