3D-Speaker

This version of 3D-Speaker has been converted to run on the Axera NPU using w8a16 quantization.


Compatible with Pulsar2 version: 4.1-patch1

Conversion tool links:

For those interested in model conversion, you can try exporting the axmodel yourself through the Pulsar2 toolchain (version 4.1-patch1, as noted above); a rough sketch of the preceding ONNX export step follows.
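
The repository already ships the intermediate ONNX models (res2netv2.onnx, ecapa-tdnn.onnx), which are the starting point for the Pulsar2 compilation. As a rough illustration of how such a file can be produced, below is a minimal sketch of exporting a PyTorch speaker-embedding model with torch.onnx.export; the model object, the (1, frames, 80) fbank input layout, and the tensor names are assumptions for illustration, not the project's actual export script.

import torch

def export_speaker_model(model: torch.nn.Module, onnx_path: str = "ecapa-tdnn.onnx") -> None:
    # Hypothetical export sketch: the (batch, frames, 80 fbank bins) input layout
    # is an assumption, not the project's verified configuration.
    model.eval()
    dummy = torch.randn(1, 200, 80)  # 200 dummy frames of 80-dim fbank features
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["feats"],
        output_names=["embedding"],
        dynamic_axes={"feats": {1: "frames"}},  # allow variable-length utterances
        opset_version=17,
    )

The exported ONNX model is then compiled to a w8a16 axmodel with the Pulsar2 toolchain noted above.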

Supported Platforms

Chips   Model        Latency
AX650   ERes2NetV2   5.09 ms
AX650   Ecapa-tdnn   7.37 ms

How to use

Download all files from this repository to the device.
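
If the files are not already on the board, one option is the huggingface_hub Python package; the repo_id below is a placeholder, not this repository's verified id on the Hub.

from huggingface_hub import snapshot_download

# Hypothetical download sketch; replace the placeholder repo_id with this
# repository's actual id on the Hub.
snapshot_download(
    repo_id="<namespace>/3D-Speaker",  # placeholder id
    local_dir="./3D-Speaker",          # put everything in one local folder
)

After downloading, the directory layout should look like the tree below.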


root@ax650:~/3D-Speaker# tree
.
|-- ax650
|   |-- res2netv2.axmodel
|   `-- ecapa-tdnn.axmodel
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
|-- run_onnx_res2netv2.py
|-- run_axmodel_res2netv2.py
|-- run_onnx_ecapa_tdnn.py
|-- run_axmodel_ecapa_tdnn.py
|-- res2netv2.onnx
`-- ecapa-tdnn.onnx

Inference

Input Wavs:
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav

Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro):

root@ax650:~/3D-Speaker# python3 run_axmodel_ecapa_tdnn.py --wavs ./wavs/speaker1_a_cn_16k.wav ./wavs/speaker2_a_cn_16k.wav
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO]: Computing the similarity score...
[INFO]: The similarity score between two input wavs is 0.7166

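Under the hood, the run scripts extract a speaker embedding from each wav and compare the two embeddings, presumably with cosine similarity. Below is a minimal sketch of that scoring step using the shipped ONNX model with onnxruntime and torchaudio's Kaldi fbank front end; the 80-bin feature setup, mean normalization, and tensor handling are assumptions for illustration rather than the scripts' exact configuration.

import numpy as np
import torchaudio
import onnxruntime as ort

def embed(session: ort.InferenceSession, wav_path: str) -> np.ndarray:
    # Load a 16 kHz mono wav and compute 80-dim Kaldi fbank features
    # (assumed front end; not the repository's verified settings).
    waveform, sr = torchaudio.load(wav_path)
    feats = torchaudio.compliance.kaldi.fbank(
        waveform, num_mel_bins=80, sample_frequency=sr
    )
    feats = feats - feats.mean(dim=0, keepdim=True)  # per-utterance mean normalization
    inp = feats.unsqueeze(0).numpy()                 # (1, frames, 80)
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: inp})[0].squeeze()

session = ort.InferenceSession("ecapa-tdnn.onnx")
e1 = embed(session, "wavs/speaker1_a_cn_16k.wav")
e2 = embed(session, "wavs/speaker2_a_cn_16k.wav")
score = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
print(f"similarity score: {score:.4f}")

On the board, run_axmodel_ecapa_tdnn.py presumably performs the same steps with the compiled .axmodel through the AxEngineExecutionProvider shown in the log above.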
