3D-Speaker

This version of 3D-Speaker has been converted to run on the Axera NPU using w8a16 quantization.


Compatible with Pulsar2 version: 4.1-patch1

Conversion tool links:

For those interested in model conversion, you can try exporting the axmodel yourself through the Pulsar2 toolchain (version 4.1-patch1, as noted above); a rough sketch of the preceding ONNX export step follows.
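
The repository already ships the intermediate ONNX models (res2netv2.onnx, ecapa-tdnn.onnx), which are the starting point for the Pulsar2 compilation. As a rough illustration of how such a file can be produced, below is a minimal sketch of exporting a PyTorch speaker-embedding model with torch.onnx.export; the model object, the (1, frames, 80) fbank input layout, and the tensor names are assumptions for illustration, not the project's actual export script.

import torch

def export_speaker_model(model: torch.nn.Module, onnx_path: str = "ecapa-tdnn.onnx") -> None:
    # Hypothetical export sketch: the (batch, frames, 80 fbank bins) input layout
    # is an assumption, not the project's verified configuration.
    model.eval()
    dummy = torch.randn(1, 200, 80)  # 200 dummy frames of 80-dim fbank features
    torch.onnx.export(
        model,
        dummy,
        onnx_path,
        input_names=["feats"],
        output_names=["embedding"],
        dynamic_axes={"feats": {1: "frames"}},  # allow variable-length utterances
        opset_version=17,
    )

The exported ONNX model is then compiled to a w8a16 axmodel with the Pulsar2 toolchain noted above.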

Supported Platforms

Chips   Model        Latency
AX650   ERes2NetV2   5.09 ms
AX650   Ecapa-tdnn   7.37 ms

How to use

Download all files from this repository to the device.
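
If the files are not already on the board, one option is the huggingface_hub Python package; the repo_id below is a placeholder, not this repository's verified id on the Hub.

from huggingface_hub import snapshot_download

# Hypothetical download sketch; replace the placeholder repo_id with this
# repository's actual id on the Hub.
snapshot_download(
    repo_id="<namespace>/3D-Speaker",  # placeholder id
    local_dir="./3D-Speaker",          # put everything in one local folder
)

After downloading, the directory layout should look like the tree below.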


root@ax650:~/3D-Speaker# tree
.
|-- ax650
|   |-- res2netv2.axmodel
|   `-- ecapa-tdnn.axmodel
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav
|-- run_onnx_res2netv2.py
|-- run_axmodel_res2netv2.py
|-- run_onnx_ecapa_tdnn.py
|-- run_axmodel_ecapa_tdnn.py
|-- res2netv2.onnx
`-- ecapa-tdnn.onnx

Inference

Input Wavs:
|-- wavs
|   |-- speaker1_a_cn_16k.wav
|   |-- speaker1_b_cn_16k.wav
|   `-- speaker2_a_cn_16k.wav

Inference on an AX650 host, such as the M4N-Dock (爱芯派Pro):

root@ax650:~/3D-Speaker# python3 run_axmodel_ecapa_tdnn.py --wavs ./wavs/speaker1_a_cn_16k.wav ./wavs/speaker2_a_cn_16k.wav
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 4.1-patch1-dirty 6247f37c-dirty
[INFO]: Computing the similarity score...
[INFO]: The similarity score between two input wavs is 0.7166

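Under the hood, the run scripts extract a speaker embedding from each wav and compare the two embeddings, presumably with cosine similarity. Below is a minimal sketch of that scoring step using the shipped ONNX model with onnxruntime and torchaudio's Kaldi fbank front end; the 80-bin feature setup, mean normalization, and tensor handling are assumptions for illustration rather than the scripts' exact configuration.

import numpy as np
import torchaudio
import onnxruntime as ort

def embed(session: ort.InferenceSession, wav_path: str) -> np.ndarray:
    # Load a 16 kHz mono wav and compute 80-dim Kaldi fbank features
    # (assumed front end; not the repository's verified settings).
    waveform, sr = torchaudio.load(wav_path)
    feats = torchaudio.compliance.kaldi.fbank(
        waveform, num_mel_bins=80, sample_frequency=sr
    )
    feats = feats - feats.mean(dim=0, keepdim=True)  # per-utterance mean normalization
    inp = feats.unsqueeze(0).numpy()                 # (1, frames, 80)
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: inp})[0].squeeze()

session = ort.InferenceSession("ecapa-tdnn.onnx")
e1 = embed(session, "wavs/speaker1_a_cn_16k.wav")
e2 = embed(session, "wavs/speaker2_a_cn_16k.wav")
score = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
print(f"similarity score: {score:.4f}")

On the board, run_axmodel_ecapa_tdnn.py presumably performs the same steps with the compiled .axmodel through the AxEngineExecutionProvider shown in the log above.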
