SenseVoice.cpp Jetson Nano Binaries
SenseVoice.cpp is a high-performance, open-source C++ speech-to-text implementation aimed at edge devices. It leverages the GGML inference framework and supports multiple backends, including CUDA for GPU acceleration.
This repository hosts prebuilt binaries optimized for NVIDIA Jetson Nano, so you can skip the build step and start transcribing right away.
Original project: https://github.com/lovemefan/SenseVoice.cpp
Key Features
Multi-language ASR: Supports Chinese (Mandarin), Cantonese, English, Japanese, and Korean.
Low latency: Efficient inference with optional flash-attn.
Quantization: Q3, Q4, Q5, Q6, Q8 quantized models to reduce memory footprint.
Flexible backends:
- CPU (all platforms)
- CUDA (NVIDIA GPUs)
- BLAS, Metal, Vulkan (upstream)
Voice Activity Detection (VAD): Built-in silence-based VAD parameters.
Inverse Text Normalization (ITN): Optionally output punctuation and formatted text.
For full feature details (streaming mode, extra backends), see the upstream documentation.
Deliverable Directory Structure
project-root/
├── bin/                        # Executables
│   ├── sense-voice-main        # Main ASR program
│   ├── sense-voice-quantize    # Model quantization utility
│   └── sense-voice-zcr-main    # Zero-Crossing Rate detection example
└── lib/                        # Libraries
    ├── libcommon.a             # Common static library
    ├── libggml-base.so         # GGML base operations
    ├── libggml-cpu.so          # GGML CPU support
    ├── libggml-cuda.so         # GGML CUDA support
    ├── libggml.so              # GGML core
    └── libsense-voice-core.a   # SenseVoice core
- bin/: Standalone executables for Jetson Nano.
- lib/: Static (.a) and shared (.so) libraries. The shared libraries must be on the library search path at runtime.
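To confirm that a checkout matches the layout above, a small shell check along these lines may help. This is a minimal sketch: the function name `check_layout` is made up for illustration, and the file list is simply copied from the tree (only the executables and shared libraries are checked).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected file that is missing under a
# root directory. Returns 0 when nothing is missing, 1 otherwise.
check_layout() {
    local root="${1:-.}" missing=0 f
    for f in \
        bin/sense-voice-main \
        bin/sense-voice-quantize \
        bin/sense-voice-zcr-main \
        lib/libggml-base.so \
        lib/libggml-cpu.so \
        lib/libggml-cuda.so \
        lib/libggml.so
    do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}
```

Run it from the repository root as `check_layout . && echo "layout OK"`.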
Quick Deployment
Follow these steps to deploy and run on Ubuntu-based distributions (e.g., JetPack 4.5.1 on Jetson Nano):
1. Clone the Repo with Git LFS Support
If you haven't installed Git LFS yet, do so and initialize:
# Install Git LFS
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# Initialize in your repo
git lfs install
Clone the repository:
git clone https://huggingface.co/<YOUR_USERNAME>/sensevoice-jetson-nano.git
cd sensevoice-jetson-nano
git lfs pull
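If the LFS objects were not fetched, the files in lib/ are small text stubs rather than real binaries. Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so a check like the following can spot them. This is a sketch only; `is_lfs_pointer` is a hypothetical helper name, not part of Git LFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file is still a Git LFS pointer stub
# rather than the real binary. Pointer files are small text files whose
# first line starts with the LFS spec URL.
is_lfs_pointer() {
    head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Report any shared library that was not fetched (no-op if lib/ is absent).
for f in lib/*.so; do
    [ -f "$f" ] || continue
    if is_lfs_pointer "$f"; then
        echo "$f is an LFS pointer; run 'git lfs pull'"
    fi
done
```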
2. Track Large Binary Files with Git LFS
Ensure large files (shared libraries) use LFS to avoid push errors:
git lfs track "lib/*.so"
git add .gitattributes
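`git lfs track` records the pattern in .gitattributes as a line of the form `lib/*.so filter=lfs diff=lfs merge=lfs -text`. To verify that a pattern is tracked before pushing, something like this sketch could be used (`lfs_tracks` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given pattern is routed through the
# LFS filter in a .gitattributes file (default: ./.gitattributes).
lfs_tracks() {
    grep -qF -- "$1 filter=lfs" "${2:-.gitattributes}" 2>/dev/null
}
```

For example, `lfs_tracks 'lib/*.so' || echo "lib/*.so is not tracked by LFS"`.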
3. Uploading New Binaries
When you update or add new .so files in lib/, commit and push as usual:
git add lib/*.so
git commit -m "Add updated shared libraries via LFS"
git push
4. Make Binaries Executable
chmod +x bin/*
5. Install Shared Libraries System-wide
sudo mkdir -p /usr/local/lib/sensevoice
sudo cp lib/*.so /usr/local/lib/sensevoice/
echo "/usr/local/lib/sensevoice" | sudo tee /etc/ld.so.conf.d/sensevoice.conf
sudo ldconfig
Alternatively, set LD_LIBRARY_PATH locally:
export LD_LIBRARY_PATH="$PWD/lib:$LD_LIBRARY_PATH"
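Whichever approach you choose, `ldd` shows whether the dynamic linker can resolve the shared libraries; unresolved ones appear as `=> not found`. The sketch below filters that output. `unresolved_libs` is a hypothetical helper name, and the binary path assumes you are in the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print the names of
# libraries the dynamic linker could not resolve.
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Check the main binary if it is present in the current directory tree.
if [ -x bin/sense-voice-main ]; then
    ldd bin/sense-voice-main | unresolved_libs
fi
```

No output means every library was found.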
6. Model Setup
Download or convert a GGUF model (e.g., sense-voice-small-q4_k.gguf):
# From Hugging Face
git clone https://huggingface.co/lovemefan/sense-voice-gguf.git models
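A GGUF file starts with the four ASCII bytes `GGUF`, which gives a quick sanity check that a model downloaded completely and is not an LFS pointer stub. The following is a minimal sketch; `is_gguf` is a hypothetical helper name, and the models/ path is the one used in the clone command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file begins with the ASCII magic "GGUF".
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null | tr -d '\0')" = "GGUF" ]
}

# Check every model in models/ (no-op if the directory is absent).
for m in models/*.gguf; do
    [ -f "$m" ] || continue
    if is_gguf "$m"; then
        echo "ok: $m"
    else
        echo "not GGUF (LFS pointer or partial download?): $m"
    fi
done
```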
7. Run Examples
Speech-to-Text (non-streaming)
bin/sense-voice-main \
-m models/sense-voice-small-q4_k.gguf \
-f input.wav \
-t 4 \
-l zh \
--use-itn \
--flash-attn
Options:
- -t N / --threads N: Number of decode threads (default: 4)
- -l LANG / --language LANG: auto, zh, en, yue, ja, ko
- --min_speech_duration_ms, --max_speech_duration_ms: VAD thresholds
- --no-gpu (-ng): Disable GPU
- --use-itn (-itn): Enable inverse text normalization
- --flash-attn (-fa): Enable Flash Attention decoder
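To transcribe several files in one go, the command above can be wrapped in a loop. The sketch below uses only the flags listed in this section. The function name `transcribe_all` and the `SV_*` and `DRY_RUN` variables are assumptions made for illustration and are not part of SenseVoice.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical settings; override them in the environment as needed.
SV_BIN="${SV_BIN:-bin/sense-voice-main}"
SV_MODEL="${SV_MODEL:-models/sense-voice-small-q4_k.gguf}"
SV_LANG="${SV_LANG:-auto}"
SV_THREADS="${SV_THREADS:-4}"

# Hypothetical wrapper: transcribe each WAV file given as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
transcribe_all() {
    local wav
    for wav in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$SV_BIN -m $SV_MODEL -f $wav -t $SV_THREADS -l $SV_LANG --use-itn"
        else
            "$SV_BIN" -m "$SV_MODEL" -f "$wav" -t "$SV_THREADS" -l "$SV_LANG" --use-itn
        fi
    done
}
```

For example, `DRY_RUN=1 transcribe_all recordings/*.wav` prints one command per file so you can review them first.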
Quantization Utility
bin/sense-voice-quantize \
--input models/sense-voice-small.bin \
--output models/sense-voice-small-q4_k.gguf \
--type q4_k
Supported quant types: q3, q4_k, q4_0, q5_0, q6_k, q8.
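When scripting quantization, it can help to reject unsupported types before invoking the tool. The sketch below checks against the list above and derives the output name from the input name and type. Both helper names and the output naming scheme are assumptions; the `--input`, `--output`, and `--type` flags are the ones shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the quantization types listed above.
is_supported_quant() {
    case "$1" in
        q3|q4_k|q4_0|q5_0|q6_k|q8) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: quantize one input model to one type, naming the
# output after the type (e.g. models/sense-voice-small-q5_0.gguf).
quantize_as() {
    local input="$1" qtype="$2" output
    if ! is_supported_quant "$qtype"; then
        echo "unsupported type: $qtype" >&2
        return 2
    fi
    output="${input%.*}-${qtype}.gguf"
    bin/sense-voice-quantize --input "$input" --output "$output" --type "$qtype"
}
```

For example, `quantize_as models/sense-voice-small.bin q5_0`.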
Zero-Crossing Rate Demo
bin/sense-voice-zcr-main input.wav
For streaming ASR or advanced examples, please refer to upstream's sense-voice-stream in the original repo.
Compatibility
- Hardware: NVIDIA Jetson Nano
- OS: Ubuntu 18.04 / JetPack 4.5.1
- CUDA: 10.2
- C++: C++17
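Before deploying, you may want to confirm that the target device matches these values. The sketch below compares the CPU architecture and the CUDA release reported by `nvcc --version` against the list above. The helper names are hypothetical, and `aarch64` is assumed as the architecture because the Jetson Nano runs a 64-bit ARM userland.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract "X.Y" from `nvcc --version` output on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: compare architecture and CUDA release against the
# values listed above.
check_target() {
    local arch="$1" cuda="$2"
    if [ "$arch" != "aarch64" ]; then
        echo "unexpected architecture: $arch"
        return 1
    fi
    if [ "$cuda" != "10.2" ]; then
        echo "unexpected CUDA release: ${cuda:-none}"
        return 1
    fi
    echo "target matches"
}
```

Typical use is `check_target "$(uname -m)" "$(nvcc --version 2>/dev/null | cuda_release)"`. If `nvcc` is not on your PATH, call it by its full path instead (often under /usr/local/cuda/bin).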
License
MIT License; see LICENSE for details.
For comprehensive build instructions, extra examples, and advanced backend support, visit the official SenseVoice.cpp documentation. Happy prototyping!