Update README.md

<br> This model can be used for audio compression and can also serve as a component in the training of speech generation models.<br>

### Release Date:
<br>Huggingface [08/11/2025] via https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps<br>

## Model Architecture
nemo-nano-codec is composed of a fully convolutional generator neural network and three discriminators. The generator comprises an encoder, followed by vector quantization, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
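
To make the data flow concrete, the generator can be pictured as three stages chained together. The sketch below is illustrative only: the stage internals are placeholders standing in for the actual layer configuration, which is not specified here.

```
import torch
import torch.nn as nn

class CodecGeneratorSketch(nn.Module):
    """Conceptual encoder -> vector quantizer -> HiFi-GAN-style decoder pipeline."""

    def __init__(self, encoder: nn.Module, quantizer: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # convolutional downsampling: waveform -> low-frame-rate latents
        self.quantizer = quantizer  # vector quantization: latents -> discrete codebook indices
        self.decoder = decoder      # HiFi-GAN-style upsampling: quantized latents -> waveform

    def forward(self, audio):
        latents = self.encoder(audio)               # [B, D, T_frames]
        quantized, codes = self.quantizer(latents)  # codes are the discrete audio tokens
        reconstructed = self.decoder(quantized)     # [B, 1, T_samples]
        return reconstructed, codes
```

At the 12.5 frames-per-second rate in the model name, one second of input audio is represented by roughly 12 to 13 token frames before decoding.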

### Inference

For inference, you can refer to our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Ensure that you set the `model_name` parameter to "nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps".

Alternatively, you can use the code below, which also handles the automatic checkpoint download:

```
path_to_input_audio = ???   # path of the input audio
path_to_output_audio = ???  # path of the reconstructed output audio

# load audio codec model
nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps").eval()

# get discrete tokens from audio
audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
```
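
The snippet above shows only the loading steps. A complete pass from waveform to discrete tokens and back can be sketched as follows; this is a sketch assuming the `encode`/`decode` methods of NeMo's `AudioCodecModel`, with placeholder file paths:

```
import librosa
import soundfile as sf
import torch
from nemo.collections.tts.models import AudioCodecModel

path_to_input_audio = "input.wav"    # placeholder path
path_to_output_audio = "output.wav"  # placeholder path

# load the codec and switch to inference mode
nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps").eval()

# load audio at the codec's sample rate and add a batch dimension
audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
audio_tensor = torch.from_numpy(audio).unsqueeze(0)
audio_len = torch.tensor([audio_tensor.shape[1]])

with torch.no_grad():
    # waveform -> discrete codec tokens
    tokens, tokens_len = nemo_codec_model.encode(audio=audio_tensor, audio_len=audio_len)
    # discrete tokens -> reconstructed waveform
    reconstructed, _ = nemo_codec_model.decode(tokens=tokens, tokens_len=tokens_len)

# save the reconstructed audio
sf.write(path_to_output_audio, reconstructed.squeeze().cpu().numpy(), nemo_codec_model.sample_rate)
```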

If preferred, you can manually download the [checkpoint](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps/resolve/main/nemo-nano-codec-22khz-0.6kbps-12.5fps.nemo) and use the provided code to run inference on the model:

```
import librosa
```
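
For the manual-download path, loading from a local `.nemo` file can be sketched with NeMo's generic `restore_from` loader (the filename below is a placeholder for wherever you saved the checkpoint):

```
from nemo.collections.tts.models import AudioCodecModel

# placeholder path to the manually downloaded .nemo checkpoint
local_checkpoint = "nemo-nano-codec-22khz-0.6kbps-12.5fps.nemo"

# restore_from loads a local .nemo file instead of pulling it from Hugging Face
nemo_codec_model = AudioCodecModel.restore_from(restore_path=local_checkpoint).eval()
```

From here, encoding and decoding proceed exactly as in the sketch above.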

Variant results:

| Dataset | Squim MOS (↑) | PESQ (↑) | Mel Dist. (↓) | SECS (↑) | CER (↓) |
|:-------:|:-------------:|:--------:|:-------------:|:--------:|:-------:|
| MLS     | 4.407         | 2.012    | 0.205         | 0.701    | 7.792   |
| DAPS    | 4.662         | 2.205    | 0.204         | 0.656    | 1.469   |
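
As a rough illustration of the PESQ column, the sketch below scores a reconstruction against its reference with the open-source `pesq` package; it assumes both signals are resampled to 16 kHz for wideband PESQ, uses placeholder file paths, and is not necessarily the exact evaluation pipeline behind the table.

```
import librosa
from pesq import pesq  # pip install pesq

# placeholder paths: reference input and codec reconstruction (assumed time-aligned)
ref, _ = librosa.load("input.wav", sr=16000)
deg, _ = librosa.load("reconstructed.wav", sr=16000)

# wideband PESQ expects 16 kHz signals; higher is better
score = pesq(16000, ref, deg, "wb")
print(f"PESQ: {score:.3f}")
```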