Update README.md

<br> This model can be used for audio compression and can also serve as a component in the training of speech generation models.<br>

### Release Date:
<br>Huggingface [08/11/2025] via https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps<br>

## Model Architecture
nemo-nano-codec is composed of a fully convolutional generator neural network and three discriminators. The generator comprises an encoder, followed by vector quantization, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
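
To make the data flow concrete, the generator can be pictured as three stages chained together. The sketch below is illustrative only: the stage internals are placeholders standing in for the actual layer configuration, which is not specified here.

```
import torch
import torch.nn as nn

class CodecGeneratorSketch(nn.Module):
    """Conceptual encoder -> vector quantizer -> HiFi-GAN-style decoder pipeline."""

    def __init__(self, encoder: nn.Module, quantizer: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder      # convolutional downsampling: waveform -> low-frame-rate latents
        self.quantizer = quantizer  # vector quantization: latents -> discrete codebook indices
        self.decoder = decoder      # HiFi-GAN-style upsampling: quantized latents -> waveform

    def forward(self, audio):
        latents = self.encoder(audio)               # [B, D, T_frames]
        quantized, codes = self.quantizer(latents)  # codes are the discrete audio tokens
        reconstructed = self.decoder(quantized)     # [B, 1, T_samples]
        return reconstructed, codes
```

At the 12.5 frames-per-second rate in the model name, one second of input audio is represented by roughly 12 to 13 token frames before decoding.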

### Inference

For inference, you can refer to our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Ensure that you set the `model_name` parameter to "nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps".

Alternatively, you can use the code below, which also handles the automatic checkpoint download:

```
path_to_input_audio = ???   # path of the input audio
path_to_output_audio = ???  # path of the reconstructed output audio

# load audio codec model
nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps").eval()

# get discrete tokens from audio
audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
```
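
The snippet above shows only the loading steps. A complete pass from waveform to discrete tokens and back can be sketched as follows; this is a sketch assuming the `encode`/`decode` methods of NeMo's `AudioCodecModel`, with placeholder file paths:

```
import librosa
import soundfile as sf
import torch
from nemo.collections.tts.models import AudioCodecModel

path_to_input_audio = "input.wav"    # placeholder path
path_to_output_audio = "output.wav"  # placeholder path

# load the codec and switch to inference mode
nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps").eval()

# load audio at the codec's sample rate and add a batch dimension
audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
audio_tensor = torch.from_numpy(audio).unsqueeze(0)
audio_len = torch.tensor([audio_tensor.shape[1]])

with torch.no_grad():
    # waveform -> discrete codec tokens
    tokens, tokens_len = nemo_codec_model.encode(audio=audio_tensor, audio_len=audio_len)
    # discrete tokens -> reconstructed waveform
    reconstructed, _ = nemo_codec_model.decode(tokens=tokens, tokens_len=tokens_len)

# save the reconstructed audio
sf.write(path_to_output_audio, reconstructed.squeeze().cpu().numpy(), nemo_codec_model.sample_rate)
```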

If preferred, you can manually download the [checkpoint](https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps/resolve/main/nemo-nano-codec-22khz-0.6kbps-12.5fps.nemo) and use the provided code to run inference on the model:

```
import librosa
```
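
For the manual-download path, loading from a local `.nemo` file can be sketched with NeMo's generic `restore_from` loader (the filename below is a placeholder for wherever you saved the checkpoint):

```
from nemo.collections.tts.models import AudioCodecModel

# placeholder path to the manually downloaded .nemo checkpoint
local_checkpoint = "nemo-nano-codec-22khz-0.6kbps-12.5fps.nemo"

# restore_from loads a local .nemo file instead of pulling it from Hugging Face
nemo_codec_model = AudioCodecModel.restore_from(restore_path=local_checkpoint).eval()
```

From here, encoding and decoding proceed exactly as in the sketch above.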

Variant results:

| Dataset | Squim MOS (↑) | PESQ (↑) | Mel Dist. (↓) | SECS (↑) | CER (↓) |
|:-------:|:-------------:|:--------:|:-------------:|:--------:|:-------:|
| MLS     | 4.407         | 2.012    | 0.205         | 0.701    | 7.792   |
| DAPS    | 4.662         | 2.205    | 0.204         | 0.656    | 1.469   |
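
As a rough illustration of the PESQ column, the sketch below scores a reconstruction against its reference with the open-source `pesq` package; it assumes both signals are resampled to 16 kHz for wideband PESQ, uses placeholder file paths, and is not necessarily the exact evaluation pipeline behind the table.

```
import librosa
from pesq import pesq  # pip install pesq

# placeholder paths: reference input and codec reconstruction (assumed time-aligned)
ref, _ = librosa.load("input.wav", sr=16000)
deg, _ = librosa.load("reconstructed.wav", sr=16000)

# wideband PESQ expects 16 kHz signals; higher is better
score = pesq(16000, ref, deg, "wb")
print(f"PESQ: {score:.3f}")
```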