CasanovaE committed
Commit 3aed3da · verified · 1 Parent(s): 8e37105

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -51,7 +51,7 @@ Model | Sample Rate | Frame Rate | Bit Rate | # Codebooks | Codebook Size | Em
51
  <br> This model can be used for audio compression and can also serve as a component in the training of speech generation models.<br>
52
 
53
  ### Release Date:
54
- <br>Huggingface [08/11/2025] via https://huggingface.co/nvidia/nanocodec-22khz-1.78kbps-12.5fps<br>
55
 
56
  ## Model Architecture
57
  nemo-nano-codec is composed of a fully convolutional generator neural network and three discriminators. The generator comprises an encoder, followed by vector quantization, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
@@ -110,7 +110,7 @@ The model is available for use in the [NVIDIA NeMo](https://github.com/NVIDIA/Ne
110
 
111
  ### Inference
112
 
113
- For inference, you can refer to our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Ensure that you set the model_name parameter to "nvidia/nanocodec-22khz-1.78kbps-12.5fps".
114
 
115
  Alternatively, you can use the code below, which also handles the automatic checkpoint download:
116
 
@@ -124,7 +124,7 @@ path_to_input_audio = ??? # path of the input audio
124
  path_to_output_audio = ??? # path of the reconstructed output audio
125
 
126
  # load audio codec model
127
- nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nanocodec-22khz-1.78kbps-12.5fps").eval()
128
 
129
  # get discrete tokens from audio
130
  audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
@@ -144,7 +144,7 @@ sf.write(path_to_output_audio, output_audio, nemo_codec_model.sample_rate)
144
 
145
  ```
146
 
147
- If preferred, you can manually download the [checkpoint](https://huggingface.co/nvidia/nvidia/nanocodec-22khz-1.78kbps-12.5fps/resolve/main/nanocodec-22khz-1.78kbps-12.5fps.nemo) and use the provided code to run inference on the model:
148
 
149
  ```
150
  import librosa
@@ -248,8 +248,8 @@ We evaluated our codec using multiple objective audio quality metrics across two
248
  Variant results:
249
  | Dataset | Squim MOS (↑) |PESQ (↑) |Mel Dist. (↓) | SECS (↓) | CER (↓)|
250
  |:-----------:|:----------:|:----------:|:----------:|:-----------:|:-----------:|
251
- | MLS | 4.441 | 2.760 | 0.143 | 0.862 | 2.423 |
252
- | DAPS | 4.697 | 3.030 | 0.139 | 0.831 | 0.758 |
253
 
254
 
255
 
 
51
  <br> This model can be used for audio compression and can also serve as a component in the training of speech generation models.<br>
52
 
53
  ### Release Date:
54
+ <br>Huggingface [08/11/2025] via https://huggingface.co/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps<br>
55
 
56
  ## Model Architecture
57
  nemo-nano-codec is composed of a fully convolutional generator neural network and three discriminators. The generator comprises an encoder, followed by vector quantization, and a [HiFi-GAN-based](https://arxiv.org/abs/2010.05646) decoder.
 
110
 
111
  ### Inference
112
 
113
+ For inference, you can refer to our [Audio Codec Inference Tutorial](https://github.com/NVIDIA/NeMo/blob/main/tutorials/tts/Audio_Codec_Inference.ipynb), which automatically downloads the model checkpoint. Ensure that you set the model_name parameter to "nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps".
114
 
115
  Alternatively, you can use the code below, which also handles the automatic checkpoint download:
116
 
 
124
  path_to_output_audio = ??? # path of the reconstructed output audio
125
 
126
  # load audio codec model
127
+ nemo_codec_model = AudioCodecModel.from_pretrained("nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps").eval()
128
 
129
  # get discrete tokens from audio
130
  audio, _ = librosa.load(path_to_input_audio, sr=nemo_codec_model.sample_rate)
 
144
 
145
  ```
146
 
147
+ If preferred, you can manually download the [checkpoint](https://huggingface.co/nvidia/nvidia/nemo-nano-codec-22khz-0.6kbps-12.5fps/resolve/main/nemo-nano-codec-22khz-0.6kbps-12.5fps.nemo) and use the provided code to run inference on the model:
148
 
149
  ```
150
  import librosa
 
248
  Variant results:
249
  | Dataset | Squim MOS (↑) |PESQ (↑) |Mel Dist. (↓) | SECS (↓) | CER (↓)|
250
  |:-----------:|:----------:|:----------:|:----------:|:-----------:|:-----------:|
251
+ | MLS | 4.407 | 2.012 | 0.205 | 0.701 | 7.792 |
252
+ | DAPS | 4.662 | 2.205 | 0.204 | 0.656 | 1.469|
253
 
254
 
255
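To make the effect of the table edit easier to read, the sketch below computes the per-metric change (new minus old) between the removed and added rows of the `Variant results` table. The values are copied from the hunk at line 248 of README.md; the names `OLD`, `NEW`, and `metric_deltas` are illustrative and are not part of the README or of NeMo.

```python
# Column order as it appears in the "Variant results" table header.
METRICS = ["Squim MOS", "PESQ", "Mel Dist.", "SECS", "CER"]

# Rows removed by the commit ("-" lines in the diff).
OLD = {
    "MLS": [4.441, 2.760, 0.143, 0.862, 2.423],
    "DAPS": [4.697, 3.030, 0.139, 0.831, 0.758],
}

# Rows added by the commit ("+" lines in the diff).
NEW = {
    "MLS": [4.407, 2.012, 0.205, 0.701, 7.792],
    "DAPS": [4.662, 2.205, 0.204, 0.656, 1.469],
}


def metric_deltas(old, new):
    """Return {dataset: {metric: new - old}}, rounded to 3 decimals."""
    return {
        dataset: {
            metric: round(new_value - old_value, 3)
            for metric, old_value, new_value in zip(
                METRICS, old[dataset], new[dataset]
            )
        }
        for dataset in old
    }


if __name__ == "__main__":
    for dataset, deltas in metric_deltas(OLD, NEW).items():
        print(dataset, deltas)
```

The sketch reports signed differences only and leaves the judgment of whether a change is an improvement to the arrows in the table header.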