topel/ConvNeXt-Tiny-AT

audio embeddings

topel committed on Sep 28, 2023

Commit f4cc1d3 · 1 Parent(s): 80811e4

Update README.md

Files changed (1)

README.md +6 -3

@@ -66,13 +66,17 @@ Output:
 ## Inference: get logits and probabilities
+To run the following, first download ```254906__tpellegrini__cavaco1.wav``` and ```class_labels_indices.csv``` from this repository.
 ```python
 sample_rate = 32000
 audio_target_length = 10 * sample_rate  # 10 s
 # AUDIO_FNAME = "f62-S-v2swA_200000_210000.wav"
 AUDIO_FNAME = "254906__tpellegrini__cavaco1.wav"
-AUDIO_FPATH = os.path.join("/path/to/audio", AUDIO_FNAME)
+current_dir=os.getcwd()
+AUDIO_FPATH = os.path.join(current_dir, AUDIO_FNAME)
 waveform, sample_rate_ = torchaudio.load(AUDIO_FPATH)
 if sample_rate_ != sample_rate:
@@ -107,7 +111,6 @@ probs = output["clipwise_output"]
 # Equivalent: probs = torch.sigmoid(logits)
 print("probs size:", probs.size())
-current_dir=os.getcwd()
 lb_to_ix, ix_to_lb, id_to_ix, ix_to_id = read_audioset_label_tags(os.path.join(current_dir, "class_labels_indices.csv"))
 threshold = 0.25
@@ -137,7 +140,7 @@ Mandolin: 0.710
 Ukulele: 0.268
 ```
-Technically, it's not a Mandolin nor a Ukulele, but the Ukulele Brazilian cousin, the cavaquinho!
+Technically speaking, it's not a Mandolin nor a Ukulele, but a Brazilian cousin, the cavaquinho!
 ## Get audio scene embeddings
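With this change the README resolves both the audio clip and the label CSV against `os.getcwd()`, so the snippet only works when it is launched from the directory holding the two downloaded files. A minimal pre-flight check, assuming that layout; `check_demo_files` and `REQUIRED_FILES` are hypothetical names, not part of the repository:

```python
import os

# File names taken from the README diff; the helper itself is hypothetical.
REQUIRED_FILES = (
    "254906__tpellegrini__cavaco1.wav",
    "class_labels_indices.csv",
)


def check_demo_files(base_dir=None):
    """Return {file name: absolute path} for the demo files.

    Paths are resolved against base_dir, defaulting to the current
    working directory (mirroring current_dir = os.getcwd() in the README).
    Raises FileNotFoundError naming every missing file at once.
    """
    base_dir = os.getcwd() if base_dir is None else base_dir
    paths = {name: os.path.join(base_dir, name) for name in REQUIRED_FILES}
    missing = [name for name, path in paths.items() if not os.path.isfile(path)]
    if missing:
        raise FileNotFoundError(
            f"Missing in {base_dir}: {', '.join(missing)}. "
            "Download them from the model repository first."
        )
    return paths
```

Calling `check_demo_files()` before `torchaudio.load` replaces a bare file-not-found error with a message that lists every file still to download.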
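The context lines fix `sample_rate = 32000` and `audio_target_length = 10 * sample_rate`, i.e. 320,000 samples for a 10 s clip, but the diff elides how the README brings a clip to that length. The sketch below assumes zero-padding at the end of short clips and truncation of long ones, which may differ from the README's own handling. `fix_length` is a hypothetical helper operating on a NumPy array, and resampling (for example with `torchaudio.functional.resample`) is left out:

```python
import numpy as np

SAMPLE_RATE = 32000
TARGET_LENGTH = 10 * SAMPLE_RATE  # 320000 samples = 10 s at 32 kHz


def fix_length(waveform, target_length=TARGET_LENGTH):
    """Zero-pad or truncate the last (time) axis to target_length.

    Works for 1-D (samples,) and 2-D (channels, samples) arrays.
    Truncation returns a view of the input; padding returns a new array.
    """
    n = waveform.shape[-1]
    if n >= target_length:
        return waveform[..., :target_length]
    # Pad only the time axis, and only at the end.
    pad_width = [(0, 0)] * (waveform.ndim - 1) + [(0, target_length - n)]
    return np.pad(waveform, pad_width)  # default mode="constant" pads with 0
```

Padding at the end keeps the original samples aligned with time zero; centering or repeating the clip are equally plausible choices the diff does not settle.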
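The context lines show that probabilities are the element-wise sigmoid of the logits (`probs = torch.sigmoid(logits)`) and that labels scoring above `threshold = 0.25` are printed, e.g. `Mandolin: 0.710` and `Ukulele: 0.268`. AudioSet tagging is multi-label, so each class gets an independent sigmoid rather than a shared softmax, and several labels can clear the threshold at once. A dependency-free sketch of that selection step, assuming a plain list of logits with a parallel list of label names and a strict `>` comparison; the function names are hypothetical:

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def labels_above_threshold(logits, labels, threshold=0.25):
    """Return (label, probability) pairs whose sigmoid probability
    exceeds threshold, sorted from most to least likely.

    Each class is scored independently (multi-label), so the
    probabilities need not sum to 1.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    probs = [sigmoid(x) for x in logits]
    hits = [(lb, p) for lb, p in zip(labels, probs) if p > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

Formatting each pair with `f"{label}: {prob:.3f}"` reproduces the `Label: 0.xxx` layout of the README output.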