Thank you!
I have been researching for days to find a better (a much better) SER model than the standard one from SpeechBrain. Happy that I found this one! :-)
By the way: moving the model to CUDA isn't possible, I think?
Thank you for the positive feedback!
If you have a GPU with enough VRAM and CUDA/PyTorch installed, you should be able to run on the GPU with a simple `model = model.cuda()` after loading the model.
Also, be sure to move your data to the GPU as well, for example:
```python
import torch

# load model, then move it to the GPU
model = model.cuda()

# load data, then run inference with the inputs on the GPU as well
with torch.no_grad():
    wavs = wavs.cuda(non_blocking=True).float()
    mask = mask.cuda(non_blocking=True).float()
    pred = model(wavs, mask)
```
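If you need the predictions back on the CPU afterwards (e.g. to convert to numpy), you can move them back with `pred = pred.cpu()`.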
Thanks again, 3loi! In the meantime I solved it with

```python
mask = torch.ones(1, len(norm_wav)).to(device)
wavs = torch.tensor(norm_wav).unsqueeze(0).to(device)
```

which seems to work.
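For anyone landing here later, here is a minimal end-to-end sketch of how `norm_wav` could be produced for single-file inference. The file path and the mean/std normalization are my assumptions, so substitute whatever preprocessing the model card actually specifies; `model` is assumed to be loaded and moved to `device` as discussed above:

```python
import torch
import torchaudio

device = "cuda" if torch.cuda.is_available() else "cpu"

# load a single audio file (hypothetical path) and take the first channel
raw_wav, sr = torchaudio.load("example.wav")
raw_wav = raw_wav[0]

# mean/std normalization -- an assumption here; use the exact
# preprocessing the model card specifies
norm_wav = (raw_wav - raw_wav.mean()) / (raw_wav.std() + 1e-6)

# add a batch dimension, build an all-ones mask, and move both to the device
wavs = norm_wav.unsqueeze(0).to(device)
mask = torch.ones(1, len(norm_wav)).to(device)

with torch.no_grad():
    pred = model(wavs, mask)  # model assumed loaded per the model card
```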
Do you think I could use runpod.io to run this model? Curious to hear your thoughts.
I am unfamiliar with runpod.io, but the model should run on any cloud computing service, assuming it's set up correctly. So I don't see any reason why it shouldn't work.
It seems they offer GPUs with 24 GB up to 192 GB of VRAM, which is more than enough. I am able to run this model on an RTX 3090 with 24 GB of VRAM just fine, with single-file inference.
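If you do try a cloud instance, a quick sanity check (standard PyTorch calls, nothing specific to this model) that CUDA is visible and how much VRAM the rented GPU actually has:

```python
import torch

# verify the instance exposes a CUDA GPU and report its total VRAM
assert torch.cuda.is_available(), "No CUDA device visible"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
```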