speech?

#1
by MikuXvDev - opened

Hi! Sorry for the stupid question, but does this generate speech?

Owner

Hi! This is an audio adapter model trained for ASR / AST tasks. It's an ongoing project, though, so it will be taken down and re-uploaded soon.

Wow, very cool! We'll wait for instructions on how to hook up such a miracle XD

Hi @junnei, can you please provide some guidance or a Colab URL so we can finetune this model for other low-resource languages?

Hi Sir, @junnei !

I was looking into ASR models, and I want to train one specifically for English and Telugu (an Indian language). I would love to try training it the same way you did. If you could give me some guidance or point me to a helpful resource, I'd really appreciate it.

Hi @jsbeaudry @salmankhanpm .
Here is an updated finetuning Python script you can use: Link
Let me know if there are any issues!

Thank you, @junnei. I will let you know. Good job!

@junnei

Can you confirm whether junnei/gemma-3-4b-it-speech is a fresh base model with only the Phi-4-MM audio encoder attached to a Gemma 3 model, with no Korean ASR/AST finetuning done on top of this stack?
I plan to finetune this model on multiple multilingual ASR/AST datasets, so I want to make sure it has no Korean finetuning, as that would interfere with my multilingual finetuning.

Additionally, could you share whether you saw improvements with scale? How did your evals compare across the 1B, 4B, 12B, and 27B models?
