Prevent Cadence from defaulting to Hindi punctuation for Marathi transcripts

#12
by vishald07 - opened

We are using Cadence for punctuation restoration on Marathi transcripts generated by speech-to-text. Because the transcripts often contain broken or phonetically inconsistent words, Cadence applies Hindi-style punctuation rather than Marathi-style punctuation.

Is there a way to tell Cadence explicitly which language it is processing, or to configure it to prefer Marathi-specific punctuation rules? Any best practices for handling noisy ASR output with Cadence would also be appreciated.

Sample text:
हिं्मत नाही म्हणून ुम्ही सगळ्यांनी निर्वसनी झालं पाहिजे नाहीतर ुम्हाला कुणालाच मुलांना सांगण्याचा अधिकार पोहोचणारच नाही
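
To make "broken words" concrete, here is a minimal Python sketch of a pre-check that flags structurally malformed Devanagari tokens before they are sent for punctuation restoration. It is not part of Cadence and makes no assumption about Cadence's API. The helper names (`is_malformed`, `flag_malformed`) are hypothetical, and the two rules are only a rough heuristic: a token that starts with a combining mark (for example a dependent vowel sign with no base consonant), or a virama that does not follow a consonant.

```python
import unicodedata

VIRAMA = "\u094d"  # Devanagari sign virama (halant)
NUKTA = "\u093c"   # Devanagari sign nukta


def is_devanagari_consonant(ch):
    """True for Devanagari consonants, including the precomposed nukta forms."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095f"


def is_malformed(token):
    """Heuristic check for a structurally broken Devanagari token (hypothetical helper)."""
    if not token:
        return False
    # Rule 1: a token must not begin with a combining mark (matra, anusvara, virama, ...).
    if unicodedata.category(token[0]).startswith("M"):
        return True
    # Rule 2: every virama must directly follow a consonant (optionally with a nukta).
    for i, ch in enumerate(token):
        if ch == VIRAMA:
            prev = token[i - 1]
            if prev == NUKTA and i >= 2:
                prev = token[i - 2]
            if not is_devanagari_consonant(prev):
                return True
    return False


def flag_malformed(text):
    """Return the whitespace-separated tokens that fail the heuristic."""
    return [tok for tok in text.split() if is_malformed(tok)]


if __name__ == "__main__":
    # Two tokens: the first starts with a dependent vowel sign, the second is well formed.
    print(flag_malformed("\u0941\u092e\u094d\u0939\u0940 \u0928\u093e\u0939\u0940"))
```

A check like this only identifies which tokens are damaged; whether repairing or dropping them changes the punctuation Cadence produces is an open question here.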
