Language detection

#7
by hemant75 - opened

How can I detect the language dynamically from a feed? A feed can be in one of two languages, Hindi or English. Please guide.

You can use this code:

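import numpy as np
from faster_whisper import WhisperModel
from scipy.io import wavfile

# Load the model once; reloading it on every call is slow.
model = WhisperModel("large-v2", device="cpu", compute_type="int8")

def limit_languages(audio, allowed_languages=("en", "hi")):
    # Assumes a 16 kHz, 16-bit PCM WAV file; Whisper expects 16 kHz float32 audio.
    sampling_rate, audio_data = wavfile.read(audio)
    if audio_data.ndim > 1:
        audio_data = audio_data.mean(axis=1)  # downmix stereo to mono
    audio_data = audio_data.astype(np.float32) / 32768.0

    # all_language_probs is a list of (language_code, probability) pairs.
    _, _, all_language_probs = model.detect_language(audio_data)

    # Keep the most probable language among the allowed ones (None if no match).
    detected_language = None
    score = 0.0
    for language_code, language_prob in all_language_probs:
        if language_code in allowed_languages and language_prob > score:
            score = language_prob
            detected_language = language_code

    return detected_language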

https://github.com/SYSTRAN/faster-whisper/issues/1164#issuecomment-2495601955
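
The selection step in the function above can also be pulled out on its own, which makes it easy to check without loading Whisper. This is only a minimal sketch, assuming `all_language_probs` is a list of `(language_code, probability)` pairs as in the code above; the helper name `pick_allowed_language` and the sample probabilities are made up for illustration and are not part of faster-whisper:

```python
from typing import Iterable, Optional, Sequence, Tuple


def pick_allowed_language(
    all_language_probs: Iterable[Tuple[str, float]],
    allowed_languages: Sequence[str] = ("en", "hi"),
) -> Optional[str]:
    """Return the most probable language among the allowed ones, or None."""
    allowed = set(allowed_languages)
    # Keep only the allowed languages, as (probability, code) pairs.
    candidates = [(prob, code) for code, prob in all_language_probs if code in allowed]
    if not candidates:
        return None
    # max() on (probability, code) tuples picks the highest probability.
    return max(candidates)[1]


# Hypothetical probabilities for illustration, not real model output.
sample_probs = [("ur", 0.48), ("hi", 0.41), ("en", 0.07), ("pa", 0.04)]
print(pick_allowed_language(sample_probs))  # prints: hi
```

Here the top overall guess ("ur") is ignored because it is not in the allowed list, so the best allowed language ("hi") is returned. The result could then be passed as the `language` argument of `model.transcribe`, so that transcription stays within the two expected languages.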
