---
language:
- mr
- te
- ml
tags:
- audio-classification
- speech-recognition
- indian-languages
- tensorflow
license: apache-2.0
---

# Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)

This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.

## Model Description

### Model Architecture

- 1D Convolutional Neural Network (CNN) with the following key components:
  - 3 Convolutional blocks with increasing filters (64, 128, 256)
  - Batch Normalization and ReLU activation after each convolution
  - MaxPooling and Dropout for regularization
  - Dense layers with 256 units followed by a Softmax output layer
- Input: Audio features (MFCC + Delta features)
- Output: Language classification probabilities

### Training Data

The model was trained on:

- Total samples per language: 1000
  - Training: 700 samples
  - Validation: 150 samples
  - Test: 150 samples

### Features

- MFCC (Mel-frequency cepstral coefficients) with delta features
- Number of MFCC coefficients: 13
- Maximum padding length: 174
- Feature type: MFCC with delta and delta-delta features

### Training Hyperparameters

- Optimizer: AdamW
- Learning rate: 0.001
- Batch size: 64
- Early stopping with patience of 10
- Learning rate reduction on plateau
- Loss function: Categorical Cross-entropy

## Performance

The model achieves strong performance in distinguishing between Marathi, Telugu, and Malayalam speech samples.
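
As a rough check on the shapes implied by the Model Architecture and Features sections above, the following sketch traces one example through the three convolutional blocks. The MFCC count, padding length, and filter sizes come from this card. The convolution padding (`'same'`, stride 1) and the pooling size (2) are assumptions made for illustration and are not stated here, so the actual intermediate shapes of the released model may differ.

```python
# Shape bookkeeping for the described pipeline (pure Python, no TensorFlow needed).
# From the model card: 13 MFCCs plus delta and delta-delta, pad length 174,
# convolution filters (64, 128, 256).
# ASSUMPTIONS (not stated in the card): 'same'-padded convolutions with stride 1
# and max pooling with pool size 2 after each block.

N_MFCC = 13
MAX_PAD_LEN = 174
CONV_FILTERS = (64, 128, 256)
POOL_SIZE = 2  # assumed, hypothetical value


def input_shape(n_mfcc=N_MFCC, max_pad_len=MAX_PAD_LEN):
    """(time_steps, channels) for one example: MFCC, delta, and delta-delta stacked."""
    return (max_pad_len, 3 * n_mfcc)


def trace_conv_blocks(shape, filters=CONV_FILTERS, pool_size=POOL_SIZE):
    """Return the (time_steps, channels) shape after each convolutional block."""
    steps, _ = shape
    shapes = []
    for n_filters in filters:
        # A 'same' convolution keeps the time axis; pooling floor-divides it.
        steps = steps // pool_size
        shapes.append((steps, n_filters))
    return shapes


shape = input_shape()
block_shapes = trace_conv_blocks(shape)

print("input:", shape)
for i, s in enumerate(block_shapes, start=1):
    print(f"after block {i}:", s)
```

Under these assumptions, each example enters the network as 174 time steps with 39 channels (13 coefficients for each of MFCC, delta, and delta-delta), which matches a transposed feature matrix padded or truncated to 174 frames.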

### Intended Use

This model is designed for:

- Language identification in audio samples
- Speech processing applications focusing on Indian languages
- Research and development in multilingual speech systems

### Limitations

- Limited to three languages: Marathi, Telugu, Malayalam
- Fixed input length requirement
- May not perform well on very noisy audio
- Not suitable for real-time processing without proper preprocessing

## Usage

```python
import tensorflow as tf
import numpy as np
import joblib
import json
import librosa

# Load the model, scaler, and config
model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
scaler = joblib.load('audio_feature_scaler_mtm.pkl')
with open('config_mtm.json', 'r') as f:
    config = json.load(f)

def extract_features(audio_path, config):
    audio, sr = librosa.load(audio_path, sr=None)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
    delta_mfccs = librosa.feature.delta(mfccs)
    delta2_mfccs = librosa.feature.delta(mfccs, order=2)
    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)

    # Pad or truncate
    if features.shape[1] > config['max_pad_len']:
        features = features[:, :config['max_pad_len']]
    else:
        pad_width = config['max_pad_len'] - features.shape[1]
        features = np.pad(features, pad_width=((0, 0), (0, pad_width)))

    return features.T