# Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)
This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.
## Model Description
### Model Architecture
- 1D Convolutional Neural Network (CNN) with the following key components (sketched in code below):
  - 3 convolutional blocks with increasing filters (64, 128, 256)
  - Batch Normalization and ReLU activation after each convolution
  - MaxPooling and Dropout for regularization
  - Dense layer with 256 units followed by a Softmax output layer
- Input: Audio features (MFCC with delta and delta-delta features)
- Output: Language classification probabilities
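A minimal Keras sketch of an architecture matching the description above. Layer sizes follow the bullet list; kernel sizes, pooling parameters, and the dropout rate are illustrative assumptions, not the exact trained configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(174, 39), num_classes=3):
    model = models.Sequential()
    model.add(layers.Input(shape=input_shape))
    # Three convolutional blocks with increasing filter counts (64, 128, 256),
    # each with BatchNorm + ReLU, then MaxPooling and Dropout for regularization.
    for filters in (64, 128, 256):
        model.add(layers.Conv1D(filters, kernel_size=3, padding='same'))
        model.add(layers.BatchNormalization())
        model.add(layers.ReLU())
        model.add(layers.MaxPooling1D(pool_size=2))
        model.add(layers.Dropout(0.3))  # dropout rate is an assumption
    # Dense head with a softmax over the three languages
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dense(num_classes, activation='softmax'))
    return model
```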
### Training Data
The model was trained on 1000 samples per language, split per language as:
- Training: 700 samples
- Validation: 150 samples
- Test: 150 samples
### Features
- MFCC (Mel-frequency cepstral coefficients) with delta and delta-delta features
- Number of MFCC coefficients: 13 (39 coefficients per frame after stacking deltas)
- Maximum padding length: 174 frames
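With 13 MFCCs stacked with their delta and delta-delta features, each frame carries 13 × 3 = 39 coefficients, and padding or truncating to 174 frames fixes the model input shape. A quick shape check on synthetic audio (librosa defaults assumed):

```python
import numpy as np
import librosa

# One second of silence at 22.05 kHz, just to inspect shapes.
audio = np.zeros(22050, dtype=np.float32)
mfccs = librosa.feature.mfcc(y=audio, sr=22050, n_mfcc=13)  # (13, T)
stacked = np.concatenate(
    (mfccs, librosa.feature.delta(mfccs), librosa.feature.delta(mfccs, order=2)),
    axis=0,
)
print(stacked.shape)  # (39, T); T is then padded/truncated to 174
```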
### Training Hyperparameters
- Optimizer: AdamW
- Learning rate: 0.001
- Batch size: 64
- Early stopping with a patience of 10 epochs
- Learning rate reduction on plateau
- Loss function: Categorical Cross-entropy
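A sketch of a training setup matching the hyperparameters above. The `ReduceLROnPlateau` factor and patience, the monitored metric, `restore_best_weights`, and the dummy data are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

# Dummy stand-ins for the real (scaled) feature arrays and one-hot labels.
X_train = np.random.rand(64, 174, 39).astype('float32')
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 3, 64), 3)

model = build_model()  # from the architecture sketch above
model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                                         patience=3),  # factor/patience assumed
]
model.fit(X_train, y_train, validation_split=0.2,
          batch_size=64, epochs=100, callbacks=callbacks)
```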
## Performance
The model achieves strong accuracy in distinguishing between Marathi, Telugu, and Malayalam speech samples on the held-out test set (150 samples per language).
## Intended Use
This model is designed for:
- Language identification in audio samples
- Speech processing applications focusing on Indian languages
- Research and development in multilingual speech systems
## Limitations
- Limited to three languages: Marathi, Telugu, Malayalam
- Fixed input length requirement (features are padded or truncated to 174 frames)
- May not perform well on very noisy audio
- Not suitable for real-time processing without proper preprocessing
## Usage
```python
import tensorflow as tf
import numpy as np
import joblib
import json
import librosa

# Load the model, scaler, and config
model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
scaler = joblib.load('audio_feature_scaler_mtm.pkl')
with open('config_mtm.json', 'r') as f:
    config = json.load(f)

def extract_features(audio_path, config):
    # Load audio at its native sampling rate
    audio, sr = librosa.load(audio_path, sr=None)
    # MFCCs plus first- and second-order deltas -> (3 * n_mfcc, T)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
    delta_mfccs = librosa.feature.delta(mfccs)
    delta2_mfccs = librosa.feature.delta(mfccs, order=2)
    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)
    # Pad or truncate along the time axis to a fixed length
    if features.shape[1] > config['max_pad_len']:
        features = features[:, :config['max_pad_len']]
    else:
        pad_width = config['max_pad_len'] - features.shape[1]
        features = np.pad(features, pad_width=((0, 0), (0, pad_width)))
    # Transpose to (max_pad_len, 3 * n_mfcc) for the 1D CNN
    return features.T
```
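A minimal end-to-end prediction sketch built on the pieces above. The scaler reshaping (fit on flattened per-sample vectors is assumed), the class order in `LANGUAGES`, and the input path `sample.wav` are illustrative assumptions; check `config_mtm.json` and the training label encoding for the actual layout and index-to-language mapping.

```python
# Assumed, illustrative class order; verify against the training label encoding.
LANGUAGES = ['Malayalam', 'Marathi', 'Telugu']

features = extract_features('sample.wav', config)  # (174, 39)
# Assumption: the scaler was fit on flattened per-sample feature vectors.
scaled = scaler.transform(features.reshape(1, -1)).reshape(1, *features.shape)
probs = model.predict(scaled)[0]
print(LANGUAGES[int(np.argmax(probs))], probs)
```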