Language Classifier - Indian Languages (Marathi, Telugu, Malayalam)

This model classifies audio samples into three Indian languages: Marathi, Telugu, and Malayalam.

Model Description

Model Architecture

  • 1D Convolutional Neural Network (CNN) with the following key components:
    • 3 Convolutional blocks with increasing filters (64, 128, 256)
    • Batch Normalization and ReLU activation after each convolution
    • MaxPooling and Dropout for regularization
    • Dense layers with 256 units followed by a Softmax output layer
  • Input: MFCC features with delta and delta-delta coefficients (13 × 3 = 39 features per frame, padded or truncated to 174 frames)
  • Output: A probability for each of the three languages
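The components listed above can be sketched in Keras roughly as follows. This is an illustrative reconstruction, not the released model definition: the kernel size, pool size, dropout rates, the global pooling step, and the use of a single 256-unit dense layer are all assumptions, since the card does not state them. The input shape (174, 39) follows from 13 MFCCs plus delta and delta-delta features over 174 frames.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(input_shape=(174, 39), num_classes=3):
    """Sketch of a 3-block 1D CNN; hyperparameters marked 'assumed' are guesses."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for filters in (64, 128, 256):  # filter counts come from the card
        model.add(layers.Conv1D(filters, kernel_size=3, padding="same"))  # kernel size assumed
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling1D(pool_size=2))  # pool size assumed
        model.add(layers.Dropout(0.3))  # dropout rate assumed
    model.add(layers.GlobalAveragePooling1D())  # assumed; the original may flatten instead
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))  # dropout rate assumed
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model


model = build_model()
```

Calling model.summary() on the result shows how each pooling step halves the time axis while the filter count grows.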

Training Data

The dataset contains 1,000 samples per language, split as follows:

  • Training: 700 samples
  • Validation: 150 samples
  • Test: 150 samples
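A 700/150/150 split per language could be produced along these lines. This is a minimal sketch, assuming the samples of each language are indexed 0 to 999 and shuffled before splitting; the helper name split_indices and the seeds are hypothetical, and the card does not say how the original split was drawn.

```python
import numpy as np


def split_indices(n_samples=1000, n_train=700, n_val=150, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # the remaining 150 samples
    return train, val, test


# One independent split per language keeps the three classes balanced.
splits = {
    lang: split_indices(seed=i)
    for i, lang in enumerate(["Marathi", "Telugu", "Malayalam"])
}
```

Splitting each language separately keeps every subset balanced across the three classes.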

Features

  • Feature type: MFCC (Mel-frequency cepstral coefficients) with delta and delta-delta features
  • Number of MFCC coefficients: 13 (39 features per frame once the deltas are stacked)
  • Maximum padding length: 174 frames

Training Hyperparameters

  • Optimizer: AdamW
  • Learning rate: 0.001
  • Batch size: 64
  • Early stopping with patience of 10
  • Learning rate reduction on plateau
  • Loss function: Categorical Cross-entropy
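The settings above map onto Keras roughly as shown below. This is a sketch under stated assumptions: the monitored metric, the learning-rate reduction factor and patience, and restore_best_weights are not given in the card and are chosen here only for illustration. tf.keras.optimizers.AdamW requires a reasonably recent TensorFlow release.

```python
import tensorflow as tf

# Values stated in the card: AdamW, learning rate 0.001, early-stopping patience 10.
optimizer = tf.keras.optimizers.AdamW(learning_rate=0.001)

callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",  # assumed metric
        patience=10,
        restore_best_weights=True,  # assumed
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",  # assumed metric
        factor=0.5,  # assumed reduction factor
        patience=5,  # assumed
    ),
]

# With a compiled model and one-hot labels, training would then look like:
# model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           batch_size=64, epochs=100, callbacks=callbacks)
```

Categorical cross-entropy expects one-hot encoded labels, so integer class labels would need converting first, for example with tf.keras.utils.to_categorical.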

Performance

The model is reported to distinguish Marathi, Telugu, and Malayalam speech samples well, but this card lists no accuracy or per-language metrics for the held-out test set.

Intended Use

This model is designed for:

  • Language identification in audio samples
  • Speech processing applications focusing on Indian languages
  • Research and development in multilingual speech systems

Limitations

  • Limited to three languages: Marathi, Telugu, Malayalam
  • Fixed input length: features are padded or truncated to 174 frames
  • May not perform well on very noisy audio
  • Not suitable for real-time processing without proper preprocessing
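To get a feel for the fixed input length, the frame count can be converted to an approximate audio duration. The sketch below assumes librosa's default hop length of 512 samples, which the usage code does not override. Because that code loads audio with sr=None, the covered duration depends on each file's native sampling rate. The helper name frames_to_seconds is hypothetical.

```python
def frames_to_seconds(n_frames=174, sr=22050, hop_length=512):
    """Approximate duration covered by n_frames feature frames."""
    return n_frames * hop_length / sr


# Roughly 4.0 s at 22.05 kHz and roughly 5.6 s at 16 kHz.
duration_22k = frames_to_seconds(sr=22050)
duration_16k = frames_to_seconds(sr=16000)
```

Under these assumptions, audio beyond this window is discarded and shorter clips are zero-padded, so resampling all inputs to one rate may make the window consistent across files.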

Usage

import tensorflow as tf
import numpy as np
import joblib
import json
import librosa

# Load the model, scaler, and config
model = tf.keras.models.load_model('indic_language_classifier_mtm.keras')
scaler = joblib.load('audio_feature_scaler_mtm.pkl')
with open('config_mtm.json', 'r') as f:
    config = json.load(f)

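def extract_features(audio_path, config):
    """Return a (max_pad_len, 3 * n_mfcc) array of MFCC, delta, and delta-delta features."""
    # sr=None keeps the file's native sampling rate
    audio, sr = librosa.load(audio_path, sr=None)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=config['n_mfcc'])
    delta_mfccs = librosa.feature.delta(mfccs)            # first-order deltas
    delta2_mfccs = librosa.feature.delta(mfccs, order=2)  # second-order (delta-delta)
    # Stack along the coefficient axis: shape (3 * n_mfcc, frames)
    features = np.concatenate((mfccs, delta_mfccs, delta2_mfccs), axis=0)

    # Zero-pad or truncate along the time axis to exactly max_pad_len frames
    if features.shape[1] > config['max_pad_len']:
        features = features[:, :config['max_pad_len']]
    else:
        pad_width = config['max_pad_len'] - features.shape[1]
        features = np.pad(features, pad_width=((0, 0), (0, pad_width)), mode='constant')

    # Transpose to (frames, features)
    return features.T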
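
The snippet above stops at feature extraction. A prediction step might look like the sketch below, which rests on two unverified assumptions: that the saved scaler was fitted on per-frame feature vectors (39 columns), and that the output classes are ordered Marathi, Telugu, Malayalam. Neither is confirmed by this card, so check both against the training code or config before relying on the result. The helper predict_language and the LABELS tuple are hypothetical names.

```python
import numpy as np

LABELS = ("Marathi", "Telugu", "Malayalam")  # assumed class order


def predict_language(features, model, scaler, labels=LABELS):
    """Scale one (frames, n_features) array and return (label, probabilities)."""
    scaled = scaler.transform(features)  # assumes the scaler was fitted per frame
    batch = scaled[np.newaxis, ...]      # add a batch axis: (1, frames, n_features)
    probs = np.asarray(model.predict(batch))[0]
    return labels[int(np.argmax(probs))], probs
```

With the objects loaded earlier, a call would be predict_language(extract_features('sample.wav', config), model, scaler), where 'sample.wav' is a placeholder path.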