
Seq2Seq LSTM with Multi-Head Attention for English → Hindi Translation

Model Overview

This model performs English to Hindi translation using a Seq2Seq architecture with LSTM-based encoder-decoder and multi-head cross-attention. The attention mechanism helps the decoder focus on relevant parts of the input sentence during translation.

  • Architecture: BiLSTM Encoder + LSTM Decoder + Multi-Head Cross-Attention (a minimal sketch follows this list)
  • Task: Language Translation (English → Hindi)
  • License: Open for research and demonstration purposes (educational use)
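
A minimal Keras sketch of this architecture is shown below. The vocabulary sizes, embedding dimension, LSTM units, and number of attention heads are illustrative placeholders, not the released models' actual values.

import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative hyperparameters (placeholders only)
SRC_VOCAB, TGT_VOCAB = 50_000, 50_000
EMB_DIM, UNITS, NUM_HEADS = 256, 512, 8

# Encoder: embedding -> BiLSTM returning per-token outputs and final states
enc_inputs = layers.Input(shape=(None,), name="encoder_inputs")
enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_inputs)
enc_outs, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: embedding -> LSTM initialised with the concatenated encoder states
dec_inputs = layers.Input(shape=(None,), name="decoder_inputs")
dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_inputs)
dec_outs, _, _ = layers.LSTM(2 * UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Multi-head cross-attention: decoder states attend over encoder outputs
attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=UNITS)(
    query=dec_outs, value=enc_outs, key=enc_outs)
logits = layers.Dense(TGT_VOCAB, activation="softmax")(
    layers.Concatenate()([dec_outs, attn]))

model = Model([enc_inputs, dec_inputs], logits)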

Model Versions

| Model   | Parameters | Vocabulary | Training Data                     | Repository                           |
|---------|------------|------------|-----------------------------------|--------------------------------------|
| Model A | 12M        | 50k        | 20k English-Hindi sentence pairs  | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M        | 256k       | 100k English-Hindi sentence pairs | seq2seq-lstm-multiheadattention-42   |
  • Model A is smaller and performs well on the dataset it was trained on.
  • Model B has higher capacity but needs more data for robust generalization.

Intended Use

  • Demonstration and educational purposes
  • Understanding Seq2Seq + Attention mechanisms
  • Translating English sentences to Hindi
  • Feature extraction: Encoder outputs can be used for downstream NLP tasks by generating contextual embedding vectors that capture sentence-level semantics
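
As a rough illustration of the feature-extraction use case, the sketch below mean-pools the encoder's per-token outputs into a single sentence vector. It assumes an encoder_model inference sub-model and a fitted English tokenizer (both introduced in the Example Usage section below); mean pooling is an illustrative choice, not something prescribed by this model card.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def sentence_embedding(sentence, encoder_model, tokenizer_en, max_seq_len=40):
    """Mean-pool encoder outputs into a fixed-size sentence vector (illustrative)."""
    seq = pad_sequences(tokenizer_en.texts_to_sequences([sentence]),
                        maxlen=max_seq_len, padding='post')
    enc_outs, h, c = encoder_model.predict(seq, verbose=0)  # enc_outs: (1, T, d)
    mask = (seq != 0)[..., None]                            # ignore padded positions
    valid = max(int(mask.sum()), 1)
    return (enc_outs * mask).sum(axis=1)[0] / valid         # shape: (d,)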

Not Intended For

  • High-stakes or production translation systems without further fine-tuning
  • Handling very large or domain-specific datasets without retraining

Metrics

  • Evaluated qualitatively on selected test sentences
  • Model A: good accuracy for small, simple sentences
  • Model B: may require larger datasets for generalization

BLEU or other quantitative metrics can be added if evaluation is performed.
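
For example, corpus-level BLEU on a held-out test set could be computed along the following lines. This sketch assumes the sacrebleu package and uses hypothetical prediction/reference lists.

import sacrebleu

# Hypothetical model outputs and reference translations, aligned one-to-one
hypotheses = ["मैं स्कूल जा रहा हूँ"]
references = [["मैं विद्यालय जा रहा हूँ"]]  # one reference stream, one entry per hypothesis

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"Corpus BLEU: {bleu.score:.2f}")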


Training Data

  • Source: Collected English-Hindi parallel sentences
  • Size:
    • Model A: 20k sentence pairs
    • Model B: 100k sentence pairs
  • Preprocessing: Tokenization, padding, <start> / <end> tokens (see the sketch after this list)
  • Dataset: The training dataset is available in this model card's repository for further fine-tuning
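
A minimal sketch of this preprocessing, assuming Keras Tokenizer objects like the ones shipped in this repository; the toy sentences and filter settings are illustrative.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy parallel data; target sentences are wrapped with <start> / <end>
en_sentences = ["hello how are you"]
hi_sentences = ["<start> नमस्ते आप कैसे हैं <end>"]

# English tokenizer with an OOV token; the Hindi tokenizer keeps < and > so that
# the <start> / <end> markers survive tokenization
tokenizer_en = Tokenizer(oov_token="<OOV>")
default_filters = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
tokenizer_hi = Tokenizer(filters=default_filters.replace('<', '').replace('>', ''))
tokenizer_en.fit_on_texts(en_sentences)
tokenizer_hi.fit_on_texts(hi_sentences)

# Convert to padded integer sequences of fixed length
enc_in = pad_sequences(tokenizer_en.texts_to_sequences(en_sentences), maxlen=40, padding='post')
dec_in = pad_sequences(tokenizer_hi.texts_to_sequences(hi_sentences), maxlen=40, padding='post')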

Limitations

  • Larger model may underperform if trained on small datasets
  • Handles only sentence-level translation; not optimized for paragraphs
  • May produce incorrect translations for rare words or out-of-vocabulary terms
  • The larger model (Model B) was trained for only one epoch, so do not use it without fine-tuning on your own dataset
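
A rough fine-tuning sketch follows (shapes only). It assumes the loaded training model from the Example Usage section below, which takes [encoder inputs, decoder inputs] and predicts a softmax over the target vocabulary at each position; the hyperparameters and placeholder arrays are illustrative.

import numpy as np
from tensorflow.keras.optimizers import Adam

# `model` is the full training model loaded as in the Example Usage section below.
# Placeholder arrays show the expected shapes; replace them with real tokenized data
# prepared as in the preprocessing sketch (decoder inputs begin with <start>,
# decoder targets are the same sequences shifted left by one token).
enc_in = np.zeros((1000, 40), dtype="int32")
dec_in = np.zeros((1000, 40), dtype="int32")
dec_tgt = np.zeros((1000, 40), dtype="int32")

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy")
model.fit([enc_in, dec_in], dec_tgt, batch_size=64, epochs=3, validation_split=0.1)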

Example Usage

from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
import pickle

# Load model
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-12.3.keras",
)
model = load_model(model_path)

# Load tokenizers
tokenizer_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-tokenizers-12.3M.pkl",
)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)
tokenizer_en = tokenizer['english']
tokenizer_hi = tokenizer['hindi']

Step-by-Step Prediction Example

For the full encoder-decoder inference setup (including how encoder_model and decoder_model below are built), visit Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess_input(sentence, word2idx_en, max_seq_len, oov_token="<OOV>"):
    oov_idx = word2idx_en[oov_token]
    seq = [word2idx_en.get(w.lower(), oov_idx) for w in sentence.split()]
    return pad_sequences([seq], maxlen=max_seq_len, padding='post')

def decode_sequence(input_seq, encoder_model, decoder_model, word2idx_hi, idx2word_hi, max_seq_len):
    start_token = word2idx_hi['<start>']
    end_token = word2idx_hi['<end>']

    enc_outs, h, c = encoder_model.predict(input_seq, verbose=0)
    target_seq = np.array([[start_token]])
    decoded_sentence = []

    for _ in range(max_seq_len):
        output_tokens, h, c = decoder_model.predict([target_seq, h, c, enc_outs], verbose=0)
        sampled_idx = np.argmax(output_tokens[0,0,:])
        if sampled_idx == end_token:
            break
        if sampled_idx > 0:
            decoded_sentence.append(idx2word_hi[sampled_idx])
        target_seq[0,0] = sampled_idx

    return " ".join(decoded_sentence)

# Example usage
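# NOTE: encoder_model and decoder_model are the separate inference models used by
# decode_sequence; see the Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation
# link above for how they are constructed.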
sentence = "Hello, how are you?"
input_seq = preprocess_input(sentence, tokenizer_en.word_index, max_seq_len=40)
translation = decode_sequence(input_seq, encoder_model, decoder_model, tokenizer_hi.word_index, tokenizer_hi.index_word, max_seq_len=40)
print("Predicted Hindi Translation:", translation)