
Seq2Seq LSTM with Multi-Head Attention for English → Hindi Translation

Model Overview

This model performs English to Hindi translation using a Seq2Seq architecture with LSTM-based encoder-decoder and multi-head cross-attention. The attention mechanism helps the decoder focus on relevant parts of the input sentence during translation.

  • Architecture: BiLSTM Encoder + LSTM Decoder + Multi-Head Cross-Attention (a minimal sketch follows this list)
  • Task: Language Translation (English → Hindi)
  • License: Open for research and demonstration purposes (educational use)
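
A minimal Keras sketch of this architecture is shown below. The vocabulary sizes, embedding dimension, LSTM units, and number of attention heads are illustrative placeholders, not the released models' actual values.

import tensorflow as tf
from tensorflow.keras import layers, Model

# Illustrative hyperparameters (placeholders only)
SRC_VOCAB, TGT_VOCAB = 50_000, 50_000
EMB_DIM, UNITS, NUM_HEADS = 256, 512, 8

# Encoder: embedding -> BiLSTM returning per-token outputs and final states
enc_inputs = layers.Input(shape=(None,), name="encoder_inputs")
enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM)(enc_inputs)
enc_outs, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True))(enc_emb)
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: embedding -> LSTM initialised with the concatenated encoder states
dec_inputs = layers.Input(shape=(None,), name="decoder_inputs")
dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM)(dec_inputs)
dec_outs, _, _ = layers.LSTM(2 * UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])

# Multi-head cross-attention: decoder states attend over encoder outputs
attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=UNITS)(
    query=dec_outs, value=enc_outs, key=enc_outs)
logits = layers.Dense(TGT_VOCAB, activation="softmax")(
    layers.Concatenate()([dec_outs, attn]))

model = Model([enc_inputs, dec_inputs], logits)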

Model Versions

| Model   | Parameters | Vocabulary | Training Data                     | Repository                           |
|---------|------------|------------|-----------------------------------|--------------------------------------|
| Model A | 12M        | 50k        | 20k English-Hindi sentence pairs  | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M        | 256k       | 100k English-Hindi sentence pairs | seq2seq-lstm-multiheadattention-42   |
  • Model A is smaller and performs well on the dataset it was trained on.
  • Model B has higher capacity but needs more data for robust generalization.

Intended Use

  • Demonstration and educational purposes
  • Understanding Seq2Seq + Attention mechanisms
  • Translating English sentences to Hindi
  • Feature extraction: Encoder outputs can be used for downstream NLP tasks by generating contextual embedding vectors that capture sentence-level semantics
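
As a rough illustration of the feature-extraction use case, the sketch below mean-pools the encoder's per-token outputs into a single sentence vector. It assumes an encoder_model inference sub-model and a fitted English tokenizer (both introduced in the Example Usage section below); mean pooling is an illustrative choice, not something prescribed by this model card.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def sentence_embedding(sentence, encoder_model, tokenizer_en, max_seq_len=40):
    """Mean-pool encoder outputs into a fixed-size sentence vector (illustrative)."""
    seq = pad_sequences(tokenizer_en.texts_to_sequences([sentence]),
                        maxlen=max_seq_len, padding='post')
    enc_outs, h, c = encoder_model.predict(seq, verbose=0)  # enc_outs: (1, T, d)
    mask = (seq != 0)[..., None]                            # ignore padded positions
    valid = max(int(mask.sum()), 1)
    return (enc_outs * mask).sum(axis=1)[0] / valid         # shape: (d,)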

Not Intended For

  • High-stakes or production translation systems without further fine-tuning
  • Handling very large or domain-specific datasets without retraining

Metrics

  • Evaluated qualitatively on selected test sentences
  • Model A: good accuracy for small, simple sentences
  • Model B: may require larger datasets for generalization

BLEU or other quantitative metrics can be added if evaluation is performed.
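
For example, corpus-level BLEU on a held-out test set could be computed along the following lines. This sketch assumes the sacrebleu package and uses hypothetical prediction/reference lists.

import sacrebleu

# Hypothetical model outputs and reference translations, aligned one-to-one
hypotheses = ["मैं स्कूल जा रहा हूँ"]
references = [["मैं विद्यालय जा रहा हूँ"]]  # one reference stream, one entry per hypothesis

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"Corpus BLEU: {bleu.score:.2f}")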


Training Data

  • Source: Collected English-Hindi parallel sentences
  • Size:
    • Model A: 20k sentence pairs
    • Model B: 100k sentence pairs
  • Preprocessing: Tokenization, padding, <start> / <end> tokens (see the sketch after this list)
  • Dataset: The training dataset is available in this model card's repository for further fine-tuning
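
A minimal sketch of this preprocessing, assuming Keras Tokenizer objects like the ones shipped in this repository; the toy sentences and filter settings are illustrative.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy parallel data; target sentences are wrapped with <start> / <end>
en_sentences = ["hello how are you"]
hi_sentences = ["<start> नमस्ते आप कैसे हैं <end>"]

# English tokenizer with an OOV token; the Hindi tokenizer keeps < and > so that
# the <start> / <end> markers survive tokenization
tokenizer_en = Tokenizer(oov_token="<OOV>")
default_filters = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
tokenizer_hi = Tokenizer(filters=default_filters.replace('<', '').replace('>', ''))
tokenizer_en.fit_on_texts(en_sentences)
tokenizer_hi.fit_on_texts(hi_sentences)

# Convert to padded integer sequences of fixed length
enc_in = pad_sequences(tokenizer_en.texts_to_sequences(en_sentences), maxlen=40, padding='post')
dec_in = pad_sequences(tokenizer_hi.texts_to_sequences(hi_sentences), maxlen=40, padding='post')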

Limitations

  • Larger model may underperform if trained on small datasets
  • Handles only sentence-level translation; not optimized for paragraphs
  • May produce incorrect translations for rare words or out-of-vocabulary terms
  • The larger model (Model B) was trained for only one epoch, so do not use it without fine-tuning on your own dataset
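
A rough fine-tuning sketch follows (shapes only). It assumes the loaded training model from the Example Usage section below, which takes [encoder inputs, decoder inputs] and predicts a softmax over the target vocabulary at each position; the hyperparameters and placeholder arrays are illustrative.

import numpy as np
from tensorflow.keras.optimizers import Adam

# `model` is the full training model loaded as in the Example Usage section below.
# Placeholder arrays show the expected shapes; replace them with real tokenized data
# prepared as in the preprocessing sketch (decoder inputs begin with <start>,
# decoder targets are the same sequences shifted left by one token).
enc_in = np.zeros((1000, 40), dtype="int32")
dec_in = np.zeros((1000, 40), dtype="int32")
dec_tgt = np.zeros((1000, 40), dtype="int32")

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy")
model.fit([enc_in, dec_in], dec_tgt, batch_size=64, epochs=3, validation_split=0.1)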

Example Usage

from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
import pickle

# Load model
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-12.3.keras",
)
model = load_model(model_path)

# Load tokenizers
tokenizer_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-tokenizers-12.3M.pkl",
)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)
tokenizer_en = tokenizer['english']
tokenizer_hi = tokenizer['hindi']

Step-by-Step Prediction Example

For the full encoder-decoder inference setup (including how encoder_model and decoder_model below are built), visit Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess_input(sentence, word2idx_en, max_seq_len, oov_token="<OOV>"):
    oov_idx = word2idx_en[oov_token]
    seq = [word2idx_en.get(w.lower(), oov_idx) for w in sentence.split()]
    return pad_sequences([seq], maxlen=max_seq_len, padding='post')

def decode_sequence(input_seq, encoder_model, decoder_model, word2idx_hi, idx2word_hi, max_seq_len):
    start_token = word2idx_hi['<start>']
    end_token = word2idx_hi['<end>']

    enc_outs, h, c = encoder_model.predict(input_seq, verbose=0)
    target_seq = np.array([[start_token]])
    decoded_sentence = []

    for _ in range(max_seq_len):
        output_tokens, h, c = decoder_model.predict([target_seq, h, c, enc_outs], verbose=0)
        sampled_idx = np.argmax(output_tokens[0,0,:])
        if sampled_idx == end_token:
            break
        if sampled_idx > 0:
            decoded_sentence.append(idx2word_hi[sampled_idx])
        target_seq[0,0] = sampled_idx

    return " ".join(decoded_sentence)

# Example usage
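# NOTE: encoder_model and decoder_model are the separate inference models used by
# decode_sequence; see the Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation
# link above for how they are constructed.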
sentence = "Hello, how are you?"
input_seq = preprocess_input(sentence, tokenizer_en.word_index, max_seq_len=40)
translation = decode_sequence(input_seq, encoder_model, decoder_model, tokenizer_hi.word_index, tokenizer_hi.index_word, max_seq_len=40)
print("Predicted Hindi Translation:", translation)