# Seq2Seq LSTM with Multi-Head Attention for English → Hindi Translation
## Model Overview
This model performs English-to-Hindi translation using a Seq2Seq architecture with an LSTM-based encoder-decoder and multi-head cross-attention. The attention mechanism helps the decoder focus on relevant parts of the input sentence during translation.
- Architecture: BiLSTM Encoder + LSTM Decoder + Multi-Head Cross-Attention (a minimal sketch follows this list)
- Task: Language Translation (English → Hindi)
- License: Open for research, demonstration, and educational purposes
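The exact layer configuration lives in the saved models in this repository; the following is only a minimal sketch of the architecture described above, assuming placeholder vocabulary sizes, embedding width, LSTM units, and head count (none of these are the trained models' actual hyperparameters).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Placeholder hyperparameters -- illustrative only, not the trained models' values.
SRC_VOCAB, TGT_VOCAB = 50_000, 50_000
EMB_DIM, UNITS, NUM_HEADS = 256, 512, 4

# Encoder: embedding -> BiLSTM, keeping per-token outputs and final states.
enc_inputs = layers.Input(shape=(None,), name="encoder_inputs")
enc_emb = layers.Embedding(SRC_VOCAB, EMB_DIM, mask_zero=True)(enc_inputs)
enc_outs, fh, fc, bh, bc = layers.Bidirectional(
    layers.LSTM(UNITS, return_sequences=True, return_state=True)
)(enc_emb)
state_h = layers.Concatenate()([fh, bh])
state_c = layers.Concatenate()([fc, bc])

# Decoder: embedding -> LSTM initialized with the concatenated encoder states.
dec_inputs = layers.Input(shape=(None,), name="decoder_inputs")
dec_emb = layers.Embedding(TGT_VOCAB, EMB_DIM, mask_zero=True)(dec_inputs)
dec_outs, _, _ = layers.LSTM(2 * UNITS, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c]
)

# Multi-head cross-attention: decoder states attend over the encoder outputs.
attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=UNITS)(
    query=dec_outs, value=enc_outs, key=enc_outs
)
concat = layers.Concatenate()([dec_outs, attn])
outputs = layers.Dense(TGT_VOCAB, activation="softmax")(concat)

sketch_model = Model([enc_inputs, dec_inputs], outputs, name="seq2seq_mha_sketch")
```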
## Model Versions
| Model | Parameters | Vocabulary | Training Data | Repository |
|---|---|---|---|---|
| Model A | 12M | 50k | 20k English-Hindi sentence pairs | seq2seq-lstm-multiheadattention-12.3 |
| Model B | 42M | 256k | 100k English-Hindi sentence pairs | seq2seq-lstm-multiheadattention-42 |
- Model A is smaller and performs well on the dataset it was trained on.
- Model B has higher capacity but needs more data for robust generalization.
## Intended Use
- Demonstration and educational purposes
- Understanding Seq2Seq + Attention mechanisms
- Translating English sentences to Hindi
- Feature extraction: encoder outputs can be used as contextual embeddings that capture sentence-level semantics for downstream NLP tasks (see the sketch below)
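A minimal sketch of that feature-extraction use. The layer name `"bidirectional"` and the assumption that the first model input is the encoder input are placeholders; verify both against `model.summary()` for the checkpoint you load.

```python
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model

# Load the trained translation model (same file as in the Example Usage section below).
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-12.3.keras",
)
model = load_model(model_path)

# Assumptions to verify with model.summary(): the first input is the encoder
# input and the BiLSTM layer is named "bidirectional".
enc_input = model.inputs[0]
bilstm_out = model.get_layer("bidirectional").output
enc_seq = bilstm_out[0] if isinstance(bilstm_out, (list, tuple)) else bilstm_out
encoder_extractor = tf.keras.Model(enc_input, enc_seq)

def sentence_embedding(padded_seq):
    """Mean-pool per-token encoder outputs into one fixed-size sentence vector."""
    states = encoder_extractor.predict(padded_seq, verbose=0)  # (batch, T, dim)
    return states.mean(axis=1)                                 # (batch, dim)
```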
## Not Intended For
- High-stakes or production translation systems without further fine-tuning
- Handling very large or domain-specific datasets without retraining
## Metrics
- Evaluated qualitatively on selected test sentences
- Model A: good accuracy on short, simple sentences
- Model B: may require larger datasets for generalization
BLEU or other quantitative metrics can be added once a systematic evaluation is performed; a sketch of how corpus BLEU could be computed is shown below.
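A minimal sketch of such an evaluation using sacreBLEU, assuming you supply your own held-out English/Hindi pairs and model outputs. The sources, references, and hypotheses below are hardcoded placeholders so the snippet runs on its own; they are not actual model outputs or reported results.

```python
import sacrebleu  # pip install sacrebleu

# Placeholder held-out test data: English sources and Hindi references.
test_sources = ["How are you?", "I am going home."]
references = [["आप कैसे हैं?", "मैं घर जा रहा हूँ।"]]  # one reference stream

# In practice, produce hypotheses with the prediction pipeline shown below;
# they are hardcoded here only so the sketch is self-contained.
hypotheses = ["आप कैसे हैं?", "मैं घर जा रही हूँ।"]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"Corpus BLEU: {bleu.score:.2f}")
```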
## Training Data
- Source: Collected English-Hindi parallel sentences
- Size:
  - Model A: 20k sentence pairs
  - Model B: 100k sentence pairs
- Preprocessing: tokenization, padding, and `<start>`/`<end>` tokens (a preprocessing sketch follows this list)
- Dataset: the training dataset is available in this model card's repository for further fine-tuning
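A minimal sketch of that preprocessing with Keras utilities, assuming a maximum sequence length of 40 and `<start>`/`<end>` wrapping of the Hindi side; both are illustrative assumptions, not the exact training script.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 40  # illustrative; match the length expected by the checkpoint you load

# Example parallel pair; the target side is wrapped with <start>/<end> tokens.
en_sentences = ["how are you"]
hi_sentences = ["<start> आप कैसे हैं <end>"]

# Keep < and > out of the filter list so the special tokens survive tokenization.
FILTERS = '!"#$%&()*+,-./:;=?@[\\]^_`{|}~\t\n'
tok_en = Tokenizer(oov_token="<OOV>", filters=FILTERS)
tok_hi = Tokenizer(oov_token="<OOV>", filters=FILTERS)
tok_en.fit_on_texts(en_sentences)
tok_hi.fit_on_texts(hi_sentences)

enc_in = pad_sequences(tok_en.texts_to_sequences(en_sentences),
                       maxlen=MAX_LEN, padding="post")
dec_full = pad_sequences(tok_hi.texts_to_sequences(hi_sentences),
                         maxlen=MAX_LEN + 1, padding="post")
dec_in, dec_out = dec_full[:, :-1], dec_full[:, 1:]  # teacher forcing: shift by one
```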
## Limitations
- Larger model may underperform if trained on small datasets
- Handles only sentence-level translation; not optimized for paragraphs
- May produce incorrect translations for rare words or out-of-vocabulary terms
- The larger model (Model B) was trained for only one epoch, so do not use it without fine-tuning on your own dataset (a fine-tuning sketch follows this list)
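A minimal continuation-training sketch under these assumptions: the Model B checkpoint is stored as `seq2seq-lstm-multiheadattention-42.keras` (verify the exact filename in the repository's file listing), your own data has been tokenized and padded into `enc_in`, `dec_in`, and `dec_out` as in the preprocessing sketch above, and the optimizer, loss, and hyperparameters are illustrative rather than the original training settings.

```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model

# Assumed filename for the 42M-parameter checkpoint -- check the repo files.
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-42.keras",
)
model = load_model(model_path)

# enc_in, dec_in, dec_out: your own tokenized/padded arrays (see the
# preprocessing sketch in Training Data); dec_out is dec_in shifted by one token.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit([enc_in, dec_in], dec_out,
          batch_size=64, epochs=5, validation_split=0.1)
```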
## Example Usage
```python
from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
import pickle

# Load model
model_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-lstm-multiheadattention-12.3.keras",
)
model = load_model(model_path)

# Load tokenizers
tokenizer_path = hf_hub_download(
    repo_id="Daksh0505/Seq2Seq-LSTM-MultiHeadAttention",
    filename="seq2seq-tokenizers-12.3M.pkl",
)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

tokenizer_en = tokenizer['english']
tokenizer_hi = tokenizer['hindi']
```
## Step-by-Step Prediction Example
For the full encoder-decoder inference setup (building the `encoder_model` and `decoder_model` used below), visit Daksh0505/Seq2Seq-LSTM-MultiHeadAttention-Translation.
```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def preprocess_input(sentence, word2idx_en, max_seq_len, oov_token="<OOV>"):
    """Convert an English sentence into a padded sequence of token ids."""
    oov_idx = word2idx_en[oov_token]
    seq = [word2idx_en.get(w.lower(), oov_idx) for w in sentence.split()]
    return pad_sequences([seq], maxlen=max_seq_len, padding='post')

def decode_sequence(input_seq, encoder_model, decoder_model, word2idx_hi, idx2word_hi, max_seq_len):
    """Greedy decoding: feed the decoder one token at a time until <end> or max length."""
    start_token = word2idx_hi['<start>']
    end_token = word2idx_hi['<end>']

    # Encode the source once; keep per-token outputs for cross-attention.
    enc_outs, h, c = encoder_model.predict(input_seq, verbose=0)

    target_seq = np.array([[start_token]])
    decoded_sentence = []
    for _ in range(max_seq_len):
        output_tokens, h, c = decoder_model.predict([target_seq, h, c, enc_outs], verbose=0)
        sampled_idx = np.argmax(output_tokens[0, 0, :])
        if sampled_idx == end_token:
            break
        if sampled_idx > 0:  # skip the padding index
            decoded_sentence.append(idx2word_hi[sampled_idx])
        target_seq[0, 0] = sampled_idx
    return " ".join(decoded_sentence)

# Example usage -- encoder_model and decoder_model are the inference models
# built from the trained network (see the link above).
sentence = "Hello, how are you?"
input_seq = preprocess_input(sentence, tokenizer_en.word_index, max_seq_len=40)
translation = decode_sequence(input_seq, encoder_model, decoder_model,
                              tokenizer_hi.word_index, tokenizer_hi.index_word, max_seq_len=40)
print("Predicted Hindi Translation:", translation)
```