# 🧠 MarianMT-Text-Translation-AI-Model-"en-fr"

A **sequence-to-sequence translation model** fine-tuned on English–French sentence pairs. It translates English text into French and is built on the Hugging Face `MarianMTModel`. It is suited to general-purpose translation, educational use, and light regulatory or formal communication between English and French.

---

## ✨ Model Highlights

- 📌 Based on [`Helsinki-NLP/opus-mt-en-fr`](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr)
- 🔍 Fine-tuned on a cleaned parallel corpus of English-French sentence pairs
- ⚡ Translates from **English → French**
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**

---

## 🧠 Intended Uses

- ✅ Translating English feedback, emails, or documents into French
- ✅ Cross-lingual support for customer service or regulatory communication
- ✅ Educational platforms and language learning

---

## 🚫 Limitations

- ❌ Not suitable for informal slang or code-mixed inputs
- 📏 Inputs longer than 128 tokens will be truncated
- 🤔 May produce less accurate translations for highly specialized or domain-specific language
- ⚠️ Not intended for legal, medical, or safety-critical translations without expert review

---

## 🏋️‍♂️ Training Details

| Attribute          | Value                            |
|--------------------|----------------------------------|
| Base Model         | `Helsinki-NLP/opus-mt-en-fr`     |
| Dataset            | Parallel English-French corpus   |
| Task Type          | Translation                      |
| Max Token Length   | 128                              |
| Epochs             | 3                                |
| Batch Size         | 16                               |
| Optimizer          | AdamW                            |
| Loss Function      | CrossEntropyLoss                 |
| Framework          | PyTorch + Transformers           |
| Hardware           | CUDA-enabled GPU                 |

---

## 📊 Evaluation Metrics

| Metric     | Score   |
|------------|---------|
| BLEU Score | 27.82   |

---

## 🔎 Output Details

- Input: English text string
- Output: Translated French text string

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "AventIQ-AI/MarianMT-Text-Translation-AI-Model-en-fr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

def translate(text):
    """Translate an English string into French."""
    # Tokenize; inputs beyond the tokenizer's maximum length are truncated.
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    # Inference only, so gradient tracking is disabled.
    with torch.no_grad():
        outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
print(translate("Hello, how are you?"))
```

---

## 📁 Repository Structure

```
finetuned-model/
├── config.json               ✅ Model architecture & config
├── pytorch_model.bin         ✅ Model weights
├── tokenizer_config.json     ✅ Tokenizer settings
├── tokenizer.json            ✅ Tokenizer vocabulary (JSON format)
├── source.spm                ✅ SentencePiece model for source language
├── target.spm                ✅ SentencePiece model for target language
├── special_tokens_map.json   ✅ Special tokens mapping
├── generation_config.json    ✅ (Optional) Generation defaults
└── README.md                 ✅ Model card
```

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.
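
## 📏 Handling Long Inputs

Because inputs longer than 128 tokens are truncated, longer documents may need to be split before translation. The sketch below is one possible approach, not part of the released model: it groups sentences into chunks that stay within a token budget. The names `split_into_chunks` and `whitespace_token_count` are hypothetical helpers introduced for illustration, and the regex-based sentence split is a simplifying assumption.

```python
import re
from typing import Callable, List


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 128,
) -> List[str]:
    """Group sentences into chunks whose token count stays within max_tokens."""
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    # This is an assumption; a dedicated sentence segmenter is more robust.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def whitespace_token_count(s: str) -> int:
    # Stand-in counter for demonstration only; it does not match the
    # model's SentencePiece tokenization.
    return len(s.split())


if __name__ == "__main__":
    demo = "One two three. Four five six. Seven eight nine ten."
    for chunk in split_into_chunks(demo, whitespace_token_count, max_tokens=6):
        print(chunk)
```

With the tokenizer from the Usage section, the counter could be `lambda s: len(tokenizer(s)["input_ids"])`, and the translated chunks joined with `" ".join(translate(c) for c in chunks)`. A single sentence that exceeds the budget on its own is kept as one chunk and would still be truncated by the model.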