# 🧠 MarianMT-Text-Translation-AI-Model-"en-fr"

A **sequence-to-sequence translation model** fine-tuned on English–French sentence pairs. This model translates English text into French and is built using the Hugging Face `MarianMTModel`. It’s ideal for general-purpose translation, educational use, and light regulatory or formal communication tasks between English and French.

---

## ✨ Model Highlights

- 📌 Based on [`Helsinki-NLP/opus-mt-en-fr`](https://huggingface.co/Helsinki-NLP/opus-mt-en-fr)
- 🔁 Fine-tuned on a cleaned parallel corpus of English–French sentence pairs
- ⚡ Translates from **English → French**
- 🧠 Built using **Hugging Face Transformers** and **PyTorch**

---
## 🧠 Intended Uses

- ✅ Translating English feedback, emails, or documents into French
- ✅ Cross-lingual support for customer service or regulatory communication
- ✅ Educational platforms and language learning

---
## 🚫 Limitations

- ❌ Not suitable for informal slang or code-mixed inputs
- 📏 Inputs longer than 128 tokens will be truncated
- 🤔 May produce less accurate translations for highly specialized or domain-specific language
- ⚠️ Not intended for legal, medical, or safety-critical translations without expert review

---
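Because inputs beyond 128 tokens are truncated, long documents are best split at sentence boundaries before translation. A minimal sketch of such a splitter (the word-count limit is a rough proxy for token count, and both the threshold and the sentence regex are illustrative, not part of the model):

```python
import re

def split_into_chunks(text, max_words=90):
    """Split text at sentence boundaries so each chunk stays safely
    under the 128-token limit (word count as a rough token proxy)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate.split()) > max_words:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

print(split_into_chunks("One two three. Four five six.", max_words=4))
# ['One two three.', 'Four five six.']
```

Each chunk can then be translated separately and the outputs joined back together.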
## 🏋️‍♂️ Training Details

| Attribute        | Value                          |
|------------------|--------------------------------|
| Base Model       | `Helsinki-NLP/opus-mt-en-fr`   |
| Dataset          | Parallel English–French corpus |
| Task Type        | Translation                    |
| Max Token Length | 128                            |
| Epochs           | 3                              |
| Batch Size       | 16                             |
| Optimizer        | AdamW                          |
| Loss Function    | CrossEntropyLoss               |
| Framework        | PyTorch + Transformers         |
| Hardware         | CUDA-enabled GPU               |

---
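The hyperparameters above map naturally onto a `Seq2SeqTrainingArguments` configuration. A sketch under the assumption that fine-tuning used the standard `Seq2SeqTrainer` API (the exact training script is not published; the output directory is illustrative):

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the table above. AdamW and cross-entropy loss are the
# Trainer/MarianMT defaults, so they need no explicit setting.
training_args = Seq2SeqTrainingArguments(
    output_dir="finetuned-model",    # illustrative path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    predict_with_generate=True,      # decode with generate() during eval
    fp16=True,                       # assumes the CUDA-enabled GPU above
)
```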
## 📊 Evaluation Metrics

| Metric     | Score |
|------------|-------|
| BLEU Score | 27.82 |

---
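BLEU scores like the one above combine modified n-gram precision with a brevity penalty; in practice they are computed with a library such as `sacrebleu` over a full test set. A minimal pure-Python sketch of sentence-level BLEU, for intuition only (the smoothing floor is a simplification, not the standard scheme):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # Floor at a tiny value so one zero match does not zero the score.
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return brevity * math.exp(sum(log_precisions) / max_n)

print(round(sentence_bleu("le chat est sur le tapis",
                          "le chat est sur le tapis"), 2))  # 1.0
```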
## 🔎 Output Details

- Input: English text string
- Output: Translated French text string

---
## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "AventIQ-AI/MarianMT-Text-Translation-AI-Model-en-fr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def translate(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
print(translate("Hello, how are you?"))
```

---
## 📁 Repository Structure

```
finetuned-model/
├── config.json               ✅ Model architecture & config
├── pytorch_model.bin         ✅ Model weights
├── tokenizer_config.json     ✅ Tokenizer settings
├── tokenizer.json            ✅ Tokenizer vocabulary (JSON format)
├── source.spm                ✅ SentencePiece model for source language
├── target.spm                ✅ SentencePiece model for target language
├── special_tokens_map.json   ✅ Special tokens mapping
├── generation_config.json    ✅ (Optional) Generation defaults
└── README.md                 ✅ Model card
```
## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.