prajdabre committed · verified
Commit 00213ee · 1 Parent(s): 68d86b5

Update README.md

Files changed (1)
  1. README.md +19 -11
README.md CHANGED
@@ -7,7 +7,7 @@ license: mit
 These models are created from their respective IndicTrans2 parent versions by simply replacing the Sinusoidal Positional Embedding with Rotary Positional Embedding ([Su _et al._](https://arxiv.org/abs/2104.09864)) and fine-tuning them for further alignment.
 
 _NOTE_:
-These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://arxiv.org/abs/2408.11382).
+These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://aclanthology.org/2025.naacl-long.366/).
 
 Detailed information on the data mixture, hyperparameters, and training curriculum can be found in the paper.
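For orientation, the sketch below illustrates the general idea behind rotary position embedding (rotating consecutive feature pairs by position-dependent angles, rather than adding sinusoidal offsets to the embeddings). It is my illustrative approximation of the idea from Su _et al._, not the actual code used in these checkpoints.

```python
import torch

def apply_rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Illustrative RoPE: rotate consecutive feature pairs by position-dependent angles.

    x: (batch, seq_len, dim) query/key features, with dim even.
    """
    _, seq_len, dim = x.shape
    # Per-pair rotation frequencies, following the usual sinusoidal schedule.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split features into pairs
    # 2-D rotation of each (x1, x2) pair; relative offsets survive dot products.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Tiny smoke test on random "query" features.
q = torch.randn(2, 5, 8)
print(apply_rotary_embedding(q).shape)  # torch.Size([2, 5, 8])
```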
 
@@ -18,7 +18,7 @@ The usage instructions are very similar to [IndicTrans2 HuggingFace models](http
 ```python
 import torch
 import warnings
-from IndicTransToolkit import IndicProcessor
+from IndicTransToolkit.processor import IndicProcessor
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
 warnings.filterwarnings("ignore")
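The hunk above shows only the changed import. For completeness, here is a rough end-to-end sketch following the usual IndicTrans2 HuggingFace recipe with the new `IndicTransToolkit.processor` import; the checkpoint id, language codes, and generation settings are placeholders I picked for illustration, and the authoritative snippet is in the README itself.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from IndicTransToolkit.processor import IndicProcessor

ckpt = "<rotary-indictrans2-checkpoint>"  # placeholder: use the actual repo id of this model
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt, trust_remote_code=True)

ip = IndicProcessor(inference=True)
sentences = ["This is a short test paragraph for translation."]

# Tag source/target languages and normalize, as in the standard IndicTrans2 recipe.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
outputs = ip.postprocess_batch(decoded, lang="hin_Deva")
print(" | > Translations:", outputs[0])
```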
@@ -69,20 +69,28 @@ print(" | > Translations:", outputs[0])
 If you use these models directly or fine-tune them further for additional use cases, please cite the following work:
 
 ```bibtex
-@misc{gumma2025inducinglongcontextabilitiesmultilingual,
-      title={Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models},
-      author={Varun Gumma and Pranjal A. Chitale and Kalika Bali},
-      year={2025},
-      eprint={2408.11382},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2408.11382},
+@inproceedings{gumma-etal-2025-towards,
+    title = "Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models",
+    author = "Gumma, Varun and
+      Chitale, Pranjal A and
+      Bali, Kalika",
+    editor = "Chiruzzo, Luis and
+      Ritter, Alan and
+      Wang, Lu",
+    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+    month = apr,
+    year = "2025",
+    address = "Albuquerque, New Mexico",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.naacl-long.366/",
+    pages = "7158--7170",
+    ISBN = "979-8-89176-189-6"
 }
 ```
 
 # Note
 
-These new and improved models are primarily built and tested for document-level and long-context translations, and the performance of smaller sentence-level tasks might be sub-optimal, and might require generation parameter tuning. Please throughly verify the performance of the models for your usecase before scaling up generation.
+These new and improved models are primarily built and tested for document-level and long-context translations; performance on smaller sentence-level tasks might be slightly sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
 
 # Warning
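The note in the hunk above mentions generation parameter tuning for sentence-level inputs without naming specific knobs. As a hedged illustration (continuing from the usage sketch earlier, with values that are my assumptions rather than settings validated for these checkpoints), one might sweep the standard `transformers` generation arguments:

```python
# Continuing from the usage sketch above; values are illustrative guesses,
# not recommendations from this model card.
generated = model.generate(
    **inputs,
    num_beams=5,             # a wider beam sometimes helps short inputs
    max_length=256,          # cap output length for sentence-level use
    length_penalty=1.0,      # >1 favours longer hypotheses, <1 shorter ones
    no_repeat_ngram_size=3,  # curb repetition occasionally seen on short inputs
    early_stopping=True,
)
```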
 
 