prajdabre committed · verified
Commit 00213ee · 1 Parent(s): 68d86b5

Update README.md

Files changed (1)
  1. README.md +19 -11
README.md CHANGED
@@ -7,7 +7,7 @@ license: mit
 These models are created from their respective IndicTrans2 parent versions by simply replacing the Sinusoidal Positional Embedding with Rotary Positional Embedding ([Su _et al._](https://arxiv.org/abs/2104.09864)) and fine-tuning them for further alignment.
 
 _NOTE_:
-These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://arxiv.org/abs/2408.11382).
+These models are my independent reproduction of the paper: [Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models](https://aclanthology.org/2025.naacl-long.366/).
 
 Detailed information on the data mixture, hyperparameters, and training curriculum can be found in the paper.
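For orientation, the sketch below illustrates the general idea behind rotary position embedding (rotating consecutive feature pairs by position-dependent angles, rather than adding sinusoidal offsets to the embeddings). It is my illustrative approximation of the idea from Su _et al._, not the actual code used in these checkpoints.

```python
import torch

def apply_rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Illustrative RoPE: rotate consecutive feature pairs by position-dependent angles.

    x: (batch, seq_len, dim) query/key features, with dim even.
    """
    _, seq_len, dim = x.shape
    # Per-pair rotation frequencies, following the usual sinusoidal schedule.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq  # (seq_len, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split features into pairs
    # 2-D rotation of each (x1, x2) pair; relative offsets survive dot products.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Tiny smoke test on random "query" features.
q = torch.randn(2, 5, 8)
print(apply_rotary_embedding(q).shape)  # torch.Size([2, 5, 8])
```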
 
@@ -18,7 +18,7 @@ The usage instructions are very similar to [IndicTrans2 HuggingFace models](http
 ```python
 import torch
 import warnings
-from IndicTransToolkit import IndicProcessor
+from IndicTransToolkit.processor import IndicProcessor
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
 warnings.filterwarnings("ignore")
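The hunk above shows only the changed import. For completeness, here is a rough end-to-end sketch following the usual IndicTrans2 HuggingFace recipe with the new `IndicTransToolkit.processor` import; the checkpoint id, language codes, and generation settings are placeholders I picked for illustration, and the authoritative snippet is in the README itself.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from IndicTransToolkit.processor import IndicProcessor

ckpt = "<rotary-indictrans2-checkpoint>"  # placeholder: use the actual repo id of this model
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt, trust_remote_code=True)

ip = IndicProcessor(inference=True)
sentences = ["This is a short test paragraph for translation."]

# Tag source/target languages and normalize, as in the standard IndicTrans2 recipe.
batch = ip.preprocess_batch(sentences, src_lang="eng_Latn", tgt_lang="hin_Deva")
inputs = tokenizer(batch, padding="longest", truncation=True, return_tensors="pt")

with torch.inference_mode():
    generated = model.generate(**inputs, num_beams=5, max_length=256)

decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
outputs = ip.postprocess_batch(decoded, lang="hin_Deva")
print(" | > Translations:", outputs[0])
```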
@@ -69,20 +69,28 @@ print(" | > Translations:", outputs[0])
 If you use these models directly or fine-tune them further for additional use cases, please cite the following work:
 
 ```bibtex
-@misc{gumma2025inducinglongcontextabilitiesmultilingual,
-      title={Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models},
-      author={Varun Gumma and Pranjal A. Chitale and Kalika Bali},
-      year={2025},
-      eprint={2408.11382},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2408.11382},
+@inproceedings{gumma-etal-2025-towards,
+    title = "Towards Inducing Long-Context Abilities in Multilingual Neural Machine Translation Models",
+    author = "Gumma, Varun and
+      Chitale, Pranjal A and
+      Bali, Kalika",
+    editor = "Chiruzzo, Luis and
+      Ritter, Alan and
+      Wang, Lu",
+    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
+    month = apr,
+    year = "2025",
+    address = "Albuquerque, New Mexico",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2025.naacl-long.366/",
+    pages = "7158--7170",
+    ISBN = "979-8-89176-189-6"
 }
 ```
 
 # Note
 
-These new and improved models are primarily built and tested for document-level and long-context translations, and the performance of smaller sentence-level tasks might be sub-optimal, and might require generation parameter tuning. Please throughly verify the performance of the models for your usecase before scaling up generation.
+These new and improved models are primarily built and tested for document-level and long-context translations; performance on smaller sentence-level tasks might be slightly sub-optimal and might require generation parameter tuning. Please thoroughly verify the performance of the models for your use case before scaling up generation.
 
 # Warning
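The note in the hunk above mentions generation parameter tuning for sentence-level inputs without naming specific knobs. As a hedged illustration (continuing from the usage sketch earlier, with values that are my assumptions rather than settings validated for these checkpoints), one might sweep the standard `transformers` generation arguments:

```python
# Continuing from the usage sketch above; values are illustrative guesses,
# not recommendations from this model card.
generated = model.generate(
    **inputs,
    num_beams=5,             # a wider beam sometimes helps short inputs
    max_length=256,          # cap output length for sentence-level use
    length_penalty=1.0,      # >1 favours longer hypotheses, <1 shorter ones
    no_repeat_ngram_size=3,  # curb repetition occasionally seen on short inputs
    early_stopping=True,
)
```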
 
 