Files changed (1)
  1. README.md +74 -63
README.md CHANGED
@@ -1,63 +1,74 @@
-
- ---
- license: apache-2.0
- datasets:
- - allenai/MADLAD-400
- language:
- - bn
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- library_name: transformers
- ---
- # Qwen2.5 7B Instruct for Bengali: Vocabulary Expansion
-
- This model is built on top of Qwen2.5 7B Instruct, adapted for Bengali using 500M target-language tokens sampled from MADLAD-400. It has an additional target vocabulary of 10K tokens.
-
- ## Model Details
-
- * **Vocabulary**: This model has an additional target vocabulary of 10K tokens.
- * **Target vocabulary initialization**: The embedding and LM head weights for the added target tokens were initialized via mean initialization.
- * **Training**: This model was continually pre-trained on 500M target-language tokens sampled from MADLAD-400.
-
-
- ## Model Description
-
- - **Language:** Bengali
- - **License:** Apache 2.0
- - **Fine-tuned from model:** Qwen/Qwen2.5-7B-Instruct
-
-
- ## Model Sources
-
- - **Repository:** https://github.com/gucci-j/chat-cve
- - **Paper:** https://arxiv.org/abs/2412.11704
-
-
- ## How to Get Started with the Model
- Use the code below to get started with the model.
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model = AutoModelForCausalLM.from_pretrained(
-     "atsuki-yamaguchi/Qwen2.5-7B-Instruct-bn-madlad-mean-tuned"
- )
- tokenizer = AutoTokenizer.from_pretrained(
-     "atsuki-yamaguchi/Qwen2.5-7B-Instruct-bn-madlad-mean-tuned"
- )
- ```
-
-
- ## Citation
- ```bibtex
- @misc{yamaguchi2024vocabularyexpansionchatmodels,
-     title={{ElChat}: Adapting Chat Language Models Using Only Target Unlabeled Language Data},
-     author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
-     year={2024},
-     eprint={2412.11704},
-     archivePrefix={arXiv},
-     primaryClass={cs.CL},
-     url={https://arxiv.org/abs/2412.11704},
- }
- ```
-
-
+ ---
+ license: apache-2.0
+ datasets:
+ - allenai/MADLAD-400
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ library_name: transformers
+ ---
+ # Qwen2.5 7B Instruct for Bengali: Vocabulary Expansion
+
+ This model is built on top of Qwen2.5 7B Instruct, adapted for Bengali using 500M target-language tokens sampled from MADLAD-400. It has an additional target vocabulary of 10K tokens.
+
+ ## Model Details
+
+ * **Vocabulary**: This model has an additional target vocabulary of 10K tokens.
+ * **Target vocabulary initialization**: The embedding and LM head weights for the added target tokens were initialized via mean initialization (see the sketch below).
+ * **Training**: This model was continually pre-trained on 500M target-language tokens sampled from MADLAD-400.
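+
+ A minimal sketch of this step (not the authors' exact implementation; it assumes "mean initialization" means setting each new row to the mean of all pre-existing rows, and `new_tokens` is a hypothetical stand-in for the actual 10K Bengali token list):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ base = "Qwen/Qwen2.5-7B-Instruct"
+ model = AutoModelForCausalLM.from_pretrained(base)
+ tokenizer = AutoTokenizer.from_pretrained(base)
+
+ new_tokens = ["বাংলা", "ভাষা"]  # hypothetical placeholder; the real list has ~10K entries
+ num_added = tokenizer.add_tokens(new_tokens)
+
+ # Grow the embedding matrix and LM head by exactly the number of added tokens.
+ old_size = model.get_input_embeddings().weight.size(0)
+ model.resize_token_embeddings(old_size + num_added)
+
+ with torch.no_grad():
+     emb = model.get_input_embeddings().weight    # input embeddings
+     head = model.get_output_embeddings().weight  # LM head (untied in Qwen2.5 7B)
+     # Mean initialization: every new row gets the mean of the original rows.
+     emb[old_size:] = emb[:old_size].mean(dim=0)
+     head[old_size:] = head[:old_size].mean(dim=0)
+ ```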
+
+
+ ## Model Description
+
+ - **Language:** Bengali
+ - **License:** Apache 2.0
+ - **Fine-tuned from model:** Qwen/Qwen2.5-7B-Instruct
+
+
+ ## Model Sources
+
+ - **Repository:** https://github.com/gucci-j/chat-cve
+ - **Paper:** https://arxiv.org/abs/2412.11704
+
+
+ ## How to Get Started with the Model
+ Use the code below to get started with the model.
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "atsuki-yamaguchi/Qwen2.5-7B-Instruct-bn-madlad-mean-tuned"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(
+     "atsuki-yamaguchi/Qwen2.5-7B-Instruct-bn-madlad-mean-tuned"
+ )
+ ```
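+
+ You can then query the model in Bengali via its chat template (a minimal sketch continuing from the snippet above; the prompt and generation settings are illustrative):
+
+ ```python
+ messages = [{"role": "user", "content": "বাংলাদেশের রাজধানী কোথায়?"}]  # "Where is the capital of Bangladesh?"
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ )
+ output_ids = model.generate(input_ids, max_new_tokens=128)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```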
+
+
+ ## Citation
+ ```bibtex
+ @misc{yamaguchi2024vocabularyexpansionchatmodels,
+     title={{ElChat}: Adapting Chat Language Models Using Only Target Unlabeled Language Data},
+     author={Atsuki Yamaguchi and Terufumi Morishita and Aline Villavicencio and Nikolaos Aletras},
+     year={2024},
+     eprint={2412.11704},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL},
+     url={https://arxiv.org/abs/2412.11704},
+ }
+ ```
+
+