lbourdois committed
Commit 71bb4bf · verified · 1 Parent(s): bf6617e

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
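For anyone who wants to inspect or reproduce this kind of change programmatically, below is a minimal sketch using the `huggingface_hub` model-card API. It is an illustration, not part of this commit: the repository id and the 13 ISO 639-3 codes are taken from the diff below, and the final `push_to_hub` call (commented out) assumes write access or opening a PR.

```python
# Minimal sketch: inspect and extend the `language` metadata of a Hub model card.
# Assumes huggingface_hub is installed; this mirrors, not replaces, the manual edit in this PR.
from huggingface_hub import ModelCard

repo_id = "WueNLP/centurio_qwen"

# The README.md YAML front matter is parsed into `card.data`.
card = ModelCard.load(repo_id)
print("Current language tag:", card.data.language)

# The 13 languages explicitly listed in the README, as ISO 639-3 codes.
card.data.language = [
    "zho", "eng", "fra", "spa", "por", "deu", "ita",
    "rus", "jpn", "kor", "vie", "tha", "ara",
]

# Uncomment to propose the change as a pull request against the repository.
# card.push_to_hub(repo_id, create_pr=True)
```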

Files changed (1)
1. README.md +170 -257
README.md CHANGED
@@ -1,258 +1,171 @@
- ---
- library_name: transformers
- license: apache-2.0
- language:
- - multilingual
- - af
- - am
- - ar
- - as
- - azb
- - be
- - bg
- - bm
- - bn
- - bo
- - bs
- - ca
- - ceb
- - cs
- - cy
- - da
- - de
- - du
- - el
- - en
- - eo
- - es
- - et
- - eu
- - fa
- - fi
- - fr
- - ga
- - gd
- - gl
- - ha
- - hi
- - hr
- - ht
- - hu
- - id
- - ig
- - is
- - it
- - iw
- - ja
- - jv
- - ka
- - ki
- - kk
- - km
- - ko
- - la
- - lb
- - ln
- - lo
- - lt
- - lv
- - mi
- - mr
- - ms
- - mt
- - my
- - 'no'
- - oc
- - pa
- - pl
- - pt
- - qu
- - ro
- - ru
- - sa
- - sc
- - sd
- - sg
- - sk
- - sl
- - sm
- - so
- - sq
- - sr
- - ss
- - sv
- - sw
- - ta
- - te
- - th
- - ti
- - tl
- - tn
- - tpi
- - tr
- - ts
- - tw
- - uk
- - ur
- - uz
- - vi
- - war
- - wo
- - xh
- - yo
- - zh
- - zu
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- - timm/ViT-SO400M-14-SigLIP-384
- pipeline_tag: image-text-to-text
- ---
-
- # Centurio Qwen
-
- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- - **Model type:** Centurio is an open-source multilingual large vision-language model.
- - **Training Data:** COMING SOON
- - **Languages:** The model was trained with the following 100 languages: `af, am, ar, ar-eg, as, azb, be, bg, bm, bn, bo, bs, ca, ceb, cs, cy, da, de, du, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gl, ha, hi, hr, ht, hu, id, ig, is, it, iw, ja, jv, ka, ki, kk, km, ko, la, lb, ln, lo, lt, lv, mi, mr, ms, mt, my, no, oc, pa, pl, pt, qu, ro, ru, sa, sc, sd, sg, sk, sl, sm, so, sq, sr, ss, sv, sw, ta, te, th, ti, tl, tn, tpi, tr, ts, tw, uk, ur, uz, vi, war, wo, xh, yo, zh, zu`
- - **License:** This work is released under the Apache 2.0 license.
-
- ### Model Sources
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [gregor-ge.github.io/Centurio](https://gregor-ge.github.io/Centurio)
- - **Paper:** [arXiv](https://arxiv.org/abs/2501.05122)
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- The model can be used directly through the `transformers` library with our custom code.
-
- ```python
- from transformers import AutoModelForCausalLM, AutoProcessor
- import timm
- from PIL import Image
- import requests
-
- url = "https://upload.wikimedia.org/wikipedia/commons/b/bd/Golden_Retriever_Dukedestiny01_drvd.jpg"
- image = Image.open(requests.get(url, stream=True).raw)
-
- model_name = "WueNLP/centurio_qwen"
-
- processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
-
- ## Images in the prompt are indicated with '<image_placeholder>'!
- prompt = "<image_placeholder>\nBriefly describe the image in German."
-
- messages = [
-     {"role": "system", "content": "You are a helpful assistant."}, # This is the system prompt used during our training.
-     {"role": "user", "content": prompt}
- ]
-
- text = processor.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     trust_remote_code=True
- )
-
- model_inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=128
- )
-
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
-
- ```
-
- #### Multiple Images
- We natively support multi-image inputs. You only have to 1) include one `<image_placeholder>` per image and 2) pass all images of the *entire batch* as a flat list:
-
- ```python
- [...]
- # Variables reused from above.
-
- processor.tokenizer.padding_side = "left" # The default is 'right', but it has to be 'left' for batched generation to work correctly!
-
- image_multi_1, image_multi_2 = [...] # prepare additional images
-
- prompt_multi = "What is the difference between the following images?\n<image_placeholder><image_placeholder>\nAnswer in German."
-
- messages_multi = [
-     {"role": "system", "content": "You are a helpful assistant."},
-     {"role": "user", "content": prompt_multi}
- ]
-
- text_multi = processor.apply_chat_template(
-     messages_multi,
-     tokenize=False,
-     add_generation_prompt=True
- )
-
- model_inputs = processor(text=[text, text_multi], images=[image, image_multi_1, image_multi_2], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=128
- )
-
- [...]
-
- ```
-
-
-
-
- ## Bias, Risks, and Limitations
-
- - General biases, risks, and limitations of large vision-language models, such as hallucinations or biases from the training data, apply.
- - This is a research project and *not* recommended for production use.
- - Multilingual: Performance and generation quality can differ widely between languages.
- - OCR: The model struggles with both small text and writing in non-Latin scripts.
-
-
- ## Citation
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- ```
- @article{centurio2025,
-   author = {Gregor Geigle and
-     Florian Schneider and
-     Carolin Holtermann and
-     Chris Biemann and
-     Radu Timofte and
-     Anne Lauscher and
-     Goran Glava\v{s}},
-   title = {Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model},
-   journal = {arXiv},
-   volume = {abs/2501.05122},
-   year = {2025},
-   url = {https://arxiv.org/abs/2501.05122},
-   eprinttype = {arXiv},
-   eprint = {2501.05122},
- }
  ```
 
+ ---
+ library_name: transformers
+ license: apache-2.0
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ - timm/ViT-SO400M-14-SigLIP-384
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Centurio Qwen
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+ - **Model type:** Centurio is an open-source multilingual large vision-language model.
+ - **Training Data:** COMING SOON
+ - **Languages:** The model was trained with the following 100 languages: `af, am, ar, ar-eg, as, azb, be, bg, bm, bn, bo, bs, ca, ceb, cs, cy, da, de, du, el, en, eo, es, et, eu, fa, fi, fr, ga, gd, gl, ha, hi, hr, ht, hu, id, ig, is, it, iw, ja, jv, ka, ki, kk, km, ko, la, lb, ln, lo, lt, lv, mi, mr, ms, mt, my, no, oc, pa, pl, pt, qu, ro, ru, sa, sc, sd, sg, sk, sl, sm, so, sq, sr, ss, sv, sw, ta, te, th, ti, tl, tn, tpi, tr, ts, tw, uk, ur, uz, vi, war, wo, xh, yo, zh, zu`
+ - **License:** This work is released under the Apache 2.0 license.
+
+ ### Model Sources
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [gregor-ge.github.io/Centurio](https://gregor-ge.github.io/Centurio)
+ - **Paper:** [arXiv](https://arxiv.org/abs/2501.05122)
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ The model can be used directly through the `transformers` library with our custom code.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+ import timm
+ from PIL import Image
+ import requests
+
+ url = "https://upload.wikimedia.org/wikipedia/commons/b/bd/Golden_Retriever_Dukedestiny01_drvd.jpg"
+ image = Image.open(requests.get(url, stream=True).raw)
+
+ model_name = "WueNLP/centurio_qwen"
+
+ processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
+
+ ## Images in the prompt are indicated with '<image_placeholder>'!
+ prompt = "<image_placeholder>\nBriefly describe the image in German."
+
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."}, # This is the system prompt used during our training.
+     {"role": "user", "content": prompt}
+ ]
+
+ text = processor.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     trust_remote_code=True
+ )
+
+ model_inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=128
+ )
+
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
+
+ ```
+
+ #### Multiple Images
+ We natively support multi-image inputs. You only have to 1) include one `<image_placeholder>` per image and 2) pass all images of the *entire batch* as a flat list:
+
+ ```python
+ [...]
+ # Variables reused from above.
+
+ processor.tokenizer.padding_side = "left" # The default is 'right', but it has to be 'left' for batched generation to work correctly!
+
+ image_multi_1, image_multi_2 = [...] # prepare additional images
+
+ prompt_multi = "What is the difference between the following images?\n<image_placeholder><image_placeholder>\nAnswer in German."
+
+ messages_multi = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": prompt_multi}
+ ]
+
+ text_multi = processor.apply_chat_template(
+     messages_multi,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = processor(text=[text, text_multi], images=[image, image_multi_1, image_multi_2], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=128
+ )
+
+ [...]
+
+ ```
+
+
+
+
+ ## Bias, Risks, and Limitations
+
+ - General biases, risks, and limitations of large vision-language models, such as hallucinations or biases from the training data, apply.
+ - This is a research project and *not* recommended for production use.
+ - Multilingual: Performance and generation quality can differ widely between languages.
+ - OCR: The model struggles with both small text and writing in non-Latin scripts.
+
+
+ ## Citation
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ ```
+ @article{centurio2025,
+   author = {Gregor Geigle and
+     Florian Schneider and
+     Carolin Holtermann and
+     Chris Biemann and
+     Radu Timofte and
+     Anne Lauscher and
+     Goran Glava\v{s}},
+   title = {Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model},
+   journal = {arXiv},
+   volume = {abs/2501.05122},
+   year = {2025},
+   url = {https://arxiv.org/abs/2501.05122},
+   eprinttype = {arXiv},
+   eprint = {2501.05122},
+ }
  ```