lbourdois committed (verified)
Commit fd34608 · 1 Parent(s): 8b297ff

Improve language tag


Hi! Since the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
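For reference, Hugging Face model cards declare supported languages in the YAML front matter at the top of README.md. Below is a minimal sketch of only the field this PR touches; the codes are copied from the diff further down (the PR uses three-letter ISO 639-3 codes in place of the original two-letter `en`), and the comments mapping each code to a language name are my own annotation:

```yaml
# Minimal sketch of the README.md front matter field changed by this PR.
language:
- zho   # Chinese
- eng   # English
- fra   # French
- spa   # Spanish
- por   # Portuguese
- deu   # German
- ita   # Italian
- rus   # Russian
- jpn   # Japanese
- kor   # Korean
- vie   # Vietnamese
- tha   # Thai
- ara   # Arabic
```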

Files changed (1)
README.md +46 -34
README.md CHANGED
@@ -1,35 +1,47 @@
- ---
- base_model:
- - Qwen/Qwen2.5-1.5B-Instruct
- - google/siglip-so400m-patch14-384
- datasets:
- - weizhiwang/Open-Qwen2VL-Data
- - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
- language:
- - en
- license: cc
- pipeline_tag: image-text-to-text
- ---
-
- # Model Card for Open-Qwen2VL-base
-
- Open-Qwen2VL-base is a pre-trained base multimodal model that takes images and text as input and produces text as output. This model is described in the paper [Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources](https://huggingface.co/papers/2504.00595). The code is available at [https://github.com/Victorwz/Open-Qwen2VL](https://github.com/Victorwz/Open-Qwen2VL).
-
- ## Updates
- - [4/1/2025] The codebase, model, data, and paper are released.
-
- <!-- ## Model Details -->
-
- ## How to Use
-
- The base model is released for further fine-tuning on public SFT data or customized SFT data. It is not appropriate for normal task completions.
-
- ## Citation
- ```bibtex
- @article{Open-Qwen2VL,
- title={Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources},
- author={Wang, Weizhi and Tian, Yu and Yang, Linjie and Wang, Heng and Yan, Xifeng},
- journal={arXiv preprint arXiv:2504.00595},
- year={2025}
- }
  ...
 
+ ---
+ base_model:
+ - Qwen/Qwen2.5-1.5B-Instruct
+ - google/siglip-so400m-patch14-384
+ datasets:
+ - weizhiwang/Open-Qwen2VL-Data
+ - MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ license: cc
+ pipeline_tag: image-text-to-text
+ ---
+
+ # Model Card for Open-Qwen2VL-base
+
+ Open-Qwen2VL-base is a pre-trained base multimodal model that takes images and text as input and produces text as output. This model is described in the paper [Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources](https://huggingface.co/papers/2504.00595). The code is available at [https://github.com/Victorwz/Open-Qwen2VL](https://github.com/Victorwz/Open-Qwen2VL).
+
+ ## Updates
+ - [4/1/2025] The codebase, model, data, and paper are released.
+
+ <!-- ## Model Details -->
+
+ ## How to Use
+
+ The base model is released for further fine-tuning on public SFT data or customized SFT data. It is not appropriate for normal task completions.
+
+ ## Citation
+ ```bibtex
+ @article{Open-Qwen2VL,
+ title={Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources},
+ author={Wang, Weizhi and Tian, Yu and Yang, Linjie and Wang, Heng and Yan, Xifeng},
+ journal={arXiv preprint arXiv:2504.00595},
+ year={2025}
+ }
  ...