anonamename committed
Commit 36eb57f · verified · 1 Parent(s): e087902

Upload turing-motors/Heron-NVILA-Lite-2B

Files changed (1): README.md (+12 -5)
README.md CHANGED
@@ -113,16 +113,19 @@ print("---" * 40)
 | Stage3 | Vision Encoder, Projector, LLM | [llava-instruct-v1_5-en-subset-358k](https://huggingface.co/datasets/llm-jp/llava-instruct-v1_5-en-subset-358k), [llava-instruct-ja](https://huggingface.co/datasets/llm-jp/llava-instruct-ja), [japanese-photos-conv](https://huggingface.co/datasets/llm-jp/japanese-photos-conversation), [ja-vg-vqa](https://huggingface.co/datasets/llm-jp/ja-vg-vqa-conversation), [synthdog-ja (subset)](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja), [ai2d](https://huggingface.co/datasets/lmms-lab/ai2d), [synthdog-en](https://huggingface.co/datasets/naver-clova-ix/synthdog-en), [sherlock](https://github.com/allenai/sherlock) | 1.4M |
 
 ## Evaluation
-I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. All scores other than our models are taken from [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1).
+
+I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron NVILA-Lite and Sarashina2-Vision-14B were taken from the [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron NVILA-Lite and Sarashina2-Vision-14B were evaluated with llm-as-a-judge using "gpt-4o-2024-05-13"; Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06". Due to these differences in evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference values only.
 
 | Model | LLM Size | Heron-Bench overall LLM (%) | JA-VLM-Bench-In-the-Wild LLM (/5.0) | JA-VG-VQA-500 LLM (/5.0) |
 |--------------------------------|----------|------------------------------|-------------------------------------|--------------------------|
+| **[Heron NVILA-Lite 1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)** | 0.5B | 45.9 | 2.92 | 3.16 |
 | **Heron NVILA-Lite 2B** | 1.5B | 52.8 | 3.52 | 3.50 |
 | **[Heron NVILA-Lite 15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)** | 14B | 59.6 | 4.20 | 3.82 |
 | [LLaVA-CALM2-SigLIP](https://huggingface.co/cyberagent/llava-calm2-siglip) | 7B | 43.3 | 3.15 | 3.21 |
 | [Llama-3-EvoVLM-JP-v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2) | 8B | 39.3 | 2.92 | 2.96 |
 | [VILA-jp](https://huggingface.co/llm-jp/llm-jp-3-vila-14b) | 13B | 57.2 | 3.69 | 3.62 |
 | [Asagi-14B](https://huggingface.co/MIL-UT/Asagi-14B) | 13B | 55.8 | 3.44 | 3.84 |
+| [Sarashina2-Vision-14B](https://huggingface.co/sbintuitions/sarashina2-vision-14b) | 13B | 50.9 | 4.10 | 3.43 |
 | [Qwen2-VL 7B Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) | 7B | 55.5 | 3.61 | 3.60 |
 | GPT-4o | - | 87.6 | 3.85 | 3.58 |
 
@@ -151,8 +154,12 @@ This model is based on the results obtained in the project, subsidized by the [G
   primaryClass={cs.CV},
   url={https://arxiv.org/abs/2412.04468},
 }
-```
 
-## Model Card Authors
-
-Shingo Yokoi
+@inproceedings{maeda2025llm-jp-eval-mm,
+  author = {前田 航希 and 杉浦 一瑳 and 小田 悠介 and 栗田 修平 and 岡崎 直観},
+  month = mar,
+  series = {言語処理学会第31回年次大会 (NLP2025)},
+  title = {{llm-jp-eval-mm: 日本語視覚言語モデルの自動評価基盤}},
+  year = {2025}
+}
+```
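The evaluation text above relies on llm-as-a-judge scoring, where a judge model such as "gpt-4o-2024-05-13" rates each prediction against a reference answer on the /5.0 scale used by JA-VLM-Bench-In-the-Wild and JA-VG-VQA-500. Below is a minimal sketch of that idea, assuming the OpenAI Python client; the prompt wording and the `judge_score` helper are illustrative assumptions, not the actual llm-jp-eval-mm code.

```python
# Illustrative llm-as-a-judge sketch, NOT the actual llm-jp-eval-mm
# implementation; the prompt wording and helper name are assumptions.
import re

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_score(question: str, reference: str, prediction: str,
                judge_model: str = "gpt-4o-2024-05-13") -> float:
    """Ask the judge model to rate a prediction on a 1-5 scale."""
    prompt = (
        "You are grading an answer to a visual question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Rate the model answer from 1 (wrong) to 5 (perfect). "
        "Reply with the number only."
    )
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    # Judges occasionally add prose around the rating, so pull out
    # the first number in the reply.
    match = re.search(r"\d+(?:\.\d+)?", resp.choices[0].message.content)
    return float(match.group()) if match else 0.0
```

Because the judge checkpoint is part of the evaluation conditions, scores produced with "gpt-4o-2024-05-13" and "gpt-4o-2024-08-06" are not directly comparable, which is why the Sarashina2-Vision-14B row is flagged as reference only.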