anonamename committed on
Commit e178cff · verified · 1 Parent(s): ce4c3eb

Update README.md

  1. README.md +6 -6
README.md CHANGED
@@ -14,9 +14,9 @@ base_model:
  - Efficient-Large-Model/paligemma-siglip-so400m-patch14-448
  pipeline_tag: image-text-to-text
  ---
- # Heron NVILA-Lite 2B
 
- Heron NVILA-Lite 2B is a vision language model trained for Japanese, based on the [NVILA](https://arxiv.org/abs/2412.04468)-Lite architecture.
 
  ## Model Overview
 
@@ -115,13 +115,13 @@ print("---" * 40)
 
  ## Evaluation
 
- I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron NVILA-Lite and Sarashina2-Vision-14B were taken from [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) as of March 2025 and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron NVILA-Lite and Sarashina2-Vision-14B were evaluated using llm-as-a-judge with "gpt-4o-2024-05-13". Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06"; please note that due to differing evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference only.
 
  | Model | LLM Size | Heron-Bench overall LLM (%) | JA-VLM-Bench-In-the-Wild LLM (/5.0) | JA-VG-VQA-500 LLM (/5.0) |
  |--------------------------------|----------|------------------------------|-------------------------------------|--------------------------|
- | **[Heron NVILA-Lite 1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)** | 0.5B | 45.9 | 2.92 | 3.16 |
- | **Heron NVILA-Lite 2B** | 1.5B | 52.8 | 3.52 | 3.50 |
- | **[Heron NVILA-Lite 15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)** | 14B | 59.6 | 4.2 | 3.82 |
  | [LLaVA-CALM2-SigLIP](https://huggingface.co/cyberagent/llava-calm2-siglip) | 7B | 43.3 | 3.15 | 3.21 |
  | [Llama-3-EvoVLM-JP-v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2) | 8B | 39.3 | 2.92 | 2.96 |
  | [VILA-jp](https://huggingface.co/llm-jp/llm-jp-3-vila-14b) | 13B | 57.2 | 3.69 | 3.62 |
 
  - Efficient-Large-Model/paligemma-siglip-so400m-patch14-448
  pipeline_tag: image-text-to-text
  ---
+ # Heron-NVILA-Lite-2B
 
+ Heron-NVILA-Lite-2B is a vision language model trained for Japanese, based on the [NVILA](https://arxiv.org/abs/2412.04468)-Lite architecture.
 
  ## Model Overview
 
 
 
  ## Evaluation
 
+ I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron-NVILA-Lite and Sarashina2-Vision-14B were taken from [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) as of March 2025 and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron-NVILA-Lite and Sarashina2-Vision-14B were evaluated using llm-as-a-judge with "gpt-4o-2024-05-13". Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06"; please note that due to differing evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference only.
 
  | Model | LLM Size | Heron-Bench overall LLM (%) | JA-VLM-Bench-In-the-Wild LLM (/5.0) | JA-VG-VQA-500 LLM (/5.0) |
  |--------------------------------|----------|------------------------------|-------------------------------------|--------------------------|
+ | **[Heron-NVILA-Lite-1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)** | 0.5B | 45.9 | 2.92 | 3.16 |
+ | **Heron-NVILA-Lite-2B** | 1.5B | 52.8 | 3.52 | 3.50 |
+ | **[Heron-NVILA-Lite-15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)** | 14B | 59.6 | 4.2 | 3.82 |
  | [LLaVA-CALM2-SigLIP](https://huggingface.co/cyberagent/llava-calm2-siglip) | 7B | 43.3 | 3.15 | 3.21 |
  | [Llama-3-EvoVLM-JP-v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2) | 8B | 39.3 | 2.92 | 2.96 |
  | [VILA-jp](https://huggingface.co/llm-jp/llm-jp-3-vila-14b) | 13B | 57.2 | 3.69 | 3.62 |
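
The three benchmark columns in the table use different scales (a percentage and two scores out of 5.0), so they cannot be compared directly. The following is a minimal sketch, not part of the README or of llm-jp-eval-mm, that rescales each column to the range 0 to 1 and ranks the models by an unweighted mean. The names `SCORES`, `MAX_SCORES`, `normalize`, and `rank_by_mean` are hypothetical, and the unweighted mean is an assumption made for illustration rather than a metric the author reports.

```python
# Scores copied from the Evaluation table, in column order:
# Heron-Bench (%), JA-VLM-Bench-In-the-Wild (/5.0), JA-VG-VQA-500 (/5.0).
SCORES = {
    "Heron-NVILA-Lite-1B": (45.9, 2.92, 3.16),
    "Heron-NVILA-Lite-2B": (52.8, 3.52, 3.50),
    "Heron-NVILA-Lite-15B": (59.6, 4.2, 3.82),
    "LLaVA-CALM2-SigLIP": (43.3, 3.15, 3.21),
    "Llama-3-EvoVLM-JP-v2": (39.3, 2.92, 2.96),
    "VILA-jp": (57.2, 3.69, 3.62),
}

# Maximum attainable value for each column, taken from the table headers.
MAX_SCORES = (100.0, 5.0, 5.0)


def normalize(row):
    """Rescale one row of raw scores to the 0-1 range, column by column."""
    return tuple(value / top for value, top in zip(row, MAX_SCORES))


def rank_by_mean(scores):
    """Return (model, mean normalized score) pairs, best first.

    The unweighted mean is an illustrative assumption, not an official metric.
    """
    means = {name: sum(normalize(row)) / len(row) for name, row in scores.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, mean in rank_by_mean(SCORES):
        print(f"{name}: {mean:.3f}")
```

Because the rows come from different judge models and evaluation runs, as the paragraph above the table notes, any ranking derived this way should be read as a rough orientation only.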