turing-motors/Heron-NVILA-Lite-2B

@@ -14,9 +14,9 @@ base_model:
 - Efficient-Large-Model/paligemma-siglip-so400m-patch14-448
 pipeline_tag: image-text-to-text
 ---
-# Heron NVILA-Lite 2B
-Heron NVILA-Lite 2B is a vision language model trained for Japanese, based on the [NVILA](https://arxiv.org/abs/2412.04468)-Lite architecture.
+# Heron-NVILA-Lite-2B
+Heron-NVILA-Lite-2B is a vision language model trained for Japanese, based on the [NVILA](https://arxiv.org/abs/2412.04468)-Lite architecture.
 ## Model Overview
@@ -115,13 +115,13 @@ print("---" * 40)
 ## Evaluation
-I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron NVILA-Lite and Sarashina2-Vision-14B were taken from [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) as of March 2025 and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron NVILA-Lite and Sarashina2-Vision-14B were evaluated using llm-as-a-judge with "gpt-4o-2024-05-13". Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06"; please note that due to differing evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference only.
+I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron-NVILA-Lite and Sarashina2-Vision-14B were taken from [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) as of March 2025 and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron-NVILA-Lite and Sarashina2-Vision-14B were evaluated using llm-as-a-judge with "gpt-4o-2024-05-13". Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06"; please note that due to differing evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference only.
 | Model                          | LLM Size | Heron-Bench overall LLM (%) | JA-VLM-Bench-In-the-Wild LLM (/5.0) | JA-VG-VQA-500 LLM (/5.0) |
 |--------------------------------|----------|------------------------------|-------------------------------------|--------------------------|
-| **[Heron NVILA-Lite 1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)**        | 0.5B     | 45.9                         | 2.92                                | 3.16                     |
-| **Heron NVILA-Lite 2B**        | 1.5B     | 52.8                         | 3.52                                | 3.50                     |
-| **[Heron NVILA-Lite 15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)**       | 14B      | 59.6                         | 4.2                                 | 3.82                     |
+| **[Heron-NVILA-Lite-1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)**        | 0.5B     | 45.9                         | 2.92                                | 3.16                     |
+| **Heron-NVILA-Lite-2B**        | 1.5B     | 52.8                         | 3.52                                | 3.50                     |
+| **[Heron-NVILA-Lite-15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)**       | 14B      | 59.6                         | 4.2                                 | 3.82                     |
 | [LLaVA-CALM2-SigLIP](https://huggingface.co/cyberagent/llava-calm2-siglip)             | 7B      | 43.3                        | 3.15                                | 3.21                     |
 | [Llama-3-EvoVLM-JP-v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2)           | 8B      | 39.3                        | 2.92                                | 2.96                     |
 | [VILA-jp](https://huggingface.co/llm-jp/llm-jp-3-vila-14b)                        | 13B     | 57.2                        | 3.69                                | 3.62                     |