Update README.md

README.md CHANGED

@@ -14,9 +14,9 @@ base_model:
 - Efficient-Large-Model/paligemma-siglip-so400m-patch14-448
 pipeline_tag: image-text-to-text
 ---
-# Heron
+# Heron-NVILA-Lite-2B
 
-Heron
+Heron-NVILA-Lite-2B is a vision language model trained for Japanese, based on the [NVILA](https://arxiv.org/abs/2412.04468)-Lite architecture.
 
 ## Model Overview
 
@@ -115,13 +115,13 @@ print("---" * 40)
 
 ## Evaluation
 
-I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron
+I used [llm-jp-eval-mm](https://github.com/llm-jp/llm-jp-eval-mm) for this evaluation. Scores for models other than Heron-NVILA-Lite and Sarashina2-Vision-14B were taken from the [llm-jp-eval-mm leaderboard](https://llm-jp.github.io/llm-jp-eval-mm/) as of March 2025 and the [Asagi website](https://uehara-mech.github.io/asagi-vlm?v=1). Heron-NVILA-Lite and Sarashina2-Vision-14B were evaluated using llm-as-a-judge with "gpt-4o-2024-05-13". Sarashina2-Vision-14B was evaluated on the [official blog](https://www.sbintuitions.co.jp/blog/entry/2025/03/17/111703) using "gpt-4o-2024-08-06"; please note that due to differing evaluation conditions, the results for Sarashina2-Vision-14B should be treated as reference only.
 
 | Model | LLM Size | Heron-Bench overall LLM (%) | JA-VLM-Bench-In-the-Wild LLM (/5.0) | JA-VG-VQA-500 LLM (/5.0) |
 |--------------------------------|----------|------------------------------|-------------------------------------|--------------------------|
-| **[Heron
-| **Heron
-| **[Heron
+| **[Heron-NVILA-Lite-1B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-1B)** | 0.5B | 45.9 | 2.92 | 3.16 |
+| **Heron-NVILA-Lite-2B** | 1.5B | 52.8 | 3.52 | 3.50 |
+| **[Heron-NVILA-Lite-15B](https://huggingface.co/turing-motors/Heron-NVILA-Lite-15B)** | 14B | 59.6 | 4.2 | 3.82 |
 | [LLaVA-CALM2-SigLIP](https://huggingface.co/cyberagent/llava-calm2-siglip) | 7B | 43.3 | 3.15 | 3.21 |
 | [Llama-3-EvoVLM-JP-v2](https://huggingface.co/SakanaAI/Llama-3-EvoVLM-JP-v2) | 8B | 39.3 | 2.92 | 2.96 |
 | [VILA-jp](https://huggingface.co/llm-jp/llm-jp-3-vila-14b) | 13B | 57.2 | 3.69 | 3.62 |