Update README.md
README.md (CHANGED)

```diff
@@ -109,7 +109,7 @@ print("---" * 40)
 | Stage | Training | Data Sources | Samples |
 |--------|-------------------------------|-------------------------------|-------------|
 | Stage1 | Projector | [Japanese image text pairs](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-japanese-image-text-pairs), [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | 1.1M |
-| Stage2 | Projector, LLM | Filtered
+| Stage2 | Projector, LLM | Filtered [MOMIJI](https://huggingface.co/datasets/turing-motors/MOMIJI) (CC-MAIN-2024-46, CC-MAIN-2024-51, CC-MAIN-2025-05) | 13M |
 | | | [Japanese image text pairs (subset)](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-japanese-image-text-pairs), [Japanese interleaved data (subset)](https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-japanese-interleaved-data), [mmc4-core (subset)](https://github.com/allenai/mmc4), [coyo-700m (subset)](https://huggingface.co/datasets/kakaobrain/coyo-700m), [wikipedia_ja](https://huggingface.co/datasets/turing-motors/Wikipedia-Vision-JA), [llava_pretrain_ja](https://huggingface.co/datasets/turing-motors/LLaVA-Pretrain-JA), [stair_captions](http://captions.stair.center/) | 20M |
 | Stage3 | Vision Encoder, Projector, LLM | [llava-instruct-v1_5-en-subset-358k](https://huggingface.co/datasets/llm-jp/llava-instruct-v1_5-en-subset-358k), [llava-instruct-ja](https://huggingface.co/datasets/llm-jp/llava-instruct-ja), [japanese-photos-conv](https://huggingface.co/datasets/llm-jp/japanese-photos-conversation), [ja-vg-vqa](https://huggingface.co/datasets/llm-jp/ja-vg-vqa-conversation), [synthdog-ja (subset)](https://huggingface.co/datasets/naver-clova-ix/synthdog-ja), [ai2d](https://huggingface.co/datasets/lmms-lab/ai2d), [synthdog-en](https://huggingface.co/datasets/naver-clova-ix/synthdog-en), [sherlock](https://github.com/allenai/sherlock) | 1.4M |
```
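The updated Stage2 row points the projector/LLM pretraining mix at the filtered MOMIJI web corpus, restricted to three Common Crawl snapshots (13M samples). As a minimal sketch, assuming MOMIJI can be read with the standard Hugging Face `datasets` API, the snapshot subset could be inspected as below; the `train` split name and the `snapshot` field are assumptions for illustration, not the dataset's documented schema:

```python
from datasets import load_dataset

# Stream to avoid downloading the full 13M-sample corpus up front.
# The "train" split and the "snapshot" field name are assumptions.
ds = load_dataset("turing-motors/MOMIJI", split="train", streaming=True)

wanted = {"CC-MAIN-2024-46", "CC-MAIN-2024-51", "CC-MAIN-2025-05"}
for i, sample in enumerate(ds):
    # Keep only records from the three snapshots named in the table.
    if sample.get("snapshot") in wanted:
        print(sample)
    if i >= 2:  # peek at a few records, then stop
        break
```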