ConvLLaVA-JP Model Card

This is a pretrained checkpoint. You can use it to instruction-tune your own multimodal models.

Check out the instructions here.
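If you want to pull the weights locally before following those instructions, the standard huggingface_hub download works. A minimal sketch, assuming the repo id from this card's title:

```python
# Sketch: download this checkpoint's weights locally with huggingface_hub.
from huggingface_hub import snapshot_download

# Repo id taken from this model card; adjust if you use a fork or mirror.
local_dir = snapshot_download("toshi456/ConvLLaVA-JP-1.3b-768-Pretrain")
print(local_dir)  # path to the downloaded safetensors and config files
```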

Model details

Model type: ConvLLaVA-JP is a vision-language model that can converse about input images.
This LVLM was trained using laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports 768 × 768 high-resolution image input.
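For reference, the two base components named above can each be loaded independently with standard tooling. This is only a minimal sketch of loading those components; assembling them into the full ConvLLaVA-JP model (the vision-language projector and the 768 × 768 input pipeline) is handled by the project's own code and is not shown here:

```python
# Sketch: load the image encoder and text decoder used by ConvLLaVA-JP.
# Note: the ConvNeXt CLIP below is pretrained at 320 px; ConvLLaVA-JP
# scales the input resolution to 768 x 768 in its own training code.
import torch
import open_clip
from transformers import AutoModelForCausalLM, AutoTokenizer

# Image encoder: ConvNeXt-Large CLIP from LAION (open_clip supports hf-hub: refs).
vision_model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft"
)

# Text decoder: llm-jp 1.3B causal language model.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-1.3b-v1.0")
text_model = AutoModelForCausalLM.from_pretrained(
    "llm-jp/llm-jp-1.3b-v1.0", torch_dtype=torch.bfloat16
)
```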

Training dataset

Acknowledgement

License

Apache-2.0
