ConvLLaVA-JP Model Card

This is a pretrained checkpoint. You can use it to instruction-tune your own multimodal models.

Check out the instructions here.
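If you want to pull the weights locally before following those instructions, the standard huggingface_hub download works. A minimal sketch, assuming the repo id from this card's title:

```python
# Sketch: download this checkpoint's weights locally with huggingface_hub.
from huggingface_hub import snapshot_download

# Repo id taken from this model card; adjust if you use a fork or mirror.
local_dir = snapshot_download("toshi456/ConvLLaVA-JP-1.3b-768-Pretrain")
print(local_dir)  # path to the downloaded safetensors and config files
```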

Model details

Model type: ConvLLaVA-JP is a vision-language model that can converse about input images.
This LVLM was trained using laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports 768 × 768 high-resolution image input.
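For reference, the two base components named above can each be loaded independently with standard tooling. This is only a minimal sketch of loading those components; assembling them into the full ConvLLaVA-JP model (the vision-language projector and the 768 × 768 input pipeline) is handled by the project's own code and is not shown here:

```python
# Sketch: load the image encoder and text decoder used by ConvLLaVA-JP.
# Note: the ConvNeXt CLIP below is pretrained at 320 px; ConvLLaVA-JP
# scales the input resolution to 768 x 768 in its own training code.
import torch
import open_clip
from transformers import AutoModelForCausalLM, AutoTokenizer

# Image encoder: ConvNeXt-Large CLIP from LAION (open_clip supports hf-hub: refs).
vision_model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft"
)

# Text decoder: llm-jp 1.3B causal language model.
tokenizer = AutoTokenizer.from_pretrained("llm-jp/llm-jp-1.3b-v1.0")
text_model = AutoModelForCausalLM.from_pretrained(
    "llm-jp/llm-jp-1.3b-v1.0", torch_dtype=torch.bfloat16
)
```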

Training dataset

Acknowledgement

License

Apache-2.0
