The result is not as good as SigLIP

#1
by lucasjin - opened

When fine-tuning with LLaVA, the results are worse than with SigLIP, which is unexpected. What's more, the model does not gain any Chinese OCR ability, even when trained with Chinese TextVQA data.
Why?

OpenGVLab org

Hello, thank you for your feedback. It might require a larger amount of data to demonstrate its advantages over SigLIP, as the few hundred thousand samples in LLaVA may be insufficient. Additionally, although ViT has learned to extract features of Chinese characters, performing well in Chinese OCR still requires the use of a large Chinese OCR dataset during the SFT stage.

czczup changed discussion status to closed
