The result is not as good as SigLIP

by lucasjin - opened Jun 5, 2024

When fine-tuning with LLaVA, the results are worse than with SigLIP, which is unexpected. What's more, the model does not gain any Chinese OCR ability, even with Chinese TextVQA data.
Why?

czczup

OpenGVLab org Aug 22, 2024

Hello, thank you for your feedback. The model might need a larger amount of data to demonstrate its advantages over SigLIP, as the few hundred thousand samples in LLaVA may be insufficient. Additionally, although the ViT has learned to extract features of Chinese characters, performing well on Chinese OCR still requires a large Chinese OCR dataset during the SFT stage.

czczup changed discussion status to closed Aug 22, 2024
