Pipeline: Image-Text-to-Text · Format: Safetensors · Architecture: qwen2 · Tags: conversational


ViCToR Model Card

Model details

Model: DeepGlint-AI/ViCToR-LLaVA-SigLIP2-Qwen2.5-7b

Model type: ViCToR is a large multimodal model in the LLaVA style, pairing a SigLIP2 vision encoder with a Qwen2.5-7B language model and pretrained with visual token reconstruction to improve visual comprehension (8.22B parameters in total, released as BF16 Safetensors).

Paper or resources for more information: https://github.com/deepglint/Victor

Where to send questions or comments about the model: https://github.com/deepglint/Victor/issues
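
For convenience, here is a minimal, hypothetical sketch of fetching the released checkpoint from the Hugging Face Hub. The exact inference API is not documented in this card, so the sketch assumes the downloaded weights are then used with the reference code in the GitHub repository above; it is an illustration, not an official loading recipe.

```python
# Minimal sketch (assumption: the weights are hosted under the repo id below and
# are consumed by the reference inference code at https://github.com/deepglint/Victor;
# this card does not document a transformers-native loading path).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="DeepGlint-AI/ViCToR-LLaVA-SigLIP2-Qwen2.5-7b")
print(f"Checkpoint (BF16 Safetensors, ~8.22B params) downloaded to: {local_dir}")
```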

Results

| Benchmark | ViCToR-7B | LLaVA-1.5-13B | LLaVA-NeXT-8B | Ross |
|---|---|---|---|---|
| MMStar | 54.3 | 34.3 | 43.9 | 53.9 |
| RealWorldQA | 65.6 | 55.3 | 58.4 | 58.7 |
| MMBench-CN (val) | 79.0 | 67.8 | – | – |
| OCRBench | 556 | 337 | 531 | 553 |
| POPE | 88.4 | 88.4 | 87.1 | 88.1 |
| MMMU | 48.9 | 37.0 | 43.1 | 49.0 |
| AI2D | 79.5 | 61.1 | 72.8 | 79.5 |
| MME | 2071 | 1781 | 1908 | 1854 |
| SEED-Bench | 75.7 | 68.2 | 72.5 | 73.6 |

Citation

@inproceedings{Xie2024ViCToRIV,
  title={ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs},
  author={Yin Xie and Kaicheng Yang and Peirou Liang and Xiang An and Yongle Zhao and Yumeng Wang and Ziyong Feng and Roy Miles and Ismail Elezi and Jiankang Deng},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:273482504}
}