ViCToR Model Card
Model details
Paper or resources for more information: https://github.com/deepglint/Victor
Where to send questions or comments about the model: https://github.com/deepglint/Victor/issues
Results
Benchmark | ViCToR-7B | LLaVA-1.5-13B | LLaVA-NeXT-8B | Ross |
---|---|---|---|---|
MMStar | 54.3 | 34.3 | 43.9 | 53.9 |
RealWorldQA | 65.6 | 55.3 | 58.4 | 58.7 |
MMBench^(cn,val) | 79.0 | 67.8 | – | – |
OCRBench | 556 | 337 | 531 | 553 |
POPE | 88.4 | 88.4 | 87.1 | 88.1 |
MMMU | 48.9 | 37.0 | 43.1 | 49.0 |
AI2D | 79.5 | 61.1 | 72.8 | 79.5 |
MME | 2071 | 1781 | 1908 | 1854 |
SEED^(f) | 75.7 | 68.2 | 72.5 | 73.6 |
Citation
@inproceedings{Xie2024ViCToRIV,
title={ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs},
author={Yin Xie and Kaicheng Yang and Peirou Liang and Xiang An and Yongle Zhao and Yumeng Wang and Ziyong Feng and Roy Miles and Ismail Elezi and Jiankang Deng},
year={2024},
url={https://api.semanticscholar.org/CorpusID:273482504}
}