PharMolix/Mol-VL-7B · Hugging Face

Mol-VL is a Vision-Language Model for Optical Chemical Structure Understanding (OCSU).

To take advantage of existing pretrained VLMs, we adopt the weights from Qwen2-VL. Mol-VL-7B is further finetuned on Vis-CheBI20 training set.

For technical details, please refer to OCSU. Training and evaluation scripts will be available recently, stay tuned!

If you find our work useful in your research, please consider citing:

@article{fan2025ocsu,
  title={OCSU: Optical Chemical Structure Understanding for Molecule-centric Scientific Discovery},
  author={Fan, Siqi and Xie, Yuguang and Cai, Bowen and Xie, Ailin and Liu, Gaochao and Qiao, Mu and Xing, Jie and Nie, Zaiqing},
  journal={arXiv preprint arXiv:2501.15415},
  year={2025}
}