xintongzhang
/

CoF-sft-model-7b

Image-Text-to-Text

text-generation-inference

Model card Files Files and versions Community

CoF-SFT-VL Model

This is a supervised fine-tuned (SFT) vision-language model based on Qwen/Qwen2.5-VL-7B-Instruct. It is trained on the CoF-SFT-Data-5.4k dataset, which contains 5.4k image-text reasoning examples.

Model Details

Base model: Qwen/Qwen2.5-VL-7B-Instruct
Training data: 5.4k curated reasoning samples from xintongzhang/CoF-SFT-Data-5.4k
Framework: Transformers

Resources

Project page: https://cof-reasoning.github.io/
Paper: https://arxiv.org/abs/2505.15436

Downloads last month: 14

Safetensors

Model size

8.29B params

Tensor type

BF16

·

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for xintongzhang/CoF-sft-model-7b

Base model

Qwen/Qwen2.5-VL-7B-Instruct

Finetuned

(465)

this model

Quantizations

Dataset used to train xintongzhang/CoF-sft-model-7b