---
base_model:
  - Qwen/Qwen2.5-VL-7B-Instruct
datasets:
  - TIGER-Lab/ViRL39K
license: mit
library_name: transformers
pipeline_tag: video-text-to-text
tags:
  - lvlm
  - reasoning
  - multimodal
  - qwen
---


# Spark-VL-7B

โญ If you find our code or model helpful, please consider giving us a star โ€” your support means a lot! ๐Ÿ Github repository ๐Ÿ“–Daily Paper ๐Ÿค—models ๐Ÿ“–Paper

## Introduction

We propose SPARK, a unified framework that integrates the policy and the reward model into a single model for joint, synchronous training. SPARK automatically derives reward and reflection data from verifiable reward signals, enabling self-learning and self-evolution. We instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repository hosts SPARK-VL-7B.

## 📢 News

- 🚀 [09/29/2025] We release our 🤗 datasets.
- 🚀 [09/29/2025] We release Spark's 📖 paper.
- 🚀 [09/29/2025] We upload our evaluation code and 🤗 models.
- 🚀 [09/29/2025] We release the Spark 🐙 GitHub repository.

## 💡 Highlights

- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes policy and reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model (see the sketch after this list).
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.
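
To make rollout recycling concrete, the minimal sketch below shows how verified rollouts could be converted into pointwise and pairwise judgment data. It is an illustrative assumption about the data flow, not the released SPARK training code; every name in it is hypothetical.

```python
# Illustrative sketch only: recycle RLVR rollouts into pointwise and
# pairwise reward-training examples instead of discarding them.
# All names and the data layout are hypothetical, not SPARK's actual code.

def recycle_rollouts(question, rollouts, verifier):
    """Build judgment data from policy rollouts.

    `verifier(question, rollout)` returns 1 if the rollout reaches the
    verifiable answer, else 0.
    """
    scored = [(r, verifier(question, r)) for r in rollouts]

    # Pointwise objective: judge a single answer as correct or incorrect.
    pointwise = [
        {"prompt": f"Judge this answer to: {question}\n{r}", "label": s}
        for r, s in scored
    ]

    # Pairwise objective: prefer a verified-correct answer over a wrong one.
    correct = [r for r, s in scored if s == 1]
    incorrect = [r for r, s in scored if s == 0]
    pairwise = [
        {"prompt": question, "chosen": c, "rejected": w}
        for c in correct
        for w in incorrect
    ]
    return pointwise, pairwise
```

Both kinds of examples can then be mixed into the same update as the policy-gradient loss, so a single model is trained as policy and generative reward model simultaneously.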

## 🛠️ Usage

### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct, so you can run inference with the same code as Qwen2.5-VL-7B-Instruct; see the 🤗 Hugging Face model card for details.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

image_path = "path/to/your/image.jpg"  # replace with your image
prompt = "Describe this image."        # replace with your prompt

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```

### 🔦 Using vLLM

We recommend vLLM for faster inference; it yields a significant speedup when evaluating datasets.

```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=/internlm/Spark-VL-7B   # adjust to your local model path

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
  --tensor-parallel-size 4 \
  --served-model-name "$SERVE_NAME" \
  --port "$PORT" \
  --max-num-seqs "$N_PROC"
```
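
Once up, the server exposes vLLM's OpenAI-compatible API. Below is a minimal client sketch against the port and served-model name from the script above; the image URL and prompt are placeholders.

```python
# Minimal sketch: query the vLLM OpenAI-compatible server started above.
# Port and model name match the serve script; image URL and prompt are
# placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8019/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="spark_vl_7b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```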

## Training

### Spark Training

After downloading the dataset, you can start training with the example bash script below. Our bash scripts live in `/Spark/Lmm_XC/XC/scripts/spark_training`; modify the dataset and model paths to your own locations.

```bash
export WORKSPACE_DIR="/fs-computility/....../Lmm_XC"                   # Path to project root directory
export DATASET_PATH="/fs-computility/....../infer_data_ViRL_19k.json"  # Path to your dataset
export PRETRAIN_MODEL_PATH="/fs-computility/....../Qwen2.5-VL-7B-Instruct"  # Path to pretrained model
export WANDB_PROJECT="Observation"                                     # Name for this project
export MODEL_CPK_NAME="Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2"  # Name for this training run
export LOG_PATH='/fs-computility/....../Qwen2.5-VL-7B-GRPO-virl-19k-iar-reflection-hyb-diverse-bs64-e2.txt'  # Log file save path

export WANDB_API_KEY="......"
export SAVE_PATH="/fs-computility/....../${WANDB_PROJECT}/${MODEL_CPK_NAME}"  # Absolute path for everything about this run
export CKPT_PATH="${SAVE_PATH}/ckpt"                          # Path to save checkpoints
export FINAL_CKPT_PATH="${SAVE_PATH}/final_ckpt"              # Path to save final checkpoints
export TIMESTAMP=$(date +%Y%m%d_%H%M%S)                       # Timestamp
export CUR_LOG_DIR="${SAVE_PATH}/training_logs/${TIMESTAMP}"  # Path to save current run logs
export LOG_DIR="${SAVE_PATH}/tb_logs"                         # Path to save TensorBoard logs
```

โฐ Attention:

export DEV_MODE=0 # Set to 1 for debug mode on single dev machine

## Evaluation

The integrated multimodal mathematics dataset can be downloaded from 🤗 datasets and evaluated with the scripts provided in the Evaluation folder. The evaluation results are saved to disk, and accuracy can then be computed with `calculate_acc.py`.

```bash
bash ./Evaluation/eval_spark_vl_7b.sh
python calculate_acc.py --result_path ./your_result_path.json
```
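
For reference, the accuracy pass over a saved result file can be as simple as the sketch below. The `pred`/`answer` field names are assumptions about the result schema; the repository's `calculate_acc.py` may differ.

```python
# Hypothetical sketch of a calculate_acc.py-style accuracy computation.
# Assumes the result JSON is a list of {"pred": ..., "answer": ...} records;
# the actual schema in the repository may differ.
import argparse
import json

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--result_path", required=True)
    args = parser.parse_args()

    with open(args.result_path) as f:
        results = json.load(f)

    correct = sum(1 for r in results if r["pred"] == r["answer"])
    print(f"accuracy: {correct / len(results):.4f} ({correct}/{len(results)})")

if __name__ == "__main__":
    main()
```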

## ✒️ Citation

```bibtex
@article{liu2025spark,
  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
  author={Ziyu Liu and Yuhang Zang and Shengyuan Ding and Yuhang Cao and Xiaoyi Dong and Haodong Duan and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2509.22624},
  year={2025}
}
```

## 📄 License

**Usage and License Notices**: The data and code are intended and licensed for research use only. License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). Use should also abide by the policy of OpenAI: https://openai.com/policies/terms-of-use

## Acknowledgement

We sincerely thank the lmm-r1 and OpenRLHF projects for providing their open-source resources.