---
license: mit
datasets:
- TIGER-Lab/ViRL39K
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
---


# Spark-VL-7B

⭐ If you find our code or model helpful, please consider giving us a star; your support means a lot!

Github Repository | 📖 Daily Paper | 🤗 Models | 📖 Paper

## Introduction

We propose **SPARK**, **a unified framework that integrates the policy and the reward into a single model for joint and synchronous training**. SPARK automatically derives reward and reflection data from verifiable rewards, enabling **self-learning** and **self-evolution**. Furthermore, we instantiate this framework on multiple backbones, training SPARK-VL-7B, SPARK-7B, and SPARK-VL-32B. This repo contains **SPARK-VL-7B**.

## 📢 News

- 🚀 [09/29/2025] We release our 🤗 datasets.
- 🚀 [09/29/2025] We release the **Spark** 📖 paper.
- 🚀 [09/29/2025] We upload our evaluation code and 🤗 models.
- 🚀 [09/29/2025] We release the **Spark** Github repository.

## 💡 Highlights

- 🔥 **Synergistic Policy–Reward Co-Evolving (SPARK)**: We introduce SPARK, a unified reinforcement fine-tuning framework that jointly optimizes the policy and the reward within a single model through on-policy co-evolution.
- 🔥 **Recycling Rollouts**: Unlike conventional RL pipelines that discard rollouts after policy updates, SPARK recycles RLVR rollouts into pointwise, pairwise, and reflection objectives, enabling the model itself to act as both a strong policy and a generative reward model.
- 🔥 **Co-Evolving Mechanism**: Improved reward accuracy provides better gradients for policy learning, while stronger reasoning further refines reward judgment, forming a positive feedback loop that enhances reasoning, judgment, and reflection in synergy.
- 🔥 **Efficient and Practical**: SPARK requires no human preference data, teacher models, or external reward models, making it significantly more data- and compute-efficient than traditional RM-based RL pipelines.

## 🛠️ Usage

### 🤗 Using Transformers

Our model is based on Qwen2.5-VL-7B-Instruct, so you can run inference with the same code as Qwen2.5-VL-7B-Instruct; see the 🤗 Huggingface documentation for details.

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "internlm/Spark-VL-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("internlm/Spark-VL-7B")

# Build a multimodal chat message (image_path and prompt are provided by the user)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": prompt},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
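Because SPARK trains the policy and the reward within a single model, the same checkpoint can also be prompted as a generative reward model, i.e., asked to judge a candidate answer. The snippet below is a minimal sketch of that usage: it reuses the `model` and `processor` loaded above, and the judging prompt, example question, and candidate answer are illustrative placeholders of our own rather than the official template from the paper.

```python
# Minimal sketch: prompting Spark-VL-7B as a generative reward model (judge).
# The judge prompt below is an illustrative placeholder, not the official template.
question = "How many apples are on the table?"    # hypothetical example question
candidate = "There are three apples."             # hypothetical candidate answer

judge_prompt = (
    "You are given a question about the image and a candidate answer.\n"
    f"Question: {question}\n"
    f"Candidate answer: {candidate}\n"
    "Judge whether the candidate answer is correct. Explain briefly, then give a verdict."
)

judge_messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},  # same image as above
            {"type": "text", "text": judge_prompt},
        ],
    }
]

# Same preprocessing and generation pipeline as in the policy example above
text = processor.apply_chat_template(judge_messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(judge_messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True,
                             clean_up_tokenization_spaces=False)[0])
```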
### 🔦 Using vLLM

We recommend using **vLLM** for faster inference; it leads to significant speed improvements when evaluating full datasets.

```bash
PORT=8019
N_PROC=256
SERVE_NAME=spark_vl_7b
MODEL_PATH=/internlm/Spark-VL-7B

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve "$MODEL_PATH" \
    --tensor-parallel-size 4 \
    --served-model-name $SERVE_NAME \
    --port $PORT \
    --max-num-seqs $N_PROC
```

## ✒️ Citation

```
TBD
```