---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- reward-model
- rfm
- vision-language
- multimodal
library_name: transformers
---

# aliangdw/rfm_prefprog_v3

This is a Reward Function Model (RFM) for vision-language preference learning and similarity assessment.

## Model Details

- **Base Model**: Qwen/Qwen2.5-VL-3B-Instruct
- **Model Type**: qwen2_5_vl
- **Architecture**: RFMModel
- **Task**: Vision-Language Reward Modeling
- **Training Method**: FSDP (Fully Sharded Data Parallel)

## Usage

```python
from transformers import AutoProcessor, AutoModel
import torch

# Load the model and processor
processor = AutoProcessor.from_pretrained("aliangdw/rfm_prefprog_v3", trust_remote_code=True)
model = AutoModel.from_pretrained("aliangdw/rfm_prefprog_v3", trust_remote_code=True)

# Example usage for preference scoring
# inputs = processor(images=images, text=text, return_tensors="pt")
# outputs = model(**inputs, sample_type="preference")
```

A fuller, hedged end-to-end sketch appears at the end of this card.

## Model Capabilities

This RFM model can perform:

1. **Preference Prediction**: Given two trajectories A and B, predict which one is preferred
2. **Similarity Assessment**: Evaluate how similar a trajectory is to a reference
3. **Progress Estimation**: Estimate task completion progress

## Training

The model was trained using:

- FSDP for distributed training
- Mixed precision (bfloat16)
- Custom loss functions for preference and similarity learning

## Files

This repository contains:

- Model weights in SafeTensors format
- Configuration files
- Tokenizer/Processor files

## Citation

If you use this model, please cite:
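
## Example: Preference Scoring Sketch

The snippet below expands the Usage example into an end-to-end preference-scoring call. It is a minimal sketch, not a reference implementation: the exact preprocessing, the accepted `sample_type` values beyond `"preference"`, and the returned output fields all depend on the custom modeling code loaded via `trust_remote_code=True`. The image paths, task string, and output handling are placeholders.

```python
# Minimal sketch, assuming the remote RFM code accepts processor inputs and a
# `sample_type` keyword as shown in the Usage section above. Paths and the
# task description are placeholders.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModel

processor = AutoProcessor.from_pretrained("aliangdw/rfm_prefprog_v3", trust_remote_code=True)
model = AutoModel.from_pretrained(
    "aliangdw/rfm_prefprog_v3",
    torch_dtype=torch.bfloat16,  # matches the bfloat16 training setup
    trust_remote_code=True,
)
model.eval()

# Frames from two candidate trajectories (A and B) plus a task description.
frames = [Image.open(p) for p in ["a_frame0.png", "a_frame1.png",
                                  "b_frame0.png", "b_frame1.png"]]
task = "Pick up the red block and place it in the bin."

inputs = processor(images=frames, text=task, return_tensors="pt")

with torch.no_grad():
    # Other sample types (e.g. similarity or progress, per the capabilities
    # listed above) are assumed names; check the repository's modeling code.
    outputs = model(**inputs, sample_type="preference")

print(outputs)  # inspect which fields the remote code actually exposes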