---
license: cc-by-nc-4.0
datasets:
- jinzhuoran/OmniRewardData
base_model:
- openbmb/MiniCPM-o-2_6
---

# Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

🤗 Benchmark | 🤗 Dataset | 🤗 Model | 🏠 Homepage

## 🧩 Overview

**OmniRewardModel** is our pretrained **discriminative reward model** designed to handle *omni-modal* tasks (e.g., text, image, video) and *free-form human preferences*. It is built upon the open-source base model [MiniCPM-o-2_6](https://huggingface.co/openbmb/MiniCPM-o-2_6), with an additional **value head** appended to produce scalar reward scores. The model supports fine-grained scoring across various tasks and modalities, and can be loaded directly from the Hugging Face Hub.

---

## 🛠️ Environment Setup

To reproduce the training process in our paper, set up the environment as described below. Our training code is built upon the [llama-factory](https://github.com/hiyouga/llama-factory) framework.

```bash
git clone https://github.com/HongbangYuan/OmniReward.git
conda create -n omnireward python=3.10
conda activate omnireward
```

We recommend **`torch==2.2.0`** for best compatibility. Install PyTorch (choose one based on your CUDA version):

```bash
# For CUDA 11.8:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1:
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 \
    --index-url https://download.pytorch.org/whl/cu121
```

Then install the remaining dependencies:

```bash
cd OmniReward/OmniReward-Factory
pip install -r requirements.txt
```

## 📦 Data Preparation

Download all required training and evaluation datasets from [OmniRewardData](https://huggingface.co/datasets/jinzhuoran/OmniRewardData) and [OmniRewardBench](https://huggingface.co/datasets/HongbangYuan/OmniRewardBench):

```bash
cd OmniReward-Factory
bash scripts/download.sh
```

## 🏋️‍♀️ Training Omni-Reward

To reproduce the training results described in our paper, navigate to the `OmniReward-Factory` directory and run the following scripts:

```bash
cd OmniReward-Factory
bash scripts/train.sh
bash scripts/train_t2t.sh
bash scripts/train_ti2t.sh
bash scripts/train_t2iv.sh
```

## 📈 Loading and Evaluating Omni-Reward

You can also use our pretrained Omni-Reward directly for evaluation, without retraining. The models are publicly available at:

👉 https://huggingface.co/jinzhuoran/OmniRewardModel

```bash
cd OmniReward-Factory
bash scripts/eval_t2t.sh
bash scripts/eval_t2t_tie.sh
bash scripts/eval_ti2t.sh
bash scripts/eval_ti2t_tie.sh
```

- `--eval_dataset`: Specifies the evaluation dataset (e.g., `omni_t2t`, `omni_t2i`, `omni_t2v`, etc.).
- `--eval_tie`: Enables the w/ Ties evaluation setting.
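As background on the discriminative setup described above: the value head maps the base model's representation of a (prompt, response) pair to a scalar reward, and such models are typically trained with a pairwise Bradley-Terry preference loss on chosen/rejected responses. The toy sketch below illustrates only this general idea with made-up feature vectors and a linear head; it is not the OmniReward code, and all names (`value_head`, `preference_loss`, the feature values) are illustrative assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def value_head(features, weights, bias):
    """Toy linear value head: maps a feature vector to a scalar reward.

    In a real reward model, `features` would be the base model's hidden
    state for the (prompt, response) pair.
    """
    return sum(f * w for f, w in zip(features, weights)) + bias


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the chosen response's reward above the
    rejected one's.
    """
    return -math.log(sigmoid(reward_chosen - reward_rejected))


# Hypothetical features for a chosen and a rejected response.
feats_chosen = [0.9, 0.2, -0.1]
feats_rejected = [0.1, 0.4, 0.3]
w, b = [1.0, 0.5, -0.5], 0.0

r_c = value_head(feats_chosen, w, b)
r_r = value_head(feats_rejected, w, b)
loss = preference_loss(r_c, r_r)  # small when r_c >> r_r
```

At evaluation time (as in the `eval_*.sh` scripts), only the scalar rewards are needed: the response with the higher reward is taken as the model's preference.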