|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- THU-KEG/Crab-VerIF |
|
language: |
|
- en |
|
- zh |
|
base_model: |
|
- allenai/Llama-3.1-Tulu-3-8B-SFT |
|
pipeline_tag: text-generation
|
--- |
|
# Model Card

This model is [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) further trained with reinforcement learning via VerIF to improve instruction following.
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Hao Peng @ THU-KEG
|
- **Model type:** RL-trained LLM
|
- **Language(s) (NLP):** English, Chinese |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-SFT
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/THU-KEG/VerIF |
|
- **Paper:** https://arxiv.org/abs/2506.09942 |
|
|
|
## Training Details |
|
|
|
|
|
|
The model is trained with reinforcement learning using VerIF on the [VerInstruct](https://huggingface.co/datasets/THU-KEG/VerInstruct) dataset.
|
|
|
|
|
|
|
|
VerIF is a practical and efficient method for verification in instruction-following reinforcement learning. Built on the idea of Reinforcement Learning with Verifiable Rewards (RLVR), VerIF integrates rule-based code checks with LLM-based reasoning verification (e.g., QwQ-32B) to provide accurate and scalable reward signals. |
|
|
|
The model is optimized for instruction-following, without affecting other general capabilities. |
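To make the verification idea concrete, here is a minimal illustrative sketch (not the authors' implementation) of how a VerIF-style reward can combine hard rule-based code checks with a soft LLM-judge score. The `llm_judge` callable, the example constraints, and the 0/1 gating scheme are assumptions for illustration only:

```python
from typing import Callable, List


def rule_check(response: str, constraints: List[Callable[[str], bool]]) -> bool:
    """Hard verification: every code-verifiable constraint must pass."""
    return all(check(response) for check in constraints)


def verif_reward(response: str,
                 constraints: List[Callable[[str], bool]],
                 llm_judge: Callable[[str], float]) -> float:
    """Illustrative VerIF-style reward: rule checks gate an LLM-judge score.

    A response that fails any programmatic constraint receives reward 0;
    otherwise a (hypothetical) LLM judge scores semantic quality in [0, 1].
    """
    if not rule_check(response, constraints):
        return 0.0
    return llm_judge(response)


# Example constraints for an instruction like
# "answer in under 10 words and mention 'Paris'":
constraints = [
    lambda r: len(r.split()) < 10,
    lambda r: "Paris" in r,
]
mock_judge = lambda r: 0.9  # stand-in for, e.g., a QwQ-32B-based verifier

print(verif_reward("The capital of France is Paris.", constraints, mock_judge))  # 0.9
print(verif_reward("The capital of France is Lyon.", constraints, mock_judge))   # 0.0
```

In training, this scalar would serve as the RLVR reward signal for each sampled response.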
|
|
|
|
|
## Evaluation Results |
|
We evaluate the model on several representative instruction-following benchmarks, including IFEval, Multi-IF, SysBench, and FollowBench.
|
 |
|
|
|
|
|
|
|
You can find more details in our GitHub repository: https://github.com/THU-KEG/VerIF.
|
If you find this model helpful, please cite:
|
```bibtex
@misc{peng2025verif,
      title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
      author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
      year={2025},
      eprint={2506.09942},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09942},
}
```