|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- THU-KEG/Crab-VerIF |
|
language: |
|
- en |
|
- zh |
|
base_model: |
|
- allenai/Llama-3.1-Tulu-3-8B-SFT |
|
pipeline_tag: text-generation
|
--- |
|
# Model Card

This model is [allenai/Llama-3.1-Tulu-3-8B-SFT](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B-SFT) further trained with reinforcement learning via VerIF to improve instruction following.
|
## Model Details |
|
|
|
### Model Description |
|
|
|
|
|
|
|
|
|
|
- **Developed by:** Hao Peng @ THU-KEG
|
- **Model type:** RL-trained LLM
|
- **Language(s) (NLP):** English, Chinese |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-SFT
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** https://github.com/THU-KEG/VerIF |
|
- **Paper:** https://arxiv.org/abs/2506.09942 |
|
|
|
## Training Details |
|
|
|
|
|
|
The model is trained with reinforcement learning using VerIF on the [VerInstruct](https://huggingface.co/datasets/THU-KEG/VerInstruct) dataset.
|
|
|
|
|
|
|
|
VerIF is a practical and efficient method for verification in instruction-following reinforcement learning. Built on the idea of Reinforcement Learning with Verifiable Rewards (RLVR), VerIF integrates rule-based code checks with LLM-based reasoning verification (e.g., QwQ-32B) to provide accurate and scalable reward signals. |
|
|
|
The model is optimized for instruction-following, without affecting other general capabilities. |
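To make the verification idea concrete, here is a minimal illustrative sketch (not the authors' implementation) of how a VerIF-style reward can combine hard rule-based code checks with a soft LLM-judge score. The `llm_judge` callable, the example constraints, and the 0/1 gating scheme are assumptions for illustration only:

```python
from typing import Callable, List


def rule_check(response: str, constraints: List[Callable[[str], bool]]) -> bool:
    """Hard verification: every code-verifiable constraint must pass."""
    return all(check(response) for check in constraints)


def verif_reward(response: str,
                 constraints: List[Callable[[str], bool]],
                 llm_judge: Callable[[str], float]) -> float:
    """Illustrative VerIF-style reward: rule checks gate an LLM-judge score.

    A response that fails any programmatic constraint receives reward 0;
    otherwise a (hypothetical) LLM judge scores semantic quality in [0, 1].
    """
    if not rule_check(response, constraints):
        return 0.0
    return llm_judge(response)


# Example constraints for an instruction like
# "answer in under 10 words and mention 'Paris'":
constraints = [
    lambda r: len(r.split()) < 10,
    lambda r: "Paris" in r,
]
mock_judge = lambda r: 0.9  # stand-in for, e.g., a QwQ-32B-based verifier

print(verif_reward("The capital of France is Paris.", constraints, mock_judge))  # 0.9
print(verif_reward("The capital of France is Lyon.", constraints, mock_judge))   # 0.0
```

In training, this scalar would serve as the RLVR reward signal for each sampled response.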
|
|
|
|
|
## Evaluation Results |
|
We evaluate the model on several representative instruction-following benchmarks, including IFEval, Multi-IF, SysBench, and FollowBench.
|
 |
|
|
|
|
|
|
|
You can find more details in our GitHub repository: https://github.com/THU-KEG/VerIF.
|
If you find this model helpful, please cite:
|
```bibtex
@misc{peng2025verif,
      title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
      author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
      year={2025},
      eprint={2506.09942},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.09942},
}
```