zhezi12138/alpaca-7b-iter-3-mixp

---
license: mit
datasets:
- PKU-Alignment/BeaverTails
language:
- en
base_model:
- PKU-Alignment/alpaca-7b-reproduced
---
This model is released to reproduce the Safe-RLHF dataset results from the paper "The crucial role of samplers in online direct preference optimization". It is the iteration 3 checkpoint of the DPO-mixp algorithm, trained on top of the iteration 2 checkpoint at https://huggingface.co/zhezi12138/alpaca-7b-iter-2-mixp.
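
The card does not include usage instructions, so the following is only a minimal sketch of how the checkpoint might be loaded. It assumes the repository holds weights that the `transformers` Auto classes can load, and that the model expects the conversation template used by the base model `PKU-Alignment/alpaca-7b-reproduced`. Neither assumption is confirmed by this card. The helper names `build_prompt` and `generate` are hypothetical.

```python
MODEL_ID = "zhezi12138/alpaca-7b-iter-3-mixp"

# Assumption: the prompt template of the base model
# PKU-Alignment/alpaca-7b-reproduced also applies to this checkpoint.
PROMPT_TEMPLATE = "BEGINNING OF CONVERSATION: USER: {input} ASSISTANT:"


def build_prompt(user_input: str) -> str:
    """Wrap a user message in the assumed conversation template."""
    return PROMPT_TEMPLATE.format(input=user_input.strip())


def generate(user_input: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint and generate one reply (hypothetical helper)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # device_map="auto" requires the `accelerate` package.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer(build_prompt(user_input), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("How do I stay safe online?")` would download the full 7B checkpoint, so it is best run on a machine with a suitable GPU. `build_prompt` can be checked on its own without downloading anything.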