ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

(Image) Stable Diffusion XL: "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of mistralai/Mistral-7B-v0.1 on the alvarobartt/dpo-mix-7k-simplified dataset.
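A minimal inference sketch (not part of the original card): it assumes the uploaded tokenizer ships a chat template and that standard transformers generation works out of the box; the prompt and sampling parameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alvarobartt/mistral-orpo-mix"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Assumes the tokenizer defines a chat template; format the prompt with it.
messages = [{"role": "user", "content": "What does ORPO stand for?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```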

⚠️ Note that the code is still experimental, as the ORPOTrainer PR has not been merged yet; follow its progress at the 🤗 trl - ORPOTrainer PR.
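A hedged training sketch, not the author's exact recipe: it assumes the ORPOTrainer/ORPOConfig API as proposed in the (then-unmerged) TRL PR, and that the dataset's prompt/chosen/rejected columns are in the format the trainer expects. All hyperparameters below are illustrative, not the values used for this model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral's tokenizer defines no pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-orpo-mix",
    beta=0.1,  # weight of the odds-ratio term (lambda in the paper); illustrative
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=3,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```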

Reference

ORPO: Monolithic Preference Optimization without Reference Model (Hong et al., 2024), arXiv:2403.07691
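For quick reference, the objective from the paper adds a log odds-ratio penalty to the standard SFT loss, which is what lets ORPO align preferences without a frozen reference model:

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x, y_w, y_l)}\big[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\big],
\qquad
\mathcal{L}_{OR} = -\log \sigma\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

where $\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}$, $y_w$ and $y_l$ are the chosen and rejected responses, and $\sigma$ is the sigmoid.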

Model size: 7.24B params (Safetensors, BF16)