A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO.
Models (80)
trl-lib/Qwen2-0.5B-XPO • Text Generation • 14 downloads
trl-lib/Qwen2-0.5B-OnlineDPO • Text Generation • 10 downloads
trl-lib/Qwen2-0.5B-KTO • Text Generation • 29 downloads
trl-lib/Qwen2-0.5B-ORPO • Text Generation • 30 downloads • 1 like
trl-lib/Qwen2-0.5B-DPO • Text Generation • 181 downloads • 3 likes
trl-lib/Qwen2-0.5B-Reward • Text Classification • 160 downloads
trl-lib/pythia-1b-deduped-tldr-rm • 2.33k downloads
trl-lib/pythia-2.8b-deduped-tldr-online-dpo • Text Generation • 43 downloads
trl-lib/pythia-6.9b-deduped-tldr-offline-dpo • Text Generation • 15 downloads
trl-lib/pythia-2.8b-deduped-tldr-offline-dpo • Text Generation • 15 downloads
Datasets (14)
trl-lib/rlaif-v • 83.1k rows • 71 downloads • 1 like
trl-lib/Capybara-Preferences • 15.4k rows • 153 downloads
trl-lib/Capybara • 16k rows • 939 downloads
trl-lib/ultrafeedback-prompt • 39.8k rows • 626 downloads • 1 like
trl-lib/tldr • 130k rows • 1.45k downloads
trl-lib/ultrafeedback_binarized • 63.1k rows • 4.28k downloads • 3 likes
trl-lib/lm-human-preferences-sentiment • 6.26k rows • 79 downloads
trl-lib/lm-human-preferences-descriptiveness • 6.26k rows • 37 downloads
trl-lib/tldr-preference • 179k rows • 74 downloads
trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness • 16.6k rows • 71 downloads