trl-lib/Qwen2-0.5B-Reward-Math-Sheperd • Token Classification • 0.5B • 19 • 1
trl-lib/Qwen2-0.5B-XPO • Text Generation • 0.5B • 14
trl-lib/Qwen2-0.5B-OnlineDPO • Text Generation • 0.5B • 13
trl-lib/Qwen2-0.5B-KTO • Text Generation • 0.5B • 6
trl-lib/Qwen2-0.5B-ORPO • Text Generation • 0.5B • 7 • 2
trl-lib/Qwen2-0.5B-DPO • Text Generation • 0.5B • 55 • 5
trl-lib/Qwen2-0.5B-Reward • Text Classification • 0.5B • 203 • 1
trl-lib/pythia-1b-deduped-tldr-rm • 1.62k
trl-lib/pythia-2.8b-deduped-tldr-online-dpo • Text Generation • 3B • 11
trl-lib/pythia-6.9b-deduped-tldr-offline-dpo • Text Generation • 7B • 7
trl-lib/pythia-2.8b-deduped-tldr-offline-dpo • Text Generation • 3B • 6
trl-lib/pythia-1b-deduped-tldr-offline-dpo • Text Generation • 1B • 10
trl-lib/pythia-6.9b-deduped-tldr-rm
trl-lib/pythia-6.9b-deduped-tldr-sft
trl-lib/pythia-2.8b-deduped-tldr-rm • 1.67k
trl-lib/pythia-2.8b-deduped-tldr-sft • 551
trl-lib/pythia-6.9b-deduped-tldr-online-dpo • 7B • 4
trl-lib/pythia-1b-deduped-tldr-online-dpo • 1B • 5
trl-lib/pythia-1b-deduped-tldr-sft • 1B • 5.27k
trl-lib/qwen1.5-1.8b-dpo-cli
trl-lib/qwen1.5-0.5b-sft • Text Generation • 0.5B • 29
trl-lib/qwen1.5-1.8b-sft • Text Generation • 2B • 13 • 4
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.9-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.8-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.7-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.6-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.5-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.4-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.3-steps-800
trl-lib/OpenHermes-2-Mistral-7B-sigmoid-beta-0.2-steps-800