Reward model - a transZ Collection



Updated about 12 hours ago

Reward modelling

RLHFlow/SHP-standard

Updated May 9, 2024 • 93.3k • 13

Note: Training

transZ/shp

Updated 16 days ago • 10.3k • 16

Note: Test and validation

RLHFlow/HH-RLHF-Helpful-standard

Updated Apr 27, 2024 • 115k • 308 • 3

Note: Training

transZ/anthropic_helpful_test

Updated 16 days ago • 2.33k • 18

Note: Test

RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Updated May 8, 2024 • 42.3k • 25 • 4

Note: Training

transZ/anthropic_harmless_test

Updated 16 days ago • 2.3k • 18

Note: Test

transZ/helpsteer3

Updated about 16 hours ago

Note: Training and testing

RLHFlow/PKU-SafeRLHF-30K-standard

Updated Apr 29, 2024 • 26.9k • 4 • 3

Note: Training

transZ/pku_safe_rlhf

Updated about 16 hours ago

Note: Test

HuggingFaceH4/cai-conversation-harmless

Updated Feb 2, 2024 • 44.8k • 211 • 16

Note: Training and testing