Project of MoE reward model

Recent Activity

zyhang1998 submitted a paper 28 days ago

OmniOPD: Logit-Free On-Policy Distillation via Speculative Verification

zyhang1998 authored a paper 3 months ago

Synthetic Sandbox for Training Machine Learning Engineering Agents

zhuokai authored a paper 3 months ago

Synthetic Sandbox for Training Machine Learning Engineering Agents

models 6

MoeReward/rl_checkpoints

Updated Jun 27, 2025

MoeReward/lora_checkpoint

Updated Mar 30, 2025

MoeReward/reward_lora_qwen_1_5_base

Updated Mar 21, 2025 • 3

MoeReward/reward_qwen_1_5

14B • Updated Mar 17, 2025 • 2

MoeReward/reward_lora_qwen_1_5

Updated Mar 17, 2025 • 2

MoeReward/sft_full_param_qwen_1_5

14B • Updated Mar 16, 2025 • 2

datasets 54

MoeReward/combined_rlhf_dataset_grpo_imdb_main_2K

Viewer • Updated May 6, 2025 • 2k • 6

MoeReward/combined_rlhf_dataset_grpo_metamath_main_2K

Viewer • Updated May 6, 2025 • 2k • 4

MoeReward/combined_rlhf_dataset_grpo_arc_main_2K

Viewer • Updated May 6, 2025 • 2k • 5

MoeReward/combined_rlhf_dataset_grpo_nq_main_2K

Viewer • Updated May 6, 2025 • 2k • 6

MoeReward/combined_rlhf_dataset_grpo_equal_dist_2K

Viewer • Updated May 6, 2025 • 2k • 5

MoeReward/combined_rlhf_dataset_grpo_imdb_main

Viewer • Updated Apr 1, 2025 • 4k • 9

MoeReward/combined_rlhf_dataset_grpo_metamath_main

Viewer • Updated Apr 1, 2025 • 4k • 5

MoeReward/combined_rlhf_dataset_grpo_arc_main

Viewer • Updated Apr 1, 2025 • 4k • 7

MoeReward/combined_rlhf_dataset_grpo_nq_main

Viewer • Updated Apr 1, 2025 • 4k • 7

MoeReward/combined_rlhf_dataset_grpo_equal_dist

Viewer • Updated Apr 1, 2025 • 4k • 4
