ZHLiu627/verl_agent_webshop-new-GRPO-kl-0.01-int-reward_False-Llama-3.1-8B-Instruct-2-150step Updated 2 days ago • 13
ZHLiu627/verl_agent_alfworld-GRPO-coef0.9-Llama-3.1-8B-Instruct-150step-150step Updated 4 days ago • 1
ZHLiu627/verl_agent_alfworld-GRPO-coef1.1-Llama-3.1-8B-Instruct-150step-150step Updated 4 days ago • 3
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1 Updated Feb 27 • 29.3k • 33
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1_v1 Updated Feb 27 • 29.3k • 56 • 1
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1 Updated Feb 22 • 29.3k • 36
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2filtered Updated Feb 19 • 28.9k • 41
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2 Updated Feb 19 • 29.3k • 28
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filteredd Updated Feb 19 • 29.3k • 29
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1filtered Updated Feb 19 • 29.1k • 27
ZHLiu627/qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2 Updated Feb 18 • 29.3k • 33