PeterJinGo's Collections
Search-R1-v0.3
Updated 20 days ago
RL with an outcome reward plus a format reward. Paper: https://arxiv.org/abs/2505.15117
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3 • Updated 22 days ago • 134
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.3 • Updated 22 days ago • 13
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3 • Updated 22 days ago • 85
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo-v0.3 • Updated 22 days ago • 13
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.3 • Updated 22 days ago • 62
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.3 • Updated 22 days ago • 10
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.3 • Updated 22 days ago • 51 • 1
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.3 • Updated May 2 • 9
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-grpo-v0.3 • Updated May 2 • 19
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-grpo-v0.3 • Updated May 2 • 9
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-32b-em-grpo-v0.3 • Updated May 10 • 1.09k
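
The repository names above appear to share one pattern: training data, base model, size, an optional `it` tag, a reward tag, the RL algorithm (`ppo` or `grpo`), and a version. The following is a minimal Python sketch that splits a name into those parts, assuming the pattern holds for every entry. The regular expression, the `RepoParts` type, and the field names are hypothetical and inferred from the listed names only; the collection itself does not document a naming scheme or say what the `it` and `em` tags stand for.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern inferred from the repository names in this collection.
# It is an assumption, not an official naming specification.
_REPO_RE = re.compile(
    r"^(?P<owner>[^/]+)/SearchR1-"
    r"(?P<train_data>[^-]+)-"      # e.g. nq_hotpotqa_train
    r"(?P<base>[^-]+)-"            # e.g. qwen2.5
    r"(?P<size>\d+b)"              # e.g. 3b, 7b, 14b, 32b
    r"(?P<it>-it)?-"               # optional "it" tag (meaning not stated in the source)
    r"(?P<reward_tag>[^-]+)-"      # e.g. em (meaning not stated in the source)
    r"(?P<algo>ppo|grpo)-"         # RL algorithm tag
    r"(?P<version>v[\d.]+)$"       # e.g. v0.3
)


class RepoParts(NamedTuple):
    owner: str
    train_data: str
    base: str
    size: str
    has_it_tag: bool
    reward_tag: str
    algo: str
    version: str


def parse_repo_id(repo_id: str) -> Optional[RepoParts]:
    """Split a repository ID into its parts, or return None if it does not match."""
    m = _REPO_RE.match(repo_id)
    if m is None:
        return None
    return RepoParts(
        owner=m.group("owner"),
        train_data=m.group("train_data"),
        base=m.group("base"),
        size=m.group("size"),
        has_it_tag=m.group("it") is not None,
        reward_tag=m.group("reward_tag"),
        algo=m.group("algo"),
        version=m.group("version"),
    )


# Usage: pick out the size and algorithm of one listed model.
parts = parse_repo_id("PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.3")
if parts is not None:
    print(parts.size, parts.algo, parts.has_it_tag)
```

Returning `None` on a mismatch, rather than raising, keeps the sketch usable for filtering a mixed list of repository IDs.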