PeterJinGo's Collections

Search-R1-v0.3

updated 15 days ago

Models trained with reinforcement learning (RL) using an outcome reward plus a format reward. Paper: https://arxiv.org/abs/2505.15117
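
As a rough illustration of what "outcome reward + format reward" can mean, here is a minimal sketch. It is not the authors' implementation: the `<think>` and `<answer>` tag names, the exact-match normalization, and the `format_weight` value are all assumptions chosen for the example ("em" in the model names suggests an exact-match outcome reward, but this page does not spell out the details).

```python
import re

# Hypothetical tag convention; assumed for illustration only.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(response: str, gold_answers: list[str]) -> float:
    """1.0 if the last <answer> span exactly matches any gold answer, else 0.0."""
    answers = ANSWER_RE.findall(response)
    if not answers:
        return 0.0
    prediction = normalize(answers[-1])
    return 1.0 if any(prediction == normalize(g) for g in gold_answers) else 0.0


def format_reward(response: str) -> float:
    """1.0 if there is a reasoning block and exactly one answer block, else 0.0."""
    has_think = THINK_RE.search(response) is not None
    one_answer = len(ANSWER_RE.findall(response)) == 1
    return 1.0 if has_think and one_answer else 0.0


def total_reward(response: str, gold_answers: list[str],
                 format_weight: float = 0.2) -> float:
    """Outcome reward plus a weighted format bonus (weight is an assumed value)."""
    return outcome_reward(response, gold_answers) + format_weight * format_reward(response)


if __name__ == "__main__":
    good = "<think>It is in Paris.</think><answer>The Eiffel Tower.</answer>"
    print(total_reward(good, ["Eiffel Tower"]))
```

Under this sketch a well-formed but wrong response still earns the small format bonus, while a response without the expected tags earns nothing.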

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo-v0.3

    3B • Updated May 21 • 1.95k

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo-v0.3

    3B • Updated May 21 • 7

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo-v0.3

    3B • Updated May 21 • 194

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo-v0.3

    3B • Updated May 21 • 6

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo-v0.3

    8B • Updated May 21 • 2.79k

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-grpo-v0.3

    8B • Updated May 21 • 7

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-grpo-v0.3

    8B • Updated May 21 • 12 • 1

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-ppo-v0.3

    15B • Updated May 2 • 26

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-em-grpo-v0.3

    15B • Updated May 2 • 7

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-14b-it-em-grpo-v0.3

    15B • Updated May 2 • 8

  • PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-32b-em-grpo-v0.3

    33B • Updated May 10 • 1.09k

  • PeterJinGo/LICENCE

    Viewer • Updated 15 days ago • 202 • 84