TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated Oct 11 • 10 • 1
TMLR-Group-HF/Self-Certainty-Qwen3-4B-Base-DAPO14k • Text Generation • 4B • Updated Oct 11 • 10 • 1
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated Oct 11 • 17 • 2
TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated Oct 11 • 9 • 1
TMLR-Group-HF/Majority-Voting-Llama-3.2-3B-Instruct-DAPO14k • Text Generation • 4B • Updated Oct 11 • 18
TMLR-Group-HF/Self-Certainty-Llama-3.2-3B-Instruct-DAPO14k • Text Generation • 4B • Updated Oct 11 • 13
TMLR-Group-HF/Self-Certainty-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated Oct 11 • 11 • 1
TMLR-Group-HF/Co-rewarding-I-Qwen3-8B-Base-DAPO14k • Text Generation • 8B • Updated Oct 11 • 9 • 1
TMLR-Group-HF/Majority-Voting-Qwen3-8B-Base-OpenRS • Text Generation • 8B • Updated Oct 11 • 6 • 1