This repository contains Guru-32B (based on Qwen2.5-32B), the model presented in *Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective*.

The leaderboard below is produced with our evaluation code, using the same sampling parameters for all models: temperature=1.0, top_p=0.7.

| Domain | Benchmark | GURU 7B | General Reasoner 7B | ORZ 7B◇ | SimpleRL 7B | GURU 32B | ORZ 32B◇ | SimpleRL 32B |
|---|---|---|---|---|---|---|---|---|
| Math | AIME24 (avg@32) | 17.50 | 17.08 | 16.25 | 15.60 | 34.89 | 47.50 | 27.20 |
| | MATH500 | 77.25 | 70.40 | 80.80 | 87.00 | 86.00 | 89.80 | 89.60 |
| Code | LiveCodeBench (avg@4) | 16.49 | 8.51 | 5.47 | 6.72 | 29.30 | 22.04 | 19.80 |
| | HumanEval (avg@4) | 82.62 | 61.12 | 67.38 | 58.08 | 90.85 | 84.30 | 81.25 |
| | MBPP | 70.00 | 39.80 | 48.40 | 49.60 | 78.80 | 74.20 | 76.75 |
| Science | GPQA-diamond (avg@4) | 40.78 | 38.64 | 37.63 | 35.98 | 50.63 | 55.67 | 46.46 |
| | SuperGPQA | 31.80 | 30.64 | 29.75 | 27.29 | 43.60 | 46.05 | 37.73 |
| Logic | ARC-AGI (avg@4) | 3.31 | 0.75 | 0.00 | 0.50 | 7.63 | 2.31 | 5.25 |
| | Zebra Puzzle (avg@4) | 39.40 | 0.07 | 1.00 | 0.62 | 45.21 | 0.54 | 1.16 |
| Simulation | CodeI/O (avg@4) | 15.63 | 7.13 | 5.13 | 6.63 | 12.63 | 3.75 | 9.75 |
| | CruxEval-I | 61.72 | 63.63 | 69.38 | 56.25 | 80.63 | 71.13 | 72.63 |
| | CruxEval-O | 71.28 | 56.50 | 65.88 | 58.31 | 88.75 | 82.38 | 67.75 |
| Tabular | FinQA | 34.70 | 34.33 | 37.60 | 35.10 | 46.14 | 45.20 | 45.41 |
| | HiTab | 74.20 | 54.40 | 54.10 | 50.40 | 82.00 | 63.30 | 69.00 |
| | MultiHiertt (avg@4) | 44.94 | 31.62 | 38.10 | 37.57 | 55.28 | 52.83 | 52.83 |
| Others | IFEval | 35.81 | 39.56 | 32.72 | 36.69 | 55.45 | 38.26 | 55.27 |
| | LiveBench | 18.57 | 19.76 | 12.64 | 15.20 | 34.30 | 28.78 | 28.33 |
| | **Average Score** | 43.29 | 33.76 | 35.42 | 33.97 | 54.24 | 47.53 | 46.25 |
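
The avg@k entries report the mean score over k sampled generations per problem; other entries use a single sample. A minimal sketch of computing such an average (the function name and data layout are our own illustration, not the paper's evaluation code):

```python
def avg_at_k(scores_per_problem: list[list[float]]) -> float:
    """Mean score over k sampled generations, averaged across problems.

    scores_per_problem[i] holds the k per-sample scores (e.g. 0/1 pass
    results) for problem i. Illustrative only; the official evaluation
    code may differ in detail.
    """
    per_problem = [sum(samples) / len(samples) for samples in scores_per_problem]
    return 100.0 * sum(per_problem) / len(per_problem)

# e.g. avg@4 over two problems:
print(avg_at_k([[1, 0, 1, 1], [0, 0, 1, 0]]))  # 50.0
```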

Example usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Guru-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "What is reinforcement learning?"}]
# add_generation_prompt=True appends the assistant turn header so the
# model generates a reply rather than continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
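
For batched or higher-throughput inference, the model can also be served with vLLM. This is not covered by the original card; a minimal sketch, assuming a working vLLM install, with `tensor_parallel_size` adjusted to your GPU count:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "LLM360/Guru-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is reinforcement learning?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

llm = LLM(model=model_id, tensor_parallel_size=4)  # adjust to your hardware
# Sampling parameters match the leaderboard settings above.
params = SamplingParams(temperature=1.0, top_p=0.7, max_tokens=256)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```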

Please refer to the paper for more details.
