This repository contains Guru-32B (based on Qwen2.5-32B), the model presented in *Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective*.

The leaderboard below is produced with our evaluation code, using the same sampling parameters for all models: temperature=1.0, top_p=0.7.

| Domain | Benchmark | GURU 7B | General Reasoner 7B | ORZ 7B◇ | SimpleRL 7B | GURU 32B | ORZ 32B◇ | SimpleRL 32B |
|---|---|---|---|---|---|---|---|---|
| Math | AIME24 (avg@32) | 17.50 | 17.08 | 16.25 | 15.60 | 34.89 | 47.50 | 27.20 |
| | MATH500 | 77.25 | 70.40 | 80.80 | 87.00 | 86.00 | 89.80 | 89.60 |
| Code | LiveCodeBench (avg@4) | 16.49 | 8.51 | 5.47 | 6.72 | 29.30 | 22.04 | 19.80 |
| | HumanEval (avg@4) | 82.62 | 61.12 | 67.38 | 58.08 | 90.85 | 84.30 | 81.25 |
| | MBPP | 70.00 | 39.80 | 48.40 | 49.60 | 78.80 | 74.20 | 76.75 |
| Science | GPQA-diamond (avg@4) | 40.78 | 38.64 | 37.63 | 35.98 | 50.63 | 55.67 | 46.46 |
| | SuperGPQA | 31.80 | 30.64 | 29.75 | 27.29 | 43.60 | 46.05 | 37.73 |
| Logic | ARC-AGI (avg@4) | 3.31 | 0.75 | 0.00 | 0.50 | 7.63 | 2.31 | 5.25 |
| | Zebra Puzzle (avg@4) | 39.40 | 0.07 | 1.00 | 0.62 | 45.21 | 0.54 | 1.16 |
| Simulation | CodeI/O (avg@4) | 15.63 | 7.13 | 5.13 | 6.63 | 12.63 | 3.75 | 9.75 |
| | CruxEval-I | 61.72 | 63.63 | 69.38 | 56.25 | 80.63 | 71.13 | 72.63 |
| | CruxEval-O | 71.28 | 56.50 | 65.88 | 58.31 | 88.75 | 82.38 | 67.75 |
| Tabular | FinQA | 34.70 | 34.33 | 37.60 | 35.10 | 46.14 | 45.20 | 45.41 |
| | HiTab | 74.20 | 54.40 | 54.10 | 50.40 | 82.00 | 63.30 | 69.00 |
| | MultiHiertt (avg@4) | 44.94 | 31.62 | 38.10 | 37.57 | 55.28 | 52.83 | 52.83 |
| Others | IFEval | 35.81 | 39.56 | 32.72 | 36.69 | 55.45 | 38.26 | 55.27 |
| | LiveBench | 18.57 | 19.76 | 12.64 | 15.20 | 34.30 | 28.78 | 28.33 |
| | **Average Score** | 43.29 | 33.76 | 35.42 | 33.97 | 54.24 | 47.53 | 46.25 |
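
The avg@k entries report the mean score over k sampled generations per problem; other entries use a single sample. A minimal sketch of computing such an average (the function name and data layout are our own illustration, not the paper's evaluation code):

```python
def avg_at_k(scores_per_problem: list[list[float]]) -> float:
    """Mean score over k sampled generations, averaged across problems.

    scores_per_problem[i] holds the k per-sample scores (e.g. 0/1 pass
    results) for problem i. Illustrative only; the official evaluation
    code may differ in detail.
    """
    per_problem = [sum(samples) / len(samples) for samples in scores_per_problem]
    return 100.0 * sum(per_problem) / len(per_problem)

# e.g. avg@4 over two problems:
print(avg_at_k([[1, 0, 1, 1], [0, 0, 1, 0]]))  # 50.0
```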

Example usage:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLM360/Guru-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "What is reinforcement learning?"}]
# add_generation_prompt=True appends the assistant turn header so the
# model generates a reply rather than continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
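
For batched or higher-throughput inference, the model can also be served with vLLM. This is not covered by the original card; a minimal sketch, assuming a working vLLM install, with `tensor_parallel_size` adjusted to your GPU count:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "LLM360/Guru-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is reinforcement learning?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

llm = LLM(model=model_id, tensor_parallel_size=4)  # adjust to your hardware
# Sampling parameters match the leaderboard settings above.
params = SamplingParams(temperature=1.0, top_p=0.7, max_tokens=256)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```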

Please refer to the paper for more details.
