---
license: apache-2.0
pipeline_tag: text-generation
---

# Model Card for Qwen3-32B-LoRA-ECHO-KK-GRPO

Based on Qwen3-32B, we applied the ECHO framework to perform LoRA fine-tuning (with GRPO) on the K&K (Knights-and-Knaves) dataset. The resulting model achieves near-perfect scores on the test sets with 2 to 8 people per puzzle, surpassing o4-mini, DeepSeek-R1, and o3-mini-high.

Table 3: Model performance on the K&K logic puzzle task across difficulty levels (number of people per puzzle)

| Model                          |    2 |    3 |    4 |    5 |    6 |    7 |    8 |
|--------------------------------|-----:|-----:|-----:|-----:|-----:|-----:|-----:|
| Qwen3-32B                      | 0.98 | 0.99 | 0.98 | 0.99 | 0.98 | 0.96 | 0.95 |
| DeepSeek-R1                    | 1.00 | 0.97 | 0.95 | 0.93 | 0.91 | 0.93 | 0.91 |
| o3-mini-high                   | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.98 | 0.98 |
| o4-mini                        | 1.00 | 1.00 | 0.96 | 0.94 | 0.97 | 0.93 | 0.87 |
| Qwen3-32B-ECHO (GRPO w/ LoRA)  | 0.99 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 |

# Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "GradientResearch/Qwen3-32B-LoRA-ECHO-KK-GRPO"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "K & K"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # switches between thinking and non-thinking modes; default is True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (the </think> token)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

# Citation

If you find our work helpful, please cite:

```bibtex
@misc{xiao2025echodecouplinginferencetraining,
      title={Echo: Decoupling Inference and Training for Large-Scale RL Alignment on Heterogeneous Swarms},
      author={Jie Xiao and Changyuan Fan and Qingnan Ren and Alfred Long and Yuchen Zhang and Rymon Yu and Eric Yang and Lynn Ai and Shaoduo Gan},
      year={2025},
      eprint={2508.05387},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.05387},
}
```
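
# Recommended sampling settings

The upstream Qwen3 model card recommends sampling rather than greedy decoding in thinking mode. A minimal sketch of the `generate` call above with those parameters, assuming this fine-tune keeps the base model's recommended decoding settings (temperature 0.6, top-p 0.95, top-k 20, per the Qwen3 documentation; not separately confirmed for this checkpoint):

```python
# assumption: the Qwen3-recommended thinking-mode settings carry over to this LoRA fine-tune
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768,
    do_sample=True,   # greedy decoding can cause repetition in thinking mode
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)
```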