--- license: apache-2.0 --- ## Introduction Qwen2.5-32B-DialogueReason is a dialogue-based reasoning model built on Qwen2.5-32B-Base. We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero) data through rule-based reinforcement learning. ## 🧠 Key Features - Qwen2.5-32B-Base as the foundation. - Use Rule-Based RL to achieve dialogue reasoning. - With dynamic agent initialization to adapt to various scenarios. - With flexible environment configuration to set up task-specific contexts. - With multi-turn dialogue reasoning to incrementally solve problems. ## Example ### System: > The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: the play goes here\\n if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized. ### User: > Give me a detailed explanation of PPO in RL ### Assistant: > ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61775e8c1e3e9ecbf77ddee8/hluQG4Yz75FE5HFbxkKnh.png)