---
license: apache-2.0
---
## Introduction
Qwen2.5-32B-DialogueReason is a dialogue-based reasoning model built on Qwen2.5-32B-Base.
We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero) data through rule-based reinforcement learning.
## 🧠 Key Features
- Built on Qwen2.5-32B-Base as the foundation model.
- Uses rule-based RL to achieve dialogue reasoning.
- Supports dynamic agent initialization to adapt to various scenarios.
- Supports flexible environment configuration to set up task-specific contexts.
- Performs multi-turn dialogue reasoning to solve problems incrementally.
## Example
### System:
> The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: <play>the play goes here</play>\\n<answer> if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized</answer>.
### User:
> Give me a detailed explanation of PPO in RL
### Assistant:
> 
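
The system prompt above asks for a response shaped as `<play>...</play>` followed by `<answer>...</answer>`. The following is a minimal sketch of how such a response could be split into its parts. The helper names `build_messages` and `parse_response` are hypothetical and not part of this repository, the role/content message layout is an assumed common chat convention, and the `\boxed{}` pattern does not handle nested braces.

```python
import re
from typing import Optional


def build_messages(system_prompt: str, question: str) -> list:
    """Pair the system prompt with a user question (assumed role/content chat layout)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def _first_group(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of `pattern` in `text`, stripped, or None."""
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1).strip() if match else None


def parse_response(text: str) -> dict:
    """Split a response into play, answer, boxed value, and fenced code (hypothetical helper)."""
    play = _first_group(r"<play>(.*?)</play>", text)
    answer = _first_group(r"<answer>(.*?)</answer>", text)
    boxed = None
    code = None
    if answer is not None:
        # Simple case only: \boxed{...} without nested braces.
        boxed = _first_group(r"\\boxed\{([^{}]*)\}", answer)
        # A fenced block: three backticks, optional language tag, body, three backticks.
        code = _first_group(r"`{3}[\w+-]*\n(.*?)`{3}", answer)
    return {"play": play, "answer": answer, "boxed": boxed, "code": code}


if __name__ == "__main__":
    sample = "<play>Alice: What is 2 + 2?\nBob: It is 4.</play>\n<answer>\\boxed{4}</answer>"
    print(parse_response(sample))
```

Any field whose tag or pattern is absent comes back as `None`, so a caller can detect a malformed response instead of failing on it.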