stepfun-ai
/

Qwen2.5-32B-DialogueReason

Model card Files Files and versions Community

Qwen2.5-32B-DialogueReason / README.md

brucestayhungry's picture

brucestayhungry

Update README.md

5bd2ff8 verified about 2 months ago

|

1.22 kB

	---
	license: apache-2.0
	---
	## Introduction
	Qwen2.5-32B-DialogueReason is a dialogue-based reasoning model built on Qwen2.5-32B-Base.
	We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero) data through rule-based reinforcement learning.

	## 🧠 Key Features
	- Qwen2.5-32B-Base as the foundation.
	- Use Rule-Based RL to achieve dialogue reasoning.
	- With dynamic agent initialization to adapt to various scenarios.
	- With flexible environment configuration to set up task-specific contexts.
	- With multi-turn dialogue reasoning to incrementally solve problems.

	## Example
	### System:
	> The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: <play>the play goes here</play>\\n<answer> if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized</answer>.


	### User:
	> Give me a detailed explanation of PPO in RL

	### Assistant:
	> ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61775e8c1e3e9ecbf77ddee8/hluQG4Yz75FE5HFbxkKnh.png)