Update README.md #1
opened by brucestayhungry

README.md CHANGED
```diff
@@ -10,4 +10,16 @@ We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Z
 - Use Rule-Based RL to achieve dialogue reasoning.
 - With dynamic agent initialization to adapt to various scenarios.
 - With flexible environment configuration to set up task-specific contexts.
-- With multi-turn dialogue reasoning to incrementally solve problems.
+- With multi-turn dialogue reasoning to incrementally solve problems.
+
+## Example
+### System:
+
+> The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in an ultra-detailed dialogue. The response is formatted as: <play>the play goes here</play>\\n<answer> if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized</answer>.
+
+
+### User:
+
+> Give me a detailed explanation of PPO in RL
+
+### Assistant:
+
+> 
+
```
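For context on the example prompt above ("Give me a detailed explanation of PPO in RL"): the core of PPO is its clipped surrogate objective, which limits how far the updated policy can move from the policy that collected the data. A minimal NumPy sketch of that objective — the function name and array shapes here are illustrative, not taken from this repository:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled action
    advantage: estimated advantage A(s, a) per sampled action
    eps:       clip range (0.2 is the commonly used default)
    """
    unclipped = ratio * advantage
    # Clamp the probability ratio to [1 - eps, 1 + eps] before weighting.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; negate to express it as a loss.
    return -np.minimum(unclipped, clipped).mean()
```

Taking the minimum removes any incentive to push the ratio outside the clip range when that would inflate the objective, which is what keeps each policy update conservative.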