Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -10,4 +10,16 @@ We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Z
  - Use Rule-Based RL to achieve dialogue reasoning.
  - With dynamic agent initialization to adapt to various scenarios.
  - With flexible environment configuration to set up task-specific contexts.
- - With multi-turn dialogue reasoning to incrementally solve problems.
+ - With multi-turn dialogue reasoning to incrementally solve problems.
+
+ ## Example
+ ### System:
+ > The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: <play>the play goes here</play>\\n<answer> if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized</answer>.
+
+
+ ### User:
+ > Give me a detailed explanation of PPO in RL
+
+ ### Assistant:
+ > ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61775e8c1e3e9ecbf77ddee8/hluQG4Yz75FE5HFbxkKnh.png)
+
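
The system prompt added in this change asks for responses shaped as `<play>…</play>` followed by `<answer>…</answer>`, with non-code answers wrapped in `\boxed{…}`. A minimal sketch of how a caller might split such a response into its parts is shown here. The function name `parse_response` and the returned dictionary keys are hypothetical and not part of this repository, and the boxed-answer pattern assumes no nested braces.

```python
import re


def parse_response(text: str) -> dict:
    """Split a response into play, answer, and boxed answer (hypothetical helper)."""
    # DOTALL lets the play and the answer span multiple lines.
    play = re.search(r"<play>(.*?)</play>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)

    answer_text = answer.group(1).strip() if answer else None

    boxed = None
    if answer_text is not None:
        # Assumption: the boxed content contains no nested braces.
        match = re.search(r"\\boxed\{([^{}]*)\}", answer_text)
        boxed = match.group(1) if match else None

    return {
        "play": play.group(1).strip() if play else None,
        "answer": answer_text,
        "boxed": boxed,
    }


if __name__ == "__main__":
    sample = "<play>Dr. A: PPO clips the ratio.</play>\n<answer>\\boxed{42}</answer>"
    print(parse_response(sample))
```

Missing tags yield `None` rather than an exception, which may be convenient when a response does not follow the requested format.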