Update README.md #1
opened by brucestayhungry

README.md CHANGED
```diff
@@ -10,4 +10,16 @@ We train the model using [Open-Reasoner-Zero](https://github.com/Open-Reasoner-Z
 - Use Rule-Based RL to achieve dialogue reasoning.
 - With dynamic agent initialization to adapt to various scenarios.
 - With flexible environment configuration to set up task-specific contexts.
-- With multi-turn dialogue reasoning to incrementally solve problems.
+- With multi-turn dialogue reasoning to incrementally solve problems.
+
+## Example
+### System:
+
+> The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in an ultra-detailed dialogue. The response is formatted as: <play>the play goes here</play>\\n<answer> if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized</answer>.
+
+
+### User:
+
+> Give me a detailed explanation of PPO in RL
+
+### Assistant:
+
+> 
+
```
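For context on the example prompt above ("Give me a detailed explanation of PPO in RL"): the core of PPO is its clipped surrogate objective, which limits how far the updated policy can move from the policy that collected the data. A minimal NumPy sketch of that objective — the function name and array shapes here are illustrative, not taken from this repository:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled action
    advantage: estimated advantage A(s, a) per sampled action
    eps:       clip range (0.2 is the commonly used default)
    """
    unclipped = ratio * advantage
    # Clamp the probability ratio to [1 - eps, 1 + eps] before weighting.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum; negate to express it as a loss.
    return -np.minimum(unclipped, clipped).mean()
```

Taking the minimum removes any incentive to push the ratio outside the clip range when that would inflate the objective, which is what keeps each policy update conservative.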