print(response)
```

### Action Space

At each step, the agent outputs a single JSON object that contains:

- One (and only one) primitive action, chosen from the table below;
- Optional modifiers (`duration`, `thought`) and/or a task-level flag (`STATUS`).
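
The structural rule above can be checked mechanically before an output is executed or used as training data. Below is a minimal Python sketch. It assumes outputs are parsed with the standard `json` module. `check_step` and its return labels are hypothetical names for illustration and are not part of AgentCPM-GUI. The sketch treats `POINT`, `PRESS`, and `TYPE` as the primitive keys and allows `to` only alongside `POINT`. A step with no primitive is read as a Wait (bare `duration`) or a status-only report (bare `STATUS`). It checks keys only, not values. It also does not separate Click from Long Press, because the table distinguishes them only by an example `duration`.

```python
import json

# Keys that select a primitive action (per the table) and the optional
# modifiers / task-level flag. `to` is handled separately: only valid with POINT.
PRIMITIVE_KEYS = {"POINT", "PRESS", "TYPE"}
MODIFIER_KEYS = {"duration", "thought", "STATUS"}


def check_step(raw: str) -> str:
    """Return a coarse action label for one step, or raise ValueError (hypothetical helper)."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("each step must be a single JSON object")
    # Keys are case-sensitive, so e.g. "point" is reported as unknown.
    unknown = set(obj) - PRIMITIVE_KEYS - MODIFIER_KEYS - {"to"}
    if unknown:
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    primitives = PRIMITIVE_KEYS & set(obj)
    if len(primitives) > 1:
        raise ValueError(f"more than one primitive action: {sorted(primitives)}")
    if "to" in obj and "POINT" not in obj:
        raise ValueError("`to` is only valid together with POINT")
    if primitives:
        key = next(iter(primitives))
        if key == "POINT":
            # Click and Long Press differ only in duration, so they share a label here.
            return "swipe" if "to" in obj else "click_or_long_press"
        return "press" if key == "PRESS" else "type"
    # No primitive action: a bare duration is a Wait, a bare STATUS is a status report.
    if "duration" in obj:
        return "wait"
    if "STATUS" in obj:
        return "status"
    raise ValueError("no primitive action, duration, or STATUS found")
```

Because keys are case-sensitive, a lowercase `"point"` is rejected as an unknown key rather than silently accepted.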

Note that all keywords are **case-sensitive**, and we use **compact JSON** (i.e., no extra whitespace), which affects how the output is tokenized.
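
To produce (or compare against) outputs in this compact form from Python, `json.dumps` accepts a `separators` argument. By default it inserts a space after each `,` and `:`. A minimal sketch follows. `to_compact_json` is a hypothetical helper. Passing `ensure_ascii=False`, which keeps non-ASCII text unescaped, is our assumption about the expected format and is not specified in this README.

```python
import json


def to_compact_json(action: dict) -> str:
    """Serialize one step as compact JSON (hypothetical helper)."""
    # separators=(",", ":") removes the spaces json.dumps adds by default
    # after "," and ":". ensure_ascii=False keeps non-ASCII text (e.g. in
    # TYPE or thought) unescaped; that choice is an assumption.
    return json.dumps(action, separators=(",", ":"), ensure_ascii=False)


action = {"POINT": [480, 320], "duration": 1000}
print(json.dumps(action))       # {"POINT": [480, 320], "duration": 1000}
print(to_compact_json(action))  # {"POINT":[480,320],"duration":1000}
```

Only structural whitespace is removed. Spaces inside string values, as in `{"TYPE":"Hello, world!"}`, are preserved.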

| Action | Required field(s) | Optional field(s) | Purpose | Example |
| --- | --- | --- | --- | --- |
| **Click** | `POINT:[x,y]` | `duration`,`thought`,`STATUS` | Single tap at the normalized screen coordinate (0–1000, origin = top-left). | `{"POINT":[480,320]}` |
| **Long Press** | `POINT:[x,y]`<br>`duration:1000` | `thought`,`STATUS` | Touch-and-hold at the coordinate; set a longer `duration` (e.g. >200 ms). | `{"POINT":[480,320],"duration":1000}` |
| **Swipe** | `POINT:[x,y]`<br>`to:"up" \| "down" \| "left" \| "right"` **or** `to:[x,y]` | `duration`,`thought`,`STATUS` | Swipe from the start point toward a direction **or** another coordinate. | `{"POINT":[500,200],"to":"down"}` |
| **Press key** | `PRESS:"HOME" \| "BACK" \| "ENTER"` | `duration`,`thought`,`STATUS` | Press a hardware/navigation button. | `{"PRESS":"HOME"}` |
| **Type text** | `TYPE:"<text>"` | `duration`,`thought`,`STATUS` | Insert the given text at the current input focus. | `{"TYPE":"Hello, world!"}` |
| **Wait** | `duration` | `thought`,`STATUS` | Idle for the specified time (in ms) without any other action. | `{"duration":500}` |
| **Task-level status** | `STATUS:"start" \| "continue" \| "finish" \| "satisfied" \| "impossible" \| "interrupt" \| "need_feedback"` | `duration`,`thought` | Report task progress; may appear **alone** or **with a primitive action**. | `{"STATUS":"finish"}` |
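
The coordinates in `POINT` and `to:[x,y]` are normalized to 0–1000 with the origin at the top-left. To execute an action on a real device, they have to be scaled to the screen resolution. A minimal sketch under that reading follows. `to_pixels` and `action_to_pixels` are hypothetical helpers, not the project's executor. The rounding, the clamp to the last pixel, and the 1080×2400 screen size used in the example are all our own assumptions.

```python
from typing import Sequence, Tuple

NORM_MAX = 1000  # normalized grid: 0..1000 on both axes, origin at top-left


def to_pixels(point: Sequence[int], width: int, height: int) -> Tuple[int, int]:
    """Map a normalized [x, y] to pixel coordinates (hypothetical helper)."""
    x, y = point
    if not (0 <= x <= NORM_MAX and 0 <= y <= NORM_MAX):
        raise ValueError(f"coordinate outside 0-{NORM_MAX}: {point}")
    # Scale to the screen size and round to the nearest pixel; clamp so that
    # 1000 maps to the last pixel (width - 1) rather than one past the edge.
    px = min(round(x * width / NORM_MAX), width - 1)
    py = min(round(y * height / NORM_MAX), height - 1)
    return px, py


def action_to_pixels(action: dict, width: int, height: int) -> dict:
    """Return a copy of the action with POINT and to:[x,y] converted to pixels."""
    out = dict(action)
    if "POINT" in out:
        out["POINT"] = list(to_pixels(out["POINT"], width, height))
    # `to` may be a direction string ("up", "down", ...) or a point; only convert points.
    if isinstance(out.get("to"), list):
        out["to"] = list(to_pixels(out["to"], width, height))
    return out


# Example on a hypothetical 1080x2400 screen.
print(action_to_pixels({"POINT": [500, 200], "to": "down"}, 1080, 2400))
# {'POINT': [540, 480], 'to': 'down'}
```

The clamp is a design choice: without it, a coordinate of exactly 1000 would map to `width`, one pixel past the right edge.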

## Fine-tuning

Source code for SFT and RFT training is available on [GitHub](https://github.com/OpenBMB/AgentCPM-GUI).