Update README.md
update action space
README.md
CHANGED
|
@@ -223,6 +223,25 @@ response = predict(instruction, image)
|
|
| 223 |
print(response)
|
| 224 |
```
|
| 225 |
|
| 226 |
## Fine-tuning
|
| 227 |
|
| 228 |
Source code for SFT and RFT training is provided — see [GitHub](https://github.com/OpenBMB/AgentCPM-GUI).
|
| 223 |
print(response)
|
| 224 |
```
|
| 225 |
|
| 226 |
+
### Action Space
|
| 227 |
+
|
| 228 |
+
At each step, the agent outputs a single JSON object that contains:
|
| 229 |
+
- One (and only one) primitive action, chosen from the table below;
|
| 230 |
+
- Optional modifiers (`duration`, `thought`) and/or a task-level flag (`STATUS`).
|
| 231 |
+
|
| 232 |
+
Note that all keywords are **case-sensitive**, and we use **compact JSON** (i.e., no extra whitespace), because whitespace affects the tokenizer’s behavior.
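
As a minimal sketch of what compact JSON means in practice, the snippet below serializes an action with Python's standard `json` module. The `action` dict is a hypothetical example built from the fields described in this section; it is not taken from the project's own code.

```python
import json

# Hypothetical action; key names are case-sensitive ("POINT", not "point").
action = {"POINT": [480, 320], "duration": 1000}

# By default json.dumps puts a space after "," and ":".
# separators=(",", ":") drops that whitespace and yields compact JSON.
compact = json.dumps(action, separators=(",", ":"), ensure_ascii=False)

print(compact)  # {"POINT":[480,320],"duration":1000}
```

Serializing this way keeps the text in the same whitespace-free form as the examples in this section.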
|
| 233 |
+
|
| 234 |
+
| Action | Required field(s) | Optional field(s) | Purpose | Example |
|
| 235 |
+
| --------------------- | ----------------------------------------------------------------------------------------------------------- | ----------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------ |
|
| 236 |
+
| **Click** | `POINT:[x,y]` | `duration`,`thought`,`STATUS` | Single tap at the normalized screen coordinate (0–1000, origin = top-left). | `{"POINT":[480,320]}` |
|
| 237 |
+
| **Long Press** | `POINT:[x,y]`<br>`duration:1000` | `thought`,`STATUS` | Touch-and-hold at the coordinate (set a longer duration, e.g. >200 ms). | `{"POINT":[480,320],"duration":1000}` |
|
| 238 |
+
| **Swipe** | `POINT:[x,y]`<br>`to:"up" \| "down" \| "left" \| "right"` **or** `to:[x,y]` | `duration`,`thought`,`STATUS` | Swipe from the start point toward a direction **or** another coordinate. | `{"POINT":[500,200],"to":"down"}` |
|
| 239 |
+
| **Press key** | `PRESS:"HOME" \| "BACK" \| "ENTER"` | `duration`,`thought`,`STATUS` | Trigger a hardware / navigation button. | `{"PRESS":"HOME"}` |
|
| 240 |
+
| **Type text** | `TYPE:"<text>"` | `duration`,`thought`,`STATUS` | Insert the given text at the current input focus. | `{"TYPE":"Hello, world!"}` |
|
| 241 |
+
| **Wait** | `duration` | `thought`,`STATUS` | Idle for the specified time without any other action. | `{"duration":500}` |
|
| 242 |
+
| **Task-level status** | `STATUS:"start" \| "continue" \| "finish" \| "satisfied" \| "impossible" \| "interrupt" \| "need_feedback"` | `duration`,`thought` | Report task progress; may appear **alone** or **with a primitive action**. | `{"STATUS":"finish"}` |
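
The table above can be turned into a small dispatcher. The following is a sketch under stated assumptions rather than the project's implementation: the helper names `to_pixels` and `classify` are hypothetical, the 200 ms long-press threshold is taken from the "e.g. >200 ms" hint in the table, and the screen size in the usage lines is made up.

```python
import json

# Keys that identify a primitive action carrying its own payload.
PRIMITIVE_KEYS = ("POINT", "PRESS", "TYPE")


def to_pixels(point, width, height):
    """Map a normalized [x, y] (0-1000, origin top-left) to pixel coordinates."""
    x, y = point
    # Integer arithmetic avoids floating-point rounding surprises (floors the result).
    return x * width // 1000, y * height // 1000


def classify(raw):
    """Return a coarse action name for one compact-JSON model output."""
    action = json.loads(raw)
    present = [k for k in PRIMITIVE_KEYS if k in action]
    if len(present) > 1:
        raise ValueError(f"more than one primitive action: {present}")
    if "POINT" in action:
        if "to" in action:
            return "swipe"
        # Assumption: a duration above 200 ms marks a long press.
        return "long_press" if action.get("duration", 0) > 200 else "click"
    if "PRESS" in action:
        return "press"
    if "TYPE" in action:
        return "type"
    if "duration" in action:
        return "wait"  # duration with no other primitive field
    if "STATUS" in action:
        return "status"  # status reported on its own
    raise ValueError("no recognized field in action")


print(classify('{"POINT":[500,200],"to":"down"}'))
print(to_pixels([480, 320], 1080, 2400))
```

On a hypothetical 1080×2400 screen, the normalized point `[480,320]` maps to pixel `(518, 768)`. A real executor would also need to handle `thought` and a `STATUS` that accompanies a primitive action, which this sketch ignores.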
|
| 243 |
+
|
| 244 |
+
|
| 245 |
## Fine-tuning
|
| 246 |
|
| 247 |
Source code for SFT and RFT training is provided — see [GitHub](https://github.com/OpenBMB/AgentCPM-GUI).
|