Commit 486dee7 by zhong-zhang (verified)
1 Parent(s): c081850

Update README.md

update action space

Files changed (1)
  1. README.md +19 -0
@@ -223,6 +223,25 @@ response = predict(instruction, image)
  print(response)
  ```

+ ### Action Space
+
+ At each step, the agent outputs a single JSON object that contains:
+ - Exactly one primitive action, chosen from the table below;
+ - Optional modifiers (`duration`, `thought`) and/or a task-level flag (`STATUS`).
+
+ Note that all keywords are **case-sensitive**, and the agent uses **compact JSON** (i.e., no extra whitespace), because whitespace affects how the output is tokenized.
+
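As a minimal sketch of the compact-JSON convention described above, Python's standard `json.dumps` emits no extra whitespace when called with `separators=(",", ":")`. The helper name `to_compact_json` is hypothetical and not part of the AgentCPM-GUI code; `ensure_ascii=False` is an assumption made here so that non-ASCII text stays unescaped.

```python
import json


def to_compact_json(action: dict) -> str:
    """Serialize an action dict with no whitespace between tokens."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    # ensure_ascii=False (an assumption) keeps non-ASCII text unescaped.
    return json.dumps(action, separators=(",", ":"), ensure_ascii=False)


print(to_compact_json({"POINT": [480, 320], "duration": 1000}))
# → {"POINT":[480,320],"duration":1000}
```

Key order follows the dict's insertion order, so build the dict in the order you want the keys to appear.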
+ | Action | Required field(s) | Optional field(s) | Purpose | Example |
+ | --- | --- | --- | --- | --- |
+ | **Click** | `POINT:[x,y]` | `duration`,`thought`,`STATUS` | Single tap at the normalized screen coordinate (0–1000, origin = top-left). | `{"POINT":[480,320]}` |
+ | **Long Press** | `POINT:[x,y]`<br>`duration:1000` | `thought`,`STATUS` | Touch-and-hold at coordinate (set a longer duration, e.g. >200 ms). | `{"POINT":[480,320],"duration":1000}` |
+ | **Swipe** | `POINT:[x,y]`<br>`to:"up" \| "down" \| "left" \| "right"` **or** `to:[x,y]` | `duration`,`thought`,`STATUS` | Swipe from the start point toward a direction **or** another coordinate. | `{"POINT":[500,200],"to":"down"}` |
+ | **Press key** | `PRESS:"HOME" \| "BACK" \| "ENTER"` | `duration`,`thought`,`STATUS` | Trigger a hardware / navigation button. | `{"PRESS":"HOME"}` |
+ | **Type text** | `TYPE:"<text>"` | `duration`,`thought`,`STATUS` | Insert the given text at the current input focus. | `{"TYPE":"Hello, world!"}` |
+ | **Wait** | `duration` | `thought`,`STATUS` | Idle for the specified time without any other action. | `{"duration":500}` |
+ | **Task-level status** | `STATUS:"start" \| "continue" \| "finish" \| "satisfied" \| "impossible" \| "interrupt" \| "need_feedback"` | `duration`,`thought` | Report task progress; may appear **alone** or **with a primitive action**. | `{"STATUS":"finish"}` |
+
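The action table can be read as a small dispatch rule. The sketch below is one possible way to classify a model output. The function `classify_action` and its return labels are hypothetical names, not part of the released code. Treating a `POINT` action as a long press when `duration` exceeds 200 ms is an assumption based on the table's "e.g. >200 ms" hint.

```python
import json

PRESS_KEYS = {"HOME", "BACK", "ENTER"}
DIRECTIONS = {"up", "down", "left", "right"}
STATUS_VALUES = {"start", "continue", "finish", "satisfied",
                 "impossible", "interrupt", "need_feedback"}
LONG_PRESS_MS = 200  # assumption, taken from the table's "e.g. >200 ms" hint


def _is_point(value) -> bool:
    """True for a two-element list of numbers in the normalized 0-1000 range."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(c, (int, float)) and 0 <= c <= 1000 for c in value)
    )


def _one_of(value, allowed) -> bool:
    """True if value is a string contained in the allowed set."""
    return isinstance(value, str) and value in allowed


def classify_action(raw: str) -> str:
    """Return a label for one model output, or raise ValueError if malformed."""
    obj = json.loads(raw)
    if not isinstance(obj, dict):
        raise ValueError("output must be a single JSON object")
    if "STATUS" in obj and not _one_of(obj["STATUS"], STATUS_VALUES):
        raise ValueError(f"unknown STATUS: {obj['STATUS']!r}")

    # Keys are case-sensitive, so only the exact spellings are recognized.
    primitives = [k for k in ("POINT", "PRESS", "TYPE") if k in obj]
    if len(primitives) > 1:
        raise ValueError(f"more than one primitive action: {primitives}")
    if "to" in obj and "POINT" not in obj:
        raise ValueError("'to' requires a POINT start coordinate")

    if "POINT" in obj:
        if not _is_point(obj["POINT"]):
            raise ValueError("POINT must be [x, y] within 0-1000")
        if "to" in obj:
            to = obj["to"]
            if not (_one_of(to, DIRECTIONS) or _is_point(to)):
                raise ValueError("'to' must be a direction or [x, y]")
            return "swipe"
        return "long_press" if obj.get("duration", 0) > LONG_PRESS_MS else "click"
    if "PRESS" in obj:
        if not _one_of(obj["PRESS"], PRESS_KEYS):
            raise ValueError(f"unknown key: {obj['PRESS']!r}")
        return "press"
    if "TYPE" in obj:
        return "type"
    if "duration" in obj:
        return "wait"
    if "STATUS" in obj:
        return "status"
    raise ValueError("no action found")


print(classify_action('{"POINT":[500,200],"to":"down"}'))  # → swipe
```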
+
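Because `POINT` coordinates are normalized to 0–1000 with the origin at the top-left, they need to be scaled to the device resolution before a tap can be executed. The sketch below assumes a linear mapping in which 1000 corresponds to the full screen width or height, and clamps the result to the last valid pixel. `to_pixels` is a hypothetical helper name.

```python
def to_pixels(point, width: int, height: int) -> tuple[int, int]:
    """Map a normalized [x, y] in 0-1000 to pixel coordinates (origin top-left)."""
    x, y = point
    # Linear scaling is an assumption; clamp so 1000 maps to the last pixel.
    px = min(round(x * width / 1000), width - 1)
    py = min(round(y * height / 1000), height - 1)
    return px, py


print(to_pixels([480, 320], 1080, 2400))  # → (518, 768)
```
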
  ## Fine-tuning

  Source code for SFT and RFT training is provided; see [GitHub](https://github.com/OpenBMB/AgentCPM-GUI).