Update README.md
README.md (changed)
[Project Page](https://beingbeyond.github.io/Being-H0)
[Paper (arXiv)](https://arxiv.org/abs/2507.15597)
[Model (Hugging Face)](https://huggingface.co/BeingBeyond/Being-H0)
[License](./LICENSE)

</div>

Download pre-trained models from Hugging Face:

| Model Type | Model Name | Size | Description |
|------------|------------|------|-------------|
| **VLA Pre-trained** | [Being-H0-14B-2508](https://huggingface.co/BeingBeyond/Being-H0-14B-2508) | 14B | Base vision-language-action model |
| **VLA Post-trained** | [Being-H0-8B-Align-2508](https://huggingface.co/BeingBeyond/Being-H0-8B-Align-2508) | 8B | Fine-tuned for robot alignment |
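
If you use the `huggingface_hub` CLI, a minimal download sketch looks like this; the local directory is an arbitrary choice, not something the repository prescribes:

```bash
# Download the post-trained checkpoint into a local folder (target path is illustrative).
huggingface-cli download BeingBeyond/Being-H0-8B-Align-2508 --local-dir ./checkpoints/Being-H0-8B-Align-2508
```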

## Dataset

We provide a dataset for post-training the VLA model. It is available on Hugging Face:

| Dataset Type | Dataset Name | Description |
|--------------|--------------|-------------|
| **VLA Post-training** | [h0_post_train_db_2508](https://huggingface.co/datasets/BeingBeyond/h0_post_train_db_2508) | Post-training dataset for the pretrained Being-H0 VLA model |

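The dataset can be fetched the same way (a sketch; the local directory below is only an example, and `--repo-type dataset` is required because it is a dataset repo):

```bash
# Download the post-training dataset to a local folder (target path is illustrative).
huggingface-cli download BeingBeyond/h0_post_train_db_2508 --repo-type dataset --local-dir ./data/h0_post_train_db_2508
```
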
## Setup
### Clone repository
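
A typical invocation, assuming the `BeingBeyond/Being-H0` GitHub repository referenced elsewhere in this README, might look like:

```bash
# Assumed repository URL; adjust if you work from a fork.
git clone https://github.com/BeingBeyond/Being-H0.git
cd Being-H0
```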

- Visit the [MANO website](http://mano.is.tue.mpg.de/)
- Create an account by clicking _Sign Up_ and provide your information
- Download Models and Code (the downloaded file should have the format `mano_v*_*.zip`). Note that all code and data from this download fall under the [MANO license](http://mano.is.tue.mpg.de/license).
- Unzip the archive and copy the contents of the `mano_v*_*/` folder into the `beingvla/models/motion/mano/` folder, as sketched below.

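A minimal shell sketch of that last step, assuming the downloaded archive is named `mano_v1_2.zip` and sits in the repository root (adjust to the version you actually downloaded):

```bash
# Unpack the MANO release and copy its contents into the folder expected by beingvla.
unzip mano_v1_2.zip
cp -r mano_v1_2/* beingvla/models/motion/mano/
```
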
## Inference

### Motion Generation

- To generate hand motion tokens and render the motion, use the motion model (`Being-H0-GRVQ-8K`) together with a pretrained VLA model (`Being-H0-{1B,8B,14B}-2508`).
- You can run inference with the command below. For `--motion_code_path`, use a `+` symbol to jointly specify the wrist and finger motion code paths, e.g., `--motion_code_path "/path/to/Being-H0-GRVQ-8K/wrist/+/path/to/Being-H0-GRVQ-8K/finger/"`.
- `--hand_mode` can be set to `left`, `right`, or `both` to specify which hand(s) to use for the task.

```bash
python -m beingvla.inference.vla_internvl_inference \
    --model_path /path/to/Being-H0-XXX \
    --motion_code_path "/path/to/Being-H0-GRVQ-8K/wrist/+/path/to/Being-H0-GRVQ-8K/finger/" \
    --input_image ./playground/unplug_airpods.jpg \
    --task_description "unplug the charging cable from the AirPods" \
    --hand_mode both \
    --num_samples 3 \
    --num_seconds 4 \
    --enable_render true \
    --gpu_device 0 \
    --output_dir ./work_dirs/
```

- **To run inference on your own photos**: see the [Camera Intrinsics Guide](https://github.com/BeingBeyond/Being-H0/blob/main/docs/camera_intrinsics.md) for how to estimate camera intrinsics and supply them for custom inference.

### Evaluation

- You can post-train our pretrained VLA model on real robot data. Once you have a post-trained model (e.g., `Being-H0-8B-Align-2508`), use the following commands to communicate with the real robot or to evaluate the model on a robot task.
- Set up robot communication:

```bash
python -m beingvla.models.motion.m2m.aligner.run_server \
    --port 12305 \
    --action-chunk-length 16
```

- Run evaluation on robot task:

```bash
python -m beingvla.models.motion.m2m.aligner.eval_policy \
    --model-path /path/to/Being-H0-XXX-Align \
    --zarr-path /path/to/real-robot/data \
    --task_description "Put the little white duck into the cup." \
    --action-chunk-length 16
```