Update README.md
README.md (changed)
[Project Page](https://beingbeyond.github.io/Being-H0)
[Paper (arXiv)](https://arxiv.org/abs/2507.15597)
[Model (Hugging Face)](https://huggingface.co/BeingBeyond/Being-H0)
[License](./LICENSE)

</div>

Download pre-trained models from Hugging Face:

| Model Type | Model Name | Size | Description |
|------------|------------|------|-------------|
| **VLA Pre-trained** | [Being-H0-14B-2508](https://huggingface.co/BeingBeyond/Being-H0-14B-2508) | 14B | Base vision-language-action model |
| **VLA Post-trained** | [Being-H0-8B-Align-2508](https://huggingface.co/BeingBeyond/Being-H0-8B-Align-2508) | 8B | Fine-tuned for robot alignment |
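
If you use the `huggingface_hub` CLI, a minimal download sketch looks like this; the local directory is an arbitrary choice, not something the repository prescribes:

```bash
# Download the post-trained checkpoint into a local folder (target path is illustrative).
huggingface-cli download BeingBeyond/Being-H0-8B-Align-2508 --local-dir ./checkpoints/Being-H0-8B-Align-2508
```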

## Dataset

We provide a dataset for post-training the VLA model. It is available on Hugging Face:

| Dataset Type | Dataset Name | Description |
|--------------|--------------|-------------|
| **VLA Post-training** | [h0_post_train_db_2508](https://huggingface.co/datasets/BeingBeyond/h0_post_train_db_2508) | Post-training dataset for the pretrained Being-H0 VLA model |

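The dataset can be fetched the same way (a sketch; the local directory below is only an example, and `--repo-type dataset` is required because it is a dataset repo):

```bash
# Download the post-training dataset to a local folder (target path is illustrative).
huggingface-cli download BeingBeyond/h0_post_train_db_2508 --repo-type dataset --local-dir ./data/h0_post_train_db_2508
```
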
## Setup
### Clone repository
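
A typical invocation, assuming the `BeingBeyond/Being-H0` GitHub repository referenced elsewhere in this README, might look like:

```bash
# Assumed repository URL; adjust if you work from a fork.
git clone https://github.com/BeingBeyond/Being-H0.git
cd Being-H0
```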

- Visit the [MANO website](http://mano.is.tue.mpg.de/)
- Create an account by clicking _Sign Up_ and provide your information
- Download Models and Code (the downloaded file should have the format `mano_v*_*.zip`). Note that all code and data from this download fall under the [MANO license](http://mano.is.tue.mpg.de/license).
- Unzip the archive and copy the contents of the `mano_v*_*/` folder into the `beingvla/models/motion/mano/` folder, as sketched below.

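A minimal shell sketch of that last step, assuming the downloaded archive is named `mano_v1_2.zip` and sits in the repository root (adjust to the version you actually downloaded):

```bash
# Unpack the MANO release and copy its contents into the folder expected by beingvla.
unzip mano_v1_2.zip
cp -r mano_v1_2/* beingvla/models/motion/mano/
```
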
## Inference

### Motion Generation

- To generate hand motion tokens and render the motion, use the motion model (`Being-H0-GRVQ-8K`) together with a pretrained VLA model (`Being-H0-{1B,8B,14B}-2508`).
- You can run inference with the command below. For `--motion_code_path`, use a `+` symbol to jointly specify the wrist and finger motion code paths, e.g., `--motion_code_path "/path/to/Being-H0-GRVQ-8K/wrist/+/path/to/Being-H0-GRVQ-8K/finger/"`.
- `--hand_mode` can be set to `left`, `right`, or `both` to specify which hand(s) to use for the task.

```bash
python -m beingvla.inference.vla_internvl_inference \
    --model_path /path/to/Being-H0-XXX \
    --motion_code_path "/path/to/Being-H0-GRVQ-8K/wrist/+/path/to/Being-H0-GRVQ-8K/finger/" \
    --input_image ./playground/unplug_airpods.jpg \
    --task_description "unplug the charging cable from the AirPods" \
    --hand_mode both \
    --num_samples 3 \
    --num_seconds 4 \
    --enable_render true \
    --gpu_device 0 \
    --output_dir ./work_dirs/
```

- **To run inference on your own photos**: see the [Camera Intrinsics Guide](https://github.com/BeingBeyond/Being-H0/blob/main/docs/camera_intrinsics.md) for how to estimate camera intrinsics and supply them for custom inference.

### Evaluation

- You can post-train our pretrained VLA model on real robot data. Once you have a post-trained model (e.g., `Being-H0-8B-Align-2508`), use the following commands to communicate with the real robot or to evaluate the model on a robot task.
- Set up robot communication:

```bash
python -m beingvla.models.motion.m2m.aligner.run_server \
    --port 12305 \
    --action-chunk-length 16
```

- Run evaluation on robot task:

```bash
python -m beingvla.models.motion.m2m.aligner.eval_policy \
    --model-path /path/to/Being-H0-XXX-Align \
    --zarr-path /path/to/real-robot/data \
    --task_description "Put the little white duck into the cup." \
    --action-chunk-length 16
```