---
license: apache-2.0
base_model:
- lerobot/pi0
pipeline_tag: robotics
---

# INTACT Probing Suite: Pi0 Trained from Scratch on BridgeV2

> 📦 **This model is part of the [INTACT Probing Suite Collection](https://huggingface.co/collections/ai4ce/intact-probing-suite-684e5601e9ed640fdd9b994b)**
> Explore other variants:
> - [Pi0 fine-tuned on BridgeV2](https://huggingface.co/juexzz/INTACT-pi0-finetune-bridge)
> - [Pi0 fine-tuned with paraphrases on BridgeV2](https://huggingface.co/juexzz/INTACT-pi0-finetune-rephrase-bridge)
## INTACT-pi0-scratch-bridge

This repository contains a checkpoint of the Pi0 model ([HF implementation](https://huggingface.co/lerobot/pi0) | [Paper](https://arxiv.org/abs/2410.24164v1)), *initialized from PaliGemma and trained directly ("from scratch")* on the BridgeV2 dataset for robotic manipulation tasks.
The model is evaluated in the [Simpler Environment](https://github.com/simpler-env/SimplerEnv) and with our [INTACT](https://github.com/ai4ce/INT-ACT) probing suite, which tests the generalization boundaries of VLA models.

**Paper**: [From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models](https://arxiv.org/abs/2506.09930)

## Model Details

- **Base Model**: [lerobot/pi0](https://huggingface.co/lerobot/pi0)
- **Training Dataset**: [BridgeV2](https://rail-berkeley.github.io/bridgedata/)
- **Model Type**: Vision-Language-Action (VLA) model for robotics
- **Training Method**: See our [paper](https://arxiv.org/abs/2506.09930)
- **Training Framework**: See our [repository](https://github.com/ai4ce/INT-ACT)

## Quick Start

### Usage in INTACT

```shell
git clone --recurse-submodules https://github.com/ai4ce/INT-ACT.git
cd INT-ACT
uv sync
source .venv/bin/activate
python
```

Or use it directly in Python with LeRobot, as described below.

### Integration with LeRobot

First, install LeRobot:
```bash
pip install lerobot
```
Then:

```python
import torch
from lerobot.common.policies.pi0.modeling_pi0 import PI0Policy

# Load the checkpoint
policy = PI0Policy.from_pretrained("juexzz/INTACT-pi0-scratch-bridge")

# Inference: `batch` holds the observation tensors and the language instruction
with torch.no_grad():
    actions = policy.select_action(batch)
```
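The `batch` passed to `select_action` is not defined in the snippet above. A minimal sketch of what it might look like, assuming LeRobot's dotted observation-key convention; the camera key name, image resolution, and state dimension here are illustrative, not taken from the actual training config:

```python
import torch

# Hypothetical batch layout following LeRobot's observation-key conventions.
batch = {
    # single RGB camera view, float32 in [0, 1], shape (B, C, H, W)
    "observation.images.image_0": torch.rand(1, 3, 224, 224),
    # proprioceptive robot state, e.g. end-effector pose + gripper
    "observation.state": torch.zeros(1, 8),
    # natural-language instruction for the episode
    "task": ["put the carrot on the plate"],
}
# This batch would then be passed to policy.select_action(batch).
```

Check your checkpoint's config for the exact keys and shapes it expects.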


### Training Configuration
- **Training Steps**: 15 epochs (~22,695 steps)
- **Batch Size**: 1024
- **Learning Rate**: 1e-5
- **Hardware**: 4× H100/A100 GPUs
- **Input Modalities**: a single image (to work with SimplerEnv), one language instruction, and one robot state
- **Output**: robot actions (delta EEF) with a chunk size of 4

For more details, please refer to our [paper](https://arxiv.org/abs/2506.09930) and [code](https://github.com/ai4ce/INT-ACT).
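Since the policy outputs *delta* end-effector actions in chunks of 4, executing a chunk means accumulating the per-step deltas onto the current pose. A minimal sketch of that roll-out, assuming positions only (orientation and gripper omitted; `apply_delta_chunk` is a hypothetical helper, not part of our codebase):

```python
import numpy as np

def apply_delta_chunk(eef_pos, action_chunk):
    """Roll out a chunk of delta end-effector actions.

    eef_pos: current end-effector position, shape (3,)
    action_chunk: predicted deltas, shape (4, 3) for a chunk size of 4
    Returns the absolute position after each of the 4 steps.
    """
    # cumulative sum turns per-step deltas into absolute waypoints
    return eef_pos + np.cumsum(action_chunk, axis=0)

start = np.zeros(3)
chunk = np.full((4, 3), 0.01)        # move +1 cm on each axis per step
waypoints = apply_delta_chunk(start, chunk)
print(waypoints[-1])                 # -> [0.04 0.04 0.04]
```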


## Evaluation

**Checkpoint choice.**
After training for 15 epochs, we sweep the checkpoints at epochs 1, 2, 3, 4, 5, 10, and 15 on the original four Bridge tasks in SimplerEnv and, for each of the three Pi0 variants, choose the checkpoint with the *best average performance*.
You may therefore still get a better success rate on a specific task at another checkpoint.
As a result, the best checkpoint for this Pi0 scratch model is at step 22695 (epoch 15).

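The selection rule above (best average over the four tasks) can be sketched as follows, with made-up sweep numbers purely for illustration:

```python
# Hypothetical sweep results: per-checkpoint success rates on the four
# Bridge tasks (numbers are invented for illustration).
sweep = {
    "epoch_05": [0.40, 0.80, 0.25, 0.60],
    "epoch_10": [0.50, 0.85, 0.35, 0.80],
    "epoch_15": [0.54, 0.90, 0.40, 0.88],
}

def mean(xs):
    return sum(xs) / len(xs)

# Keep the checkpoint with the best average success rate across tasks.
best = max(sweep, key=lambda ckpt: mean(sweep[ckpt]))
print(best)  # -> epoch_15
```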
The comparison of their performance on SimplerEnv is shown below.

### Performance Comparison on SimplerEnv

**Success rate** comparison on SimplerEnv against the other Pi0 variants and several other baselines evaluated in our INTACT suite.
For a more detailed comparison, please refer to the [paper](https://arxiv.org/abs/2506.09930).

| Model | carrot_on_plate | eggplant_in_basket | stack_cube | spoon_on_towel |
|-------|-----------------|--------------------|------------|----------------|
| [Pi0 finetune](https://huggingface.co/juexzz/INTACT-pi0-finetune-bridge) | 0.361 | 0.819 | 0.264 | 0.458 |
| [Pi0 finetune rephrase](https://huggingface.co/juexzz/INTACT-pi0-finetune-rephrase-bridge) | 0.500 | 0.944 | 0.222 | 0.597 |
| **Pi0 scratch (this model)** | 0.542 | 0.903 | 0.403 | 0.875 |
| Spatial VLA | 0.125 | 0.958 | 0.292 | 0.208 |
| Magma | 0.250 | 0.611 | 0.097 | 0.208 |
| Octo Small | 0.014 | 0.097 | 0.000 | 0.097 |
| Octo Base | 0.014 | 0.306 | 0.000 | 0.014 |

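Because checkpoints were selected by average success over these four tasks, per-model averages are a natural summary of the table. A quick computation over the three Pi0 variants, using the numbers from the table:

```python
# Success rates from the table above, in task order:
# carrot_on_plate, eggplant_in_basket, stack_cube, spoon_on_towel
results = {
    "Pi0 finetune":          [0.361, 0.819, 0.264, 0.458],
    "Pi0 finetune rephrase": [0.500, 0.944, 0.222, 0.597],
    "Pi0 scratch":           [0.542, 0.903, 0.403, 0.875],
}

# Average success rate per model, rounded to 3 decimals
averages = {m: round(sum(v) / len(v), 3) for m, v in results.items()}
print(averages["Pi0 scratch"])  # -> 0.681
```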



## Citation

If you use this model in your research, please cite:

```bibtex
@article{fang2025intention,
  title={From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models},
  author={Fang, Irving and Zhang, Juexiao and Tong, Shengbang and Feng, Chen},
  journal={arXiv preprint arXiv:2506.09930},
  year={2025}
}
```

## Related Work

- **Pi0 (official)**: [pi0 (JAX)](https://github.com/Physical-Intelligence/openpi)
- **Base Model (Pi0 HF)**: [lerobot/pi0](https://huggingface.co/lerobot/pi0)
- **Dataset**: [BridgeV2](https://bridge-v2.github.io/)
- **Framework**: [LeRobot](https://github.com/huggingface/lerobot)
- **Simpler Environment**: [SimplerEnv](https://github.com/simpler-env/SimplerEnv)
- **Open-source Pi0 Implementation by Allen Ren**: [open-pi-zero](https://github.com/allenzren/open-pi-zero)

## License

This model is released under the Apache 2.0 license. Please see the base model's license for any additional restrictions.

## Support

For questions about this model:
- 📧 Open an issue in this repository
- 💬 Use the Discussions tab for community questions
- 📖 Check our [paper](https://arxiv.org/abs/2506.09930) for technical details

---

*Last updated: June 2025*