---
datasets:
- IPEC-COMMUNITY/bridge_orig_lerobot
base_model:
- nvidia/GR00T-N1.5-3B
---
This is a GR00T model fine-tuned on the Bridge dataset (30k steps, 8 A100 GPUs) with the default fine-tuning settings (i.e., the VLM backbone frozen).

The evaluation was conducted with the SimplerEnv-OpenVLA repository (https://github.com/DelinQu/SimplerEnv-OpenVLA); thanks to the authors for their contributions to the community.

This fine-tuned model should not be considered representative of GR00T's actual performance.
| ckpt_name | GR00T-N1.5 | RT-1(Converged) | RT-1(15%) | RT-1-X | RT-2-X | Octo-Base | Octo-Small | RT-1(begin) | OpenVLA | RoboVLM |
|---|---|---|---|---|---|---|---|---|---|---|
| put_spoon_on_tablecloth/matching_partial | 0.833 | nan | nan | 0.167 | nan | 0.347 | 0.778 | nan | 0.041 | 0.375 |
| put_spoon_on_tablecloth/matching_entire | 0.625 | nan | nan | 0.0 | nan | 0.125 | 0.472 | nan | 0.0 | 0.208 |
| put_carrot_on_plate/matching_partial | 0.542 | nan | nan | 0.208 | nan | 0.528 | 0.278 | nan | 0.333 | 0.333 |
| put_carrot_on_plate/matching_entire | 0.458 | nan | nan | 0.042 | nan | 0.083 | 0.097 | nan | 0.0 | 0.25 |
| stack_green_block_on_yellow_block/matching_partial | 0.708 | nan | nan | 0.083 | nan | 0.319 | 0.403 | nan | 0.125 | 0.083 |
| stack_green_block_on_yellow_block/matching_entire | 0.167 | nan | nan | 0.0 | nan | 0.0 | 0.042 | nan | 0.0 | 0.083 |
| put_eggplant_in_basket/matching_partial | 0.417 | nan | nan | 0.0 | nan | 0.667 | 0.875 | nan | 0.083 | 0.0 |
| put_eggplant_in_basket/matching_entire | 0.208 | nan | nan | 0.0 | nan | 0.431 | 0.569 | nan | 0.041 | 0.0 |
Data configuration: in addition to adding the following code to data_config.py, I also provide the modality.json required by the GR00T dataloader (a rough sketch of its layout is given immediately below, before the data_config.py additions).
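The sketch below shows how a modality.json with this layout could be written. It is only an illustration: the start/end index schema, the `original_key` field, and the underlying LeRobot key `observation.images.image_0` are my assumptions about the GR00T LeRobot format, so verify them against the modality.json actually shipped with this repository.

```python
import json

# Illustrative sketch of a modality.json for the Bridge layout used below.
# ASSUMPTIONS: the start/end slicing schema, the "original_key" field, and the
# LeRobot key "observation.images.image_0" -- check the provided modality.json.
state_names = ["x", "y", "z", "roll", "pitch", "yaw", "pad", "gripper"]
action_names = ["x", "y", "z", "roll", "pitch", "yaw", "gripper"]

modality = {
    "state": {n: {"start": i, "end": i + 1} for i, n in enumerate(state_names)},
    "action": {n: {"start": i, "end": i + 1} for i, n in enumerate(action_names)},
    "video": {"image_0": {"original_key": "observation.images.image_0"}},
    "annotation": {"human.action.task_description": {}},
}

with open("modality.json", "w") as f:
    json.dump(modality, f, indent=4)
```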
```python
# Added to data_config.py in the Isaac-GR00T repo; the transform classes used
# below (VideoToTensor, StateActionTransform, GR00TTransform, ...) are already
# imported at the top of that file.
class FractalDataConfig(So100DataConfig):
    # Single RGB view; 8-dim state (x/y/z, rx/ry/rz/rw, gripper) and 7-dim action.
    video_keys = ["video.image"]
    state_keys = ["state.x", "state.y", "state.z", "state.rx", "state.ry", "state.rz", "state.rw", "state.gripper"]
    action_keys = ["action.x", "action.y", "action.z", "action.roll", "action.pitch", "action.yaw", "action.gripper"]
    language_keys = ["annotation.human.action.task_description"]

    def transform(self) -> ModalityTransform:
        transforms = [
            # Video transforms: tensorize, crop, resize to 224x224, color jitter.
            VideoToTensor(apply_to=self.video_keys),
            VideoCrop(apply_to=self.video_keys, scale=0.95),
            VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
            VideoColorJitter(
                apply_to=self.video_keys,
                brightness=0.3,
                contrast=0.4,
                saturation=0.5,
                hue=0.08,
            ),
            VideoToNumpy(apply_to=self.video_keys),
            # State transforms: min-max normalization per state key.
            StateActionToTensor(apply_to=self.state_keys),
            StateActionTransform(
                apply_to=self.state_keys,
                normalization_modes={key: "min_max" for key in self.state_keys},
            ),
            # Action transforms: min-max normalization per action key.
            StateActionToTensor(apply_to=self.action_keys),
            StateActionTransform(
                apply_to=self.action_keys,
                normalization_modes={key: "min_max" for key in self.action_keys},
            ),
            # Concat transforms: merge per-key tensors into single video/state/action arrays.
            ConcatTransform(
                video_concat_order=self.video_keys,
                state_concat_order=self.state_keys,
                action_concat_order=self.action_keys,
            ),
            # Model-specific transform: pad state/action to the model's maximum dimensions.
            GR00TTransform(
                state_horizon=len(self.observation_indices),
                action_horizon=len(self.action_indices),
                max_state_dim=64,
                max_action_dim=32,
            ),
        ]
        return ComposedModalityTransform(transforms=transforms)


class BridgeDataConfig(FractalDataConfig):
    # Bridge uses the image_0 view and an Euler-angle state with a pad dimension;
    # the transform pipeline is inherited from FractalDataConfig.
    video_keys = ["video.image_0"]
    state_keys = ["state.x", "state.y", "state.z", "state.roll", "state.pitch", "state.yaw", "state.pad", "state.gripper"]
    action_keys = ["action.x", "action.y", "action.z", "action.roll", "action.pitch", "action.yaw", "action.gripper"]
    language_keys = ["annotation.human.action.task_description"]
```
An extra embodiment tag is required to reproduce the results (added to the existing EmbodimentTag enum and EMBODIMENT_TAG_MAPPING):
```python
class EmbodimentTag(Enum):
    # ... existing tags ...
    OXE = "oxe"


# Embodiment tag string -> projector index in the Action Expert Module.
EMBODIMENT_TAG_MAPPING = {
    # ... existing entries ...
    EmbodimentTag.OXE.value: 7,
}
```
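As a quick sanity check, the dataset can then be loaded with the new config and embodiment tag through the GR00T dataloader. This is only a sketch based on my reading of LeRobotSingleDataset in Isaac-GR00T; the import paths, argument names, and the "bridge" config name registered above are assumptions to verify against the commit listed below.

```python
from gr00t.data.dataset import LeRobotSingleDataset       # assumed import path
from gr00t.data.embodiment_tags import EmbodimentTag      # assumed import path
from gr00t.experiment.data_config import DATA_CONFIG_MAP  # assumed import path

data_config = DATA_CONFIG_MAP["bridge"]  # the config registered above
dataset = LeRobotSingleDataset(
    dataset_path="/path/to/bridge_orig_lerobot",  # local LeRobot-format dataset
    modality_configs=data_config.modality_config(),
    transforms=data_config.transform(),
    embodiment_tag=EmbodimentTag.OXE,  # the new tag, mapped to projector index 7
)
print(dataset[0].keys())  # inspect one transformed training sample
```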
Thanks to @youliangtan, who re-evaluated my results with:
- Isaac-GR00T: https://github.com/NVIDIA/Isaac-GR00T at commit aa6441feb4f08233d55cbfd2082753cdc01fa676
- the modified SimplerEnv: https://github.com/youliangtan/SimplerEnv
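For completeness, loading this checkpoint for a SimplerEnv-style rollout would look roughly like the sketch below. The Gr00tPolicy constructor and get_action call follow my understanding of the Isaac-GR00T API at that commit, so treat the import paths and argument names as assumptions.

```python
from gr00t.experiment.data_config import DATA_CONFIG_MAP  # assumed import path
from gr00t.model.policy import Gr00tPolicy                # assumed import path

data_config = DATA_CONFIG_MAP["bridge"]
policy = Gr00tPolicy(
    model_path="ShuaiYang03/GR00T-N1.5-Lerobot-SimplerEnv-BridgeV2",
    embodiment_tag="oxe",  # the tag added above
    modality_config=data_config.modality_config(),
    modality_transform=data_config.transform(),
    device="cuda",
)

# obs is a dict keyed like BridgeDataConfig, e.g. "video.image_0", "state.x", ...,
# "annotation.human.action.task_description"; the SimplerEnv wrapper builds it each step.
# action_chunk = policy.get_action(obs)
```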