---
license: mit
datasets:
- IPEC-COMMUNITY/libero_object_no_noops_lerobot
language:
- en
base_model:
- Hume-vla/Hume-System2
pipeline_tag: robotics
library_name: transformers
tags:
- VLA
---

# Model Card for Hume-Libero_Object

<!-- Provide a quick summary of what the model is/does. -->

A Dual-System Visual-Language-Action model with System-2 thinking, trained on Libero-Object.

- Paper: [https://arxiv.org/abs/2505.21432](https://arxiv.org/abs/2505.21432)
- Homepage: [https://hume-vla.github.io](https://hume-vla.github.io)
- Codebase: [🦾 Hume: A Dual-System VLA with System2 Thinking](https://github.com/hume-vla/hume)

## Optimal TTS Args

```bash
s2_candidates_num=5
noise_temp_lower_bound=1.0
noise_temp_upper_bound=1.2
time_temp_lower_bound=1.0
time_temp_upper_bound=1.0
```

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- To reproduce the results in the paper, follow the [instructions](https://github.com/hume-vla/hume/tree/main/experiments/libero).
- To use the model directly:

```python
from hume import HumePolicy
import numpy as np

# load policy
hume = HumePolicy.from_pretrained("/path/to/checkpoints")

# configure Test-Time Computing args (values from "Optimal TTS Args" above)
hume.init_infer(
    infer_cfg=dict(
        replan_steps=8,
        s2_replan_steps=16,
        s2_candidates_num=5,
        noise_temp_lower_bound=1.0,
        noise_temp_upper_bound=1.2,
        time_temp_lower_bound=1.0,
        time_temp_upper_bound=1.0,
        post_process_action=True,
        device="cuda",
    )
)

# prepare observations
observation = {
    "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.state": np.zeros((1, 7)),  # (B, state_dim)
    "task": ["Lift the paper"],
}

# infer the action
action = hume.infer(observation)  # (B, action_dim)
```
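
Because `hume.infer` expects the batch layout given in the comments above, it can help to validate an observation before calling it. The sketch below uses a hypothetical helper, `check_observation`, which is not part of the `hume` package; the 224×224 image size and 7-dimensional state are taken from the example above and are assumptions that may need adjusting for your setup.

```python
import numpy as np


# Hypothetical helper (not part of the hume package): checks an observation
# dict against the (B, H, W, C) uint8 images and (B, state_dim) state shown
# in the usage example. Image size and state_dim defaults are assumptions.
def check_observation(obs: dict, height: int = 224, width: int = 224, state_dim: int = 7) -> int:
    """Return the batch size B, or raise ValueError on a mismatch."""
    batch = len(obs["task"])  # one language instruction per batch element
    for key in ("observation.images.image", "observation.images.wrist_image"):
        img = obs[key]
        if img.dtype != np.uint8:
            raise ValueError(f"{key}: expected uint8, got {img.dtype}")
        if img.shape != (batch, height, width, 3):
            raise ValueError(f"{key}: expected {(batch, height, width, 3)}, got {img.shape}")
    state = obs["observation.state"]
    if state.shape != (batch, state_dim):
        raise ValueError(f"observation.state: expected {(batch, state_dim)}, got {state.shape}")
    return batch


observation = {
    "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),
    "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),
    "observation.state": np.zeros((1, 7)),
    "task": ["Lift the paper"],
}
print(check_observation(observation))  # → 1
```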

## Training and Evaluation Details

```bash
# source ckpts
2025-05-01/19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k/0150000
# original logs
2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log
```
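
The evaluation log name appears to encode the test-time arguments it was run with. The sketch below recovers them with a regular expression using a hypothetical helper, `args_from_log_name`. The abbreviation mapping (`s1` → `replan_steps`, `s2` → `s2_replan_steps`, `s2cand` → `s2_candidates_num`, `ntl`/`ntu` → `noise_temp_lower_bound`/`noise_temp_upper_bound`, `ttl`/`ttu` → `time_temp_lower_bound`/`time_temp_upper_bound`) is an assumption inferred by matching values against the arguments in this card, not something the authors document.

```python
import re

# Assumed mapping from log-name abbreviations to infer_cfg keys; inferred
# from matching values in this card, not documented by the authors.
ABBREVIATIONS = {
    "s1": ("replan_steps", int),
    "s2": ("s2_replan_steps", int),
    "s2cand": ("s2_candidates_num", int),
    "ntl": ("noise_temp_lower_bound", float),
    "ntu": ("noise_temp_upper_bound", float),
    "ttl": ("time_temp_lower_bound", float),
    "ttu": ("time_temp_upper_bound", float),
}


def args_from_log_name(name: str) -> dict:
    """Hypothetical helper: extract test-time args from a log file name."""
    cfg = {}
    for abbr, (key, cast) in ABBREVIATIONS.items():
        # Fields look like _<abbr>-<number>. Requiring the hyphen right after
        # the abbreviation keeps "s2" from matching "s2cand" and "s1" from
        # matching "s1600k"; the number pattern stops before ".log".
        match = re.search(rf"_{abbr}-(\d+(?:\.\d+)?)", name)
        if match:
            cfg[key] = cast(match.group(1))
    return cfg


LOG = (
    "2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5"
    "_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log"
)
print(args_from_log_name(LOG))
```

Comparing the result with the **Optimal TTS Args** block is a quick way to check which settings a given log was produced with.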

## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

```BibTeX
@article{song2025hume,
  title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2505.21432},
  year={2025}
}
```