---
license: mit
datasets:
- IPEC-COMMUNITY/libero_object_no_noops_lerobot
language:
- en
base_model:
- Hume-vla/Hume-System2
pipeline_tag: robotics
library_name: transformers
tags:
- VLA
---
# Model Card for Hume-Libero_Object
<!-- Provide a quick summary of what the model is/does. -->
A Dual-System Visual-Language-Action model with System-2 thinking trained on Libero-Object.
- Paper: [https://arxiv.org/abs/2505.21432](https://arxiv.org/abs/2505.21432)
- Homepage: [https://hume-vla.github.io](https://hume-vla.github.io)
- Codebase: [🦾 Hume: A Dual-System VLA with System2 Thinking](https://github.com/hume-vla/hume) 
## Optimal TTS Args
```bash
s2_candidates_num=5
noise_temp_lower_bound=1.0
noise_temp_upper_bound=1.2
time_temp_lower_bound=1.0
time_temp_upper_bound=1.0
```
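As a minimal sketch, these values could be merged into the `infer_cfg` dict that the usage example in this card passes to `init_infer`. The non-TTS keys (`replan_steps`, `s2_replan_steps`, `post_process_action`, `device`) are copied from that example. The variable names `optimal_tts_args` and `infer_cfg` are illustrative, and the final call is left commented out because it needs the `hume` package and a loaded checkpoint.

```python
# Sketch: the TTS args listed above, as keyword entries for `infer_cfg`.
optimal_tts_args = dict(
    s2_candidates_num=5,
    noise_temp_lower_bound=1.0,
    noise_temp_upper_bound=1.2,
    time_temp_lower_bound=1.0,
    time_temp_upper_bound=1.0,
)

# Remaining keys are copied from the usage example in this card;
# they are not part of the TTS args.
infer_cfg = dict(
    replan_steps=8,
    s2_replan_steps=16,
    post_process_action=True,
    device="cuda",
    **optimal_tts_args,
)

# With the `hume` package installed and a policy loaded, this would be passed as:
# hume.init_infer(infer_cfg=infer_cfg)
print(infer_cfg)
```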
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
- To reproduce the results in the paper, follow the [instructions](https://github.com/hume-vla/hume/tree/main/experiments/libero).
- To use the model directly:
```python
from hume import HumePolicy
import numpy as np
# load policy
hume = HumePolicy.from_pretrained("/path/to/checkpoints")
# configure the test-time computing args
# (the "Optimal TTS Args" section above lists noise_temp_upper_bound=1.2
# and time_temp_lower_bound=1.0 for this checkpoint)
hume.init_infer(
    infer_cfg=dict(
        replan_steps=8,
        s2_replan_steps=16,
        s2_candidates_num=5,
        noise_temp_lower_bound=1.0,
        noise_temp_upper_bound=1.0,
        time_temp_lower_bound=0.9,
        time_temp_upper_bound=1.0,
        post_process_action=True,
        device="cuda",
    )
)
# prepare observations
observation = {
    "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.state": np.zeros((1, 7)),  # (B, state_dim)
    "task": ["Lift the papper"],
}
# Infer the action
action = hume.infer(observation) # (B, action_dim)
```
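In a real rollout the camera frames and robot state usually arrive unbatched, so they need a leading batch axis before being passed to `infer`. The sketch below shows one way to build the dict. `make_observation` is a hypothetical helper, not part of the `hume` package, and the shape checks simply mirror the shapes in the example above; whether the policy accepts other sizes is not stated in this card. The task string is a placeholder.

```python
import numpy as np


def make_observation(image, wrist_image, state, task):
    """Hypothetical helper: wrap one unbatched sample into the batched dict format."""
    image = np.asarray(image, dtype=np.uint8)
    wrist_image = np.asarray(wrist_image, dtype=np.uint8)
    state = np.asarray(state, dtype=np.float64)

    # Assumption: shapes follow the example in this card (224x224 RGB, 7-dim state).
    if image.shape != (224, 224, 3) or wrist_image.shape != (224, 224, 3):
        raise ValueError("expected images of shape (224, 224, 3), i.e. (H, W, C)")
    if state.shape != (7,):
        raise ValueError("expected a state vector of shape (7,)")

    # `arr[None]` prepends a batch axis of size 1.
    return {
        "observation.images.image": image[None],
        "observation.images.wrist_image": wrist_image[None],
        "observation.state": state[None],
        "task": [task],
    }


obs = make_observation(
    image=np.zeros((224, 224, 3), dtype=np.uint8),
    wrist_image=np.zeros((224, 224, 3), dtype=np.uint8),
    state=np.zeros(7),
    task="pick up the object",  # placeholder instruction
)
print({key: getattr(value, "shape", value) for key, value in obs.items()})
```

The resulting `obs` has the same keys and batched shapes as the `observation` dict above and could be passed to `hume.infer` in its place.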
## Training and Evaluation Details
```bash
# source ckpts
2025-05-01/19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k/0150000
# original logs
2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log
```
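The tail of the log name appears to encode the inference settings used for evaluation. The sketch below recovers them with a regular expression. The mapping from abbreviations (`s1`, `s2`, `s2cand`, `ntl`, `ntu`, `ttl`, `ttu`) to `infer_cfg` keys is an assumption inferred by matching the values against the "Optimal TTS Args" section and the usage example; it is not documented in this card. `parse_eval_suffix` is a hypothetical helper.

```python
import re

# Assumed mapping from log-name abbreviations to `infer_cfg` keys (inferred, not documented).
ABBREVIATIONS = {
    "s1": "replan_steps",
    "s2": "s2_replan_steps",
    "s2cand": "s2_candidates_num",
    "ntl": "noise_temp_lower_bound",
    "ntu": "noise_temp_upper_bound",
    "ttl": "time_temp_lower_bound",
    "ttu": "time_temp_upper_bound",
}

FIELD = re.compile(r"_(s1|s2cand|s2|ntl|ntu|ttl|ttu)-(\d+(?:\.\d+)?)")


def parse_eval_suffix(log_name):
    """Extract `_<abbr>-<number>` fields from an evaluation log name."""
    args = {}
    for abbr, value in FIELD.findall(log_name):
        # Keep integers as int and decimals as float.
        args[ABBREVIATIONS[abbr]] = float(value) if "." in value else int(value)
    return args


log_name = (
    "2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_"
    "lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_"
    "ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log"
)
eval_args = parse_eval_suffix(log_name)
print(eval_args)
```

Under that assumed mapping, the parsed values agree with the "Optimal TTS Args" section (`s2_candidates_num=5`, noise temperature 1.0 to 1.2, time temperature 1.0 to 1.0).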
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```BibTeX
@article{song2025hume,
  title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2505.21432},
  year={2025}
}
```