---
license: mit
datasets:
- IPEC-COMMUNITY/libero_object_no_noops_lerobot
language:
- en
base_model:
- Hume-vla/Hume-System2
pipeline_tag: robotics
library_name: transformers
tags:
- VLA
---
# Model Card for Hume-Libero_Object


Hume-Libero_Object is a dual-system Visual-Language-Action (VLA) model with System-2 thinking, trained on the Libero-Object benchmark.
- Paper: [https://arxiv.org/abs/2505.21432](https://arxiv.org/abs/2505.21432)
- Homepage: [https://hume-vla.github.io](https://hume-vla.github.io)
- Codebase: [🦾 Hume: A Dual-System VLA with System2 Thinking](https://github.com/hume-vla/hume) ![GitHub Repo stars](https://img.shields.io/github/stars/hume-vla/hume)

## Optimal TTS Args
```bash
s2_candidates_num=5
noise_temp_lower_bound=1.0
noise_temp_upper_bound=1.2
time_temp_lower_bound=1.0
time_temp_upper_bound=1.0
```
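The values above map onto the identically named `infer_cfg` keys accepted by `HumePolicy.init_infer` (shown in the Uses section below). A minimal sketch of the optimal configuration as a Python dict; the inline comment on `s2_candidates_num` is our interpretation, not from the original card:

```python
# Optimal test-time scaling (TTS) values from this card, expressed as the
# matching infer_cfg keys of HumePolicy.init_infer.
optimal_tts_cfg = dict(
    s2_candidates_num=5,          # number of System-2 candidates (assumed meaning)
    noise_temp_lower_bound=1.0,
    noise_temp_upper_bound=1.2,
    time_temp_lower_bound=1.0,
    time_temp_upper_bound=1.0,
)
```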


## Uses

- If you want to reproduce the results in the paper, follow the [instructions](https://github.com/hume-vla/hume/tree/main/experiments/libero)
- If you want to use the model directly:
  ```python
  from hume import HumePolicy
  import numpy as np
  
  # load policy
  hume = HumePolicy.from_pretrained("/path/to/checkpoints")
  
  # configure test-time computing args
  hume.init_infer(
      infer_cfg=dict(
          replan_steps=8,
          s2_replan_steps=16,
          s2_candidates_num=5,
          noise_temp_lower_bound=1.0,
          noise_temp_upper_bound=1.0,
          time_temp_lower_bound=0.9,
          time_temp_upper_bound=1.0,
          post_process_action=True,
          device="cuda",
      )
  )
  
  # prepare observations
  observation = {
      "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
      "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
      "observation.state": np.zeros((1, 7)),  # (B, state_dim)
      "task": ["Lift the paper"],
  }
  
  # Infer the action
  action = hume.infer(observation) # (B, action_dim)
  ```
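In a real rollout, each environment step must be batched into the dict layout shown above. The helper below is a hypothetical sketch (not part of the `hume` package) that adds the batch dimension and casts images to `uint8`, using the field names from the example:

```python
import numpy as np

def make_observation(image, wrist_image, state, task):
    """Batch one environment step into the dict layout HumePolicy.infer expects.
    Hypothetical helper; keys follow the usage example above."""
    return {
        "observation.images.image": image[None].astype(np.uint8),        # (1, H, W, C)
        "observation.images.wrist_image": wrist_image[None].astype(np.uint8),
        "observation.state": state[None].astype(np.float32),             # (1, state_dim)
        "task": [task],
    }

obs = make_observation(
    np.zeros((224, 224, 3)),
    np.zeros((224, 224, 3)),
    np.zeros(7),
    "Lift the paper",
)
```

The returned dict can then be passed to `hume.infer(obs)` exactly as in the example above.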
## Training and Evaluation Details
```bash
# source ckpts
2025-05-01/19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k/0150000
# original logs
2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log
```

## Citation



```BibTeX
@article{song2025hume,
  title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
  author={Anonymous Authors},
  journal={arXiv preprint arXiv:2505.21432},
  year={2025}
}
```