metadata
license: mit
datasets:
- IPEC-COMMUNITY/libero_object_no_noops_lerobot
language:
- en
base_model:
- Hume-vla/Hume-System2
pipeline_tag: robotics
library_name: transformers
tags:
- VLA
Model Card for Hume-Libero_Object
A Dual-System Visual-Language-Action model with System-2 thinking trained on Libero-Object.
- Paper: https://arxiv.org/abs/2505.21432
- Homepage: https://hume-vla.github.io
- Codebase: 🦾 Hume: A Dual-System VLA with System2 Thinking
Optimal TTS Args
s2_candidates_num=5
noise_temp_lower_bound=1.0
noise_temp_upper_bound=1.2
time_temp_lower_bound=1.0
time_temp_upper_bound=1.0
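As a minimal sketch, the values above can be merged into the `infer_cfg` dictionary that `init_infer` receives in the usage snippet further down. The helper name `build_infer_cfg` is hypothetical and not part of the Hume codebase. The defaults for `replan_steps`, `s2_replan_steps`, `post_process_action`, and `device` are copied from that usage snippet, on the assumption that they also apply here.

```python
# Optimal test-time-scaling (TTS) arguments listed in this model card.
OPTIMAL_TTS_ARGS = {
    "s2_candidates_num": 5,
    "noise_temp_lower_bound": 1.0,
    "noise_temp_upper_bound": 1.2,
    "time_temp_lower_bound": 1.0,
    "time_temp_upper_bound": 1.0,
}


def build_infer_cfg(replan_steps=8, s2_replan_steps=16, device="cuda", **overrides):
    """Hypothetical helper: assemble an infer_cfg dict from the optimal TTS args.

    The non-TTS defaults mirror the usage snippet in this card (an assumption).
    Keyword overrides are applied last, so they win over the optimal values.
    """
    cfg = {
        "replan_steps": replan_steps,
        "s2_replan_steps": s2_replan_steps,
        "post_process_action": True,
        "device": device,
    }
    cfg.update(OPTIMAL_TTS_ARGS)
    cfg.update(overrides)
    return cfg


infer_cfg = build_infer_cfg()
# With the hume package installed, the dict could then be passed as:
#   hume.init_infer(infer_cfg=infer_cfg)
```

Keeping the TTS values in one dictionary makes it easy to compare them against other settings without editing the call site.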
Uses
- If you want to reproduce the results in the paper, follow the instructions
- If you want to directly use the model:
from hume import HumePolicy
import numpy as np

# load the policy
hume = HumePolicy.from_pretrained("/path/to/checkpoints")

# configure the test-time computing args
# (example values; see Optimal TTS Args above for this checkpoint's recommended settings)
hume.init_infer(
    infer_cfg=dict(
        replan_steps=8,
        s2_replan_steps=16,
        s2_candidates_num=5,
        noise_temp_lower_bound=1.0,
        noise_temp_upper_bound=1.0,
        time_temp_lower_bound=0.9,
        time_temp_upper_bound=1.0,
        post_process_action=True,
        device="cuda",
    )
)

# prepare the observations
observation = {
    "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
    "observation.state": np.zeros((1, 7)),  # (B, state_dim)
    "task": ["Lift the papper"],
}

# infer the action
action = hume.infer(observation)  # (B, action_dim)
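The observation dictionary above expects a leading batch dimension on every array. The sketch below shows one way to wrap a single timestep into that layout. `make_observation` is a hypothetical helper, not part of the Hume API, and the shape checks (224x224 RGB images, a 7-dimensional state) are assumptions taken from the placeholder arrays in the snippet.

```python
import numpy as np


def make_observation(image, wrist_image, state, task):
    """Hypothetical helper: batch one timestep into the observation layout above."""
    image = np.asarray(image)
    wrist_image = np.asarray(wrist_image)
    state = np.asarray(state, dtype=np.float64)

    # Assumed requirements, copied from the placeholder arrays in the snippet.
    for name, img in (("image", image), ("wrist_image", wrist_image)):
        if img.shape != (224, 224, 3):
            raise ValueError(f"{name} must have shape (224, 224, 3), got {img.shape}")
        if img.dtype != np.uint8:
            raise ValueError(f"{name} must be uint8, got {img.dtype}")
    if state.shape != (7,):
        raise ValueError(f"state must have shape (7,), got {state.shape}")

    # Indexing with None adds the leading batch axis (B = 1).
    return {
        "observation.images.image": image[None],
        "observation.images.wrist_image": wrist_image[None],
        "observation.state": state[None],
        "task": [task],
    }


observation = make_observation(
    image=np.zeros((224, 224, 3), dtype=np.uint8),
    wrist_image=np.zeros((224, 224, 3), dtype=np.uint8),
    state=np.zeros(7),
    task="Lift the papper",
)
```

Failing early on a wrong shape or dtype is a design choice for this sketch: it surfaces camera or preprocessing mistakes before the arrays reach the policy.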
Training and Evaluation Details
# source ckpts
2025-05-01/19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k/0150000
# original logs
2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log
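The suffix of the log name appears to encode the evaluation settings, and its values match the Optimal TTS Args listed in this card. The sketch below recovers them from the file name. The mapping from abbreviations such as `ntl` or `s2cand` to argument names is an assumption inferred from those matching values, not something documented by the authors, and `parse_eval_args` is a hypothetical helper.

```python
LOG_NAME = (
    "2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_"
    "lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_"
    "ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log"
)

# Assumed mapping from log-name abbreviations to inference arguments.
ABBREVIATIONS = {
    "s1": ("replan_steps", int),
    "s2": ("s2_replan_steps", int),
    "s2cand": ("s2_candidates_num", int),
    "ntl": ("noise_temp_lower_bound", float),
    "ntu": ("noise_temp_upper_bound", float),
    "ttl": ("time_temp_lower_bound", float),
    "ttu": ("time_temp_upper_bound", float),
}


def parse_eval_args(log_name):
    """Hypothetical helper: extract evaluation settings from a log file name."""
    stem = log_name[:-4] if log_name.endswith(".log") else log_name
    args = {}
    for token in stem.split("_"):
        # Split each token at its first hyphen, e.g. "ntu-1.2" -> ("ntu", "1.2").
        key, sep, value = token.partition("-")
        if sep and key in ABBREVIATIONS:
            name, cast = ABBREVIATIONS[key]
            args[name] = cast(value)
    return args


eval_args = parse_eval_args(LOG_NAME)
```

Tokens whose prefix is not in the mapping (dates, learning rates, batch size) are ignored, so the parser only returns the settings it can name.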
Citation
@article{song2025hume,
title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
author={Anonymous Authors},
journal={arXiv preprint arXiv:2505.21432},
year={2025}
}