	---
	license: mit
	datasets:
	- IPEC-COMMUNITY/libero_object_no_noops_lerobot
	language:
	- en
	base_model:
	- Hume-vla/Hume-System2
	pipeline_tag: robotics
	library_name: transformers
	tags:
	- VLA
	---
	# Model Card for Hume-Libero_Object


	A dual-system Visual-Language-Action (VLA) model with System-2 thinking, trained on Libero-Object.
	- Paper: [https://arxiv.org/abs/2505.21432](https://arxiv.org/abs/2505.21432)
	- Homepage: [https://hume-vla.github.io](https://hume-vla.github.io)
	- Codebase: [🦾 Hume: A Dual-System VLA with System2 Thinking](https://github.com/hume-vla/hume) ![GitHub Repo stars](https://img.shields.io/github/stars/hume-vla/hume)

	## Optimal TTS Args
	```bash
	s2_candidates_num=5
	noise_temp_lower_bound=1.0
	noise_temp_upper_bound=1.2
	time_temp_lower_bound=1.0
	time_temp_upper_bound=1.0
	```
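The settings above can be merged into the `infer_cfg` dictionary that `hume.init_infer` receives in the usage example on this card. The following is a minimal sketch, assuming the key names and the `replan_steps`, `s2_replan_steps`, `post_process_action`, and `device` values from that example carry over unchanged. `OPTIMAL_TTS_ARGS` and `build_infer_cfg` are hypothetical names introduced here for illustration and are not part of the `hume` package.

```python
# Checkpoint-specific test-time settings, copied from the list above.
OPTIMAL_TTS_ARGS = {
    "s2_candidates_num": 5,
    "noise_temp_lower_bound": 1.0,
    "noise_temp_upper_bound": 1.2,
    "time_temp_lower_bound": 1.0,
    "time_temp_upper_bound": 1.0,
}


def build_infer_cfg(device="cuda"):
    """Combine the settings above with the remaining keys from the usage example.

    Hypothetical helper: the non-TTS keys and their values are assumed to be
    the ones shown in the usage example on this card.
    """
    return dict(
        replan_steps=8,
        s2_replan_steps=16,
        post_process_action=True,
        device=device,
        **OPTIMAL_TTS_ARGS,
    )


infer_cfg = build_infer_cfg()
print(infer_cfg)
```

The resulting dictionary could then be passed as `hume.init_infer(infer_cfg=infer_cfg)`.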


	## Uses

	- To reproduce the results in the paper, follow the [instructions](https://github.com/hume-vla/hume/tree/main/experiments/libero).
	- To use the model directly:
	```python
	from hume import HumePolicy
	import numpy as np

	# Load the policy
	hume = HumePolicy.from_pretrained("/path/to/checkpoints")

	# Configure the test-time computing args
	# Note: these values differ from the checkpoint-specific settings under "Optimal TTS Args"
	hume.init_infer(
	    infer_cfg=dict(
	        replan_steps=8,
	        s2_replan_steps=16,
	        s2_candidates_num=5,
	        noise_temp_lower_bound=1.0,
	        noise_temp_upper_bound=1.0,
	        time_temp_lower_bound=0.9,
	        time_temp_upper_bound=1.0,
	        post_process_action=True,
	        device="cuda",
	    )
	)

	# Prepare the observations (dummy inputs with a batch size of 1)
	observation = {
	    "observation.images.image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
	    "observation.images.wrist_image": np.zeros((1, 224, 224, 3), dtype=np.uint8),  # (B, H, W, C)
	    "observation.state": np.zeros((1, 7)),  # (B, state_dim)
	    "task": ["Lift the paper"],
	}

	# Infer the action
	action = hume.infer(observation)  # (B, action_dim)
	```
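The example above feeds zero-filled arrays that already carry a batch dimension. With real data, a single timestep has to be packed into the same layout first. The following is a minimal sketch, assuming the shapes and dtypes shown in the example (224x224 RGB `uint8` images and a 7-dimensional state). `make_observation` is a hypothetical helper, not part of the `hume` package, and the task string is a placeholder.

```python
import numpy as np


def make_observation(image, wrist_image, state, task):
    """Pack one unbatched timestep into the batched dict layout from the example.

    Hypothetical helper: the expected shapes are taken from the usage example
    on this card and are not checked against the hume package itself.
    """
    image = np.asarray(image, dtype=np.uint8)
    wrist_image = np.asarray(wrist_image, dtype=np.uint8)
    state = np.asarray(state, dtype=np.float64)

    if image.shape != (224, 224, 3) or wrist_image.shape != (224, 224, 3):
        raise ValueError("images must have shape (224, 224, 3)")
    if state.shape != (7,):
        raise ValueError("state must have shape (7,)")

    return {
        "observation.images.image": image[None],  # (1, H, W, C)
        "observation.images.wrist_image": wrist_image[None],  # (1, H, W, C)
        "observation.state": state[None],  # (1, state_dim)
        "task": [task],
    }


obs = make_observation(
    image=np.zeros((224, 224, 3), dtype=np.uint8),
    wrist_image=np.zeros((224, 224, 3), dtype=np.uint8),
    state=np.zeros(7),
    task="placeholder task description",
)
print({k: getattr(v, "shape", v) for k, v in obs.items()})
```

Validating shapes before the call surfaces layout mistakes, such as a missing channel axis, as an explicit error.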
	## Training and Evaluation Details
	```bash
	# source ckpts
	2025-05-01/19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k/0150000
	# original logs
	2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log
	```
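The tail of the log file name appears to encode the evaluation settings. The following is a minimal sketch that decodes it, assuming the abbreviations map to the `infer_cfg` keys as guessed in `ABBREVIATIONS` below. That mapping is an assumption made for illustration and is not documented on this card. `parse_eval_args` is a hypothetical helper.

```python
LOG_NAME = (
    "2025-06-13/00-18-26+19-56-05_libero_object_ck8-16-1_sh-4_gpu8_"
    "lr5e-5_1e-5_1e-5_2e-5_bs16_s1600k_0150000_s1-8_s2-16_s2cand-5_"
    "ntl-1.0_ntu-1.2_ttl-1.0_ttu-1.0.log"
)

# Assumed meaning of each abbreviation (a guess, not confirmed by the card).
ABBREVIATIONS = {
    "s1": "replan_steps",
    "s2": "s2_replan_steps",
    "s2cand": "s2_candidates_num",
    "ntl": "noise_temp_lower_bound",
    "ntu": "noise_temp_upper_bound",
    "ttl": "time_temp_lower_bound",
    "ttu": "time_temp_upper_bound",
}


def parse_eval_args(log_name):
    """Extract the recognised `key-value` tokens from a log file name."""
    stem = log_name.rsplit("/", 1)[-1]
    if stem.endswith(".log"):
        stem = stem[: -len(".log")]

    args = {}
    for token in stem.split("_"):
        key, sep, value = token.partition("-")
        if sep and key in ABBREVIATIONS:
            # Values containing a dot are temperatures, the rest are counts.
            args[ABBREVIATIONS[key]] = float(value) if "." in value else int(value)
    return args


eval_args = parse_eval_args(LOG_NAME)
print(eval_args)
```

Under that assumed mapping, the decoded values can be compared against the settings listed under "Optimal TTS Args".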

	## Citation

	```BibTeX
	@article{song2025hume,
	title={Hume: Introducing System-2 Thinking in Visual-Language-Action Model},
	author={Anonymous Authors},
	journal={arXiv preprint arXiv:2505.21432},
	year={2025}
	}
	```