Pi-0 Bolt Nut Sort Model
This is a Pi-0 (Pi-Zero) model trained for bolt and nut sorting tasks using the OpenPI framework.
Model Description
- Architecture: Pi-0 (flow-based diffusion vision-language-action model)
- Base Model: PaliGemma 3B with SigLIP vision encoder
- Task: Sorting bolts and nuts into separate baskets
- Robot: Dual-arm ALOHA setup
- Action Space: 14-DoF (7 per arm: 6 joints + 1 gripper)
- Training Steps: 29,999
- Action Horizon: 50 steps
- Image Resolution: 224x224
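Each 50-step action chunk therefore has shape [50, 14]. The sketch below splits one chunk into per-arm joint and gripper arrays. It assumes the ALOHA layout implied by the action-space description: left arm first, then right arm, with 6 joint values followed by 1 gripper value per arm. The helper name `split_aloha_action` is hypothetical, so check the ordering against your own recorded data before relying on it.

```python
import numpy as np

def split_aloha_action(actions: np.ndarray) -> dict[str, np.ndarray]:
    """Split a [T, 14] action chunk into per-arm joint and gripper arrays.

    Assumes the layout [left 6 joints, left gripper, right 6 joints, right gripper].
    """
    if actions.ndim != 2 or actions.shape[1] != 14:
        raise ValueError(f"expected shape [T, 14], got {actions.shape}")
    return {
        "left_joints": actions[:, 0:6],
        "left_gripper": actions[:, 6],
        "right_joints": actions[:, 7:13],
        "right_gripper": actions[:, 13],
    }

# Dummy chunk with the documented horizon of 50 steps
chunk = np.arange(50 * 14, dtype=np.float32).reshape(50, 14)
parts = split_aloha_action(chunk)
print(parts["left_joints"].shape)    # (50, 6)
print(parts["right_gripper"].shape)  # (50,)
```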
Dataset
Trained on the naungth/pi0_bolt_nut_sort dataset with the task instruction:
"sort the bolts and the nuts into separate baskets"
Usage
With OpenPI
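```python
import numpy as np

from openpi.policies import policy_config
from openpi.training import config

# Load the training configuration registered under this name
config_name = "pi0_bns"
train_config = config.get_config(config_name)

# Create a policy from your local checkpoint directory
policy = policy_config.create_trained_policy(
    train_config,
    "path/to/checkpoint",
    default_prompt="sort the bolts and the nuts into separate baskets",
)

# Placeholder inputs: replace with real camera frames and joint readings
image_array = np.zeros((224, 224, 3), dtype=np.uint8)
left_wrist_image = np.zeros((224, 224, 3), dtype=np.uint8)
right_wrist_image = np.zeros((224, 224, 3), dtype=np.uint8)
joint_positions = np.zeros(14, dtype=np.float32)

# Run inference on a single observation
observation = {
    "images": {
        "cam_high": image_array,  # [H, W, 3] uint8
        "cam_left_wrist": left_wrist_image,  # [H, W, 3] uint8
        "cam_right_wrist": right_wrist_image,  # [H, W, 3] uint8
    },
    "state": joint_positions,  # [14] float32
    "prompt": "sort the bolts and the nuts into separate baskets",
}
actions = policy.infer(observation)["actions"]  # [50, 14]
```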
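Each `infer` call returns a whole chunk of 50 future actions. One common way to use it is to execute part of the chunk open-loop and then re-query the policy with a fresh observation. The loop below is a minimal sketch of that pattern. `DummyPolicy`, `get_observation`, `send_action`, and the interval `replan_every=25` are hypothetical stand-ins, not part of OpenPI. Substitute your robot interface and tune the interval for your setup.

```python
import numpy as np

ACTION_HORIZON = 50  # actions per chunk, as documented for this model

class DummyPolicy:
    """Hypothetical stand-in for an OpenPI policy; always returns a zero chunk."""

    def infer(self, observation: dict) -> dict:
        return {"actions": np.zeros((ACTION_HORIZON, 14), dtype=np.float32)}

def get_observation() -> dict:
    """Hypothetical: replace with real camera frames and joint readings."""
    cams = ("cam_high", "cam_left_wrist", "cam_right_wrist")
    return {
        "images": {c: np.zeros((224, 224, 3), dtype=np.uint8) for c in cams},
        "state": np.zeros(14, dtype=np.float32),
        "prompt": "sort the bolts and the nuts into separate baskets",
    }

def run_episode(policy, send_action, num_steps: int, replan_every: int = 25) -> int:
    """Execute each chunk open-loop, re-querying the policy every `replan_every` steps.

    Returns the number of policy queries made.
    """
    if not 1 <= replan_every <= ACTION_HORIZON:
        raise ValueError("replan_every must be between 1 and the action horizon")
    queries, chunk = 0, None
    for step in range(num_steps):
        idx = step % replan_every
        if idx == 0:
            chunk = policy.infer(get_observation())["actions"]  # [50, 14]
            queries += 1
        send_action(chunk[idx])  # one 14-D command per control step
    return queries

sent = []
n_queries = run_episode(DummyPolicy(), sent.append, num_steps=100)
print(n_queries, len(sent))  # 4 100
```

Smaller `replan_every` values let the robot react sooner to changes in the scene, but they require more inference calls per episode.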
With Policy Server
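Start the policy server:

```bash
uv run scripts/serve_policy.py policy:checkpoint \
    --policy.config=pi0_bns \
    --policy.dir=path/to/checkpoint
```

Then query it from a client:

```python
from openpi_client import websocket_client_policy

# Connect to the policy server started above (port 8000)
client = websocket_client_policy.WebsocketClientPolicy("localhost", 8000)
# `observation` has the same structure as in the OpenPI example
actions = client.infer(observation)["actions"]  # [50, 14]
```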
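Malformed payloads usually surface as errors on the server side, where they are harder to trace. It can therefore help to check an observation on the client before sending it. The helper below is a hypothetical sketch that checks only the keys, shapes, and dtypes listed in the comments of the OpenPI usage example. It does not reproduce every input transform the `pi0_bns` config may apply on the server.

```python
import numpy as np

# Expected layout, taken from the comments in the OpenPI usage example
CAMERA_KEYS = ("cam_high", "cam_left_wrist", "cam_right_wrist")
STATE_DIM = 14

def validate_observation(obs: dict) -> list[str]:
    """Return a list of problems found in `obs` (empty if it looks well-formed)."""
    problems = []
    images = obs.get("images", {})
    for key in CAMERA_KEYS:
        img = images.get(key)
        if img is None:
            problems.append(f"missing image '{key}'")
            continue
        img = np.asarray(img)
        if img.ndim != 3 or img.shape[-1] != 3:
            problems.append(f"image '{key}' should be [H, W, 3], got {img.shape}")
        if img.dtype != np.uint8:
            problems.append(f"image '{key}' should be uint8, got {img.dtype}")
    state = np.asarray(obs.get("state", []))
    if state.shape != (STATE_DIM,):
        problems.append(f"state should have shape ({STATE_DIM},), got {state.shape}")
    if not isinstance(obs.get("prompt"), str):
        problems.append("prompt should be a string")
    return problems

good = {
    "images": {k: np.zeros((224, 224, 3), dtype=np.uint8) for k in CAMERA_KEYS},
    "state": np.zeros(STATE_DIM, dtype=np.float32),
    "prompt": "sort the bolts and the nuts into separate baskets",
}
print(validate_observation(good))  # []
bad = dict(good, state=np.zeros(7, dtype=np.float32))
print(validate_observation(bad))  # ['state should have shape (14,), got (7,)']
```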
Model Architecture
- Vision Encoder: SigLIP-So400m/14
- Language Model: Gemma 2B + Gemma 300M (action expert)
- Training: Flow-matching (flow-based diffusion) action prediction
- Input: Multi-camera RGB + proprioception + language instruction
- Output: Future action sequence (50 timesteps)
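The flow-matching action head produces the 50-step chunk by starting from Gaussian noise and integrating a learned velocity field toward an action sequence. The NumPy sketch below shows Euler integration of such a field. The `toy_velocity` function is a hypothetical closed-form stand-in that steers samples toward a fixed target chunk. In the real model, the action expert predicts the velocity from the images, state, and instruction. The 10 integration steps and the time convention (noise at t = 1, actions at t = 0) are assumptions chosen to illustrate one common formulation.

```python
import numpy as np

HORIZON, ACTION_DIM = 50, 14

def euler_sample(velocity, rng: np.random.Generator, num_steps: int = 10) -> np.ndarray:
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0 (actions)."""
    x = rng.standard_normal((HORIZON, ACTION_DIM))
    dt = -1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        x = x + dt * velocity(x, t)
    return x

# Hypothetical stand-in for the action expert: along the straight path
# x_t = t * noise + (1 - t) * target, the exact velocity is (x_t - target) / t.
target = np.linspace(-1.0, 1.0, HORIZON * ACTION_DIM).reshape(HORIZON, ACTION_DIM)

def toy_velocity(x: np.ndarray, t: float) -> np.ndarray:
    return (x - target) / t

sample = euler_sample(toy_velocity, np.random.default_rng(0))
print(np.allclose(sample, target))  # True
```

Along a straight noise-to-action path the velocity is constant, so Euler steps recover the target exactly in this toy case. A learned velocity field is only approximate, and the number of integration steps then trades accuracy against inference time.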
Training Details
- Framework: JAX/Flax with OpenPI
- Optimizer: AdamW
- Base Checkpoint: Pi-0 base model released by Physical Intelligence through OpenPI (built on Google's PaliGemma)
- Fine-tuning: Task-specific fine-tuning on the bolt-and-nut sorting data
- Normalization: Dataset-specific state/action normalization
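Dataset-specific normalization means that states and actions are rescaled with statistics computed from the training data. The model's outputs must then be mapped back to robot units before execution. Below is a minimal z-score sketch, assuming per-dimension mean and standard deviation statistics. The function names and the `eps` value are hypothetical, and OpenPI's own normalization code and its stored statistics format may differ.

```python
import numpy as np

def compute_stats(data: np.ndarray) -> dict[str, np.ndarray]:
    """Per-dimension mean and std over an [N, D] array of states or actions."""
    return {"mean": data.mean(axis=0), "std": data.std(axis=0)}

def normalize(x: np.ndarray, stats: dict[str, np.ndarray], eps: float = 1e-6) -> np.ndarray:
    """Map raw values to roughly zero mean and unit variance."""
    return (x - stats["mean"]) / (stats["std"] + eps)

def unnormalize(x: np.ndarray, stats: dict[str, np.ndarray], eps: float = 1e-6) -> np.ndarray:
    """Invert `normalize`, e.g. to turn model outputs back into joint commands."""
    return x * (stats["std"] + eps) + stats["mean"]

rng = np.random.default_rng(0)
# Synthetic stand-in for recorded 14-D joint actions
actions = rng.normal(loc=0.5, scale=2.0, size=(1000, 14))
stats = compute_stats(actions)
normed = normalize(actions, stats)
print(np.allclose(normed.mean(axis=0), 0.0, atol=1e-6))  # True
print(np.allclose(unnormalize(normed, stats), actions))  # True
```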
License
MIT License
Citation
If you use this model, please cite:
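```bibtex
@article{pi0,
  title={$\pi_0$: A Vision-Language-Action Flow Model for General Robot Control},
  author={Black, Kevin and Brown, Noah and Driess, Danny and others},
  journal={arXiv preprint arXiv:2410.24164},
  year={2024}
}
```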
Acknowledgments
- Built using the OpenPI framework
- Based on the Pi-0 architecture
- Training data from bolt-and-nut sorting demonstrations
Model tree for naungth/pi0_dart
- Base model: google/paligemma-3b-pt-224