SmolVLA: A vision-language-action model for affordable and efficient robotics

Designed by Hugging Face.

This model has 450M parameters in total. You can use it inside the LeRobot library.
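
To give a rough sense of what 450M parameters means in practice, the sketch below estimates the memory needed to hold the weights alone. It assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores activations, optimizer state, and framework overhead.

```python
# Back-of-the-envelope estimate of weight memory (weights only).
# `weight_memory_gb` is a hypothetical helper, not part of LeRobot.

NUM_PARAMS = 450_000_000  # total parameter count stated above


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weight_memory_gb(NUM_PARAMS, 4)  # 32-bit floats
bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # 16-bit floats

print(f"32-bit weights: ~{fp32_gb:.1f} GB")
print(f"16-bit weights: ~{bf16_gb:.1f} GB")
```

Training needs considerably more memory than this, since gradients and optimizer state are stored in addition to the weights.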

Install smolvla extra dependencies:

pip install -e ".[smolvla]"

Example of finetuning the smolvla pretrained model (smolvla_base):

python lerobot/scripts/train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000

Example of finetuning the smolvla neural network with a pretrained VLM and an action expert initialized from scratch:

python lerobot/scripts/train.py \
--policy.type=smolvla \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000
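
The two commands above differ only in the policy flag (`--policy.path` for the pretrained checkpoint, `--policy.type` for the from-scratch action expert). As a minimal sketch, the helper below assembles either command as an argument list from Python. The function name `build_train_command` is hypothetical; it uses only the script path and flags shown above and does not launch training itself.

```python
import shlex


def build_train_command(dataset_repo_id, policy_path=None, policy_type=None,
                        batch_size=64, steps=200000):
    """Assemble the train.py command line as a list of arguments.

    Exactly one of `policy_path` (pretrained checkpoint) or
    `policy_type` (architecture with a fresh action expert) must be given.
    """
    if (policy_path is None) == (policy_type is None):
        raise ValueError("pass exactly one of policy_path or policy_type")
    if policy_path is not None:
        policy_flag = f"--policy.path={policy_path}"
    else:
        policy_flag = f"--policy.type={policy_type}"
    return [
        "python", "lerobot/scripts/train.py",
        policy_flag,
        f"--dataset.repo_id={dataset_repo_id}",
        f"--batch_size={batch_size}",
        f"--steps={steps}",
    ]


DATASET = "danaaubakirova/svla_so100_task1_v3"

# Finetune from the pretrained smolvla_base checkpoint.
finetune_cmd = build_train_command(DATASET, policy_path="lerobot/smolvla_base")

# Train with a pretrained VLM and an action expert initialized from scratch.
scratch_cmd = build_train_command(DATASET, policy_type="smolvla")

print(shlex.join(finetune_cmd))
print(shlex.join(scratch_cmd))
```

To actually start a run, the resulting list could be passed to `subprocess.run(cmd, check=True)` from the root of the LeRobot repository.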

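Example of using the smolvla pretrained model outside the LeRobot training framework:

from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # import path assumed from the lerobot/common layout; adjust for your LeRobot version
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")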