pipeline_tag: robotics
tags:
- lerobot
SmolVLA: A vision-language-action model for affordable and efficient robotics
[Paper](https://huggingface.co/papers/2506.01844)
Designed by Hugging Face.
This model has 450M parameters in total.
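For a rough sense of what 450M parameters means in memory, the weight footprint can be estimated from the parameter count and the bytes used per parameter. This is a back-of-the-envelope sketch, not a measurement: the helper name `weight_memory_gb` is hypothetical, and activations, optimizer state, and framework overhead are deliberately ignored.

```python
# Back-of-the-envelope estimate of weight memory for a 450M-parameter model.
# Assumption: only the weights are counted (no activations, optimizer state, or overhead).

NUM_PARAMS = 450_000_000  # total parameter count stated in this model card

# Bytes used per parameter for common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights take about 1.8 GB in `float32` and about 0.9 GB in a 16-bit format.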
You can use it inside the [LeRobot library](https://github.com/huggingface/lerobot).
Install the SmolVLA extra dependencies (editable install, run from the root of a local LeRobot checkout):
```bash
pip install -e ".[smolvla]"
```
Example of finetuning the smolvla pretrained model (`smolvla_base`):
```bash
python lerobot/scripts/train.py \
--policy.path=lerobot/smolvla_base \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000
```
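The `--batch_size` and `--steps` flags together determine how many training samples the run draws. The sketch below works through that arithmetic, assuming one batch per step on a single process. How many passes over the dataset this represents depends on the dataset size, which is not stated here, so `num_frames` and the value `50_000` are purely illustrative assumptions and not properties of `svla_so100_task1_v3`.

```python
def samples_seen(batch_size: int, steps: int) -> int:
    """Total samples drawn, assuming one batch of `batch_size` per step."""
    return batch_size * steps


def approx_epochs(batch_size: int, steps: int, num_frames: int) -> float:
    """Approximate passes over a dataset of `num_frames` samples (hypothetical size)."""
    return samples_seen(batch_size, steps) / num_frames


if __name__ == "__main__":
    # Values taken from the command above.
    total = samples_seen(batch_size=64, steps=200_000)
    print(f"samples seen: {total:,}")

    # Illustrative dataset size only; replace with the real frame count.
    print(f"approx. epochs: {approx_epochs(64, 200_000, num_frames=50_000):.0f}")
```

With the flags shown, the run draws 12.8M samples. Reducing `--steps` or `--batch_size` scales that number proportionally.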
Example of finetuning the smolvla neural network with a pretrained VLM and an action expert initialized from scratch:
```bash
python lerobot/scripts/train.py \
--policy.type=smolvla \
--dataset.repo_id=danaaubakirova/svla_so100_task1_v3 \
--batch_size=64 \
--steps=200000
```
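The two training commands above differ only in how the policy is selected: `--policy.path` starts from the `smolvla_base` checkpoint, while `--policy.type` builds the policy with an action expert initialized from scratch. As a convenience, the same argument lists could be assembled from Python. This is a minimal sketch: `build_train_command` is a hypothetical helper, it uses only the flags shown above, and requiring exactly one of the two selectors is a design choice of the sketch rather than a documented rule. The command is printed, not executed.

```python
import shlex
from typing import List, Optional


def build_train_command(
    dataset_repo_id: str,
    policy_path: Optional[str] = None,
    policy_type: Optional[str] = None,
    batch_size: int = 64,
    steps: int = 200_000,
) -> List[str]:
    """Assemble the argv list for lerobot/scripts/train.py (hypothetical helper)."""
    # This sketch requires exactly one of the two policy selectors.
    if (policy_path is None) == (policy_type is None):
        raise ValueError("pass exactly one of policy_path or policy_type")
    if policy_path is not None:
        policy_flag = f"--policy.path={policy_path}"
    else:
        policy_flag = f"--policy.type={policy_type}"
    return [
        "python",
        "lerobot/scripts/train.py",
        policy_flag,
        f"--dataset.repo_id={dataset_repo_id}",
        f"--batch_size={batch_size}",
        f"--steps={steps}",
    ]


if __name__ == "__main__":
    dataset = "danaaubakirova/svla_so100_task1_v3"
    # Finetune from the pretrained checkpoint.
    print(shlex.join(build_train_command(dataset, policy_path="lerobot/smolvla_base")))
    # Train with the action expert initialized from scratch.
    print(shlex.join(build_train_command(dataset, policy_type="smolvla")))
```

The returned list can be passed to `subprocess.run` if launching from a script is preferred over the shell.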
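Example of using the smolvla pretrained model outside the LeRobot training framework:
```python
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy  # import path may differ across LeRobot versions

policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")  # load the pretrained weights
```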