FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO Object

This is a pretrained FlowerVLA model for robotic manipulation, finetuned on the LIBERO Object dataset. Flower is an efficient Vision-Language-Action flow policy for robot learning with only about 1B parameters.

Model Description

FlowerVLA is a novel architecture that:

  • Uses half of Florence-2 for multi-modal vision-language encoding
  • Employs a novel transformer-based flow matching architecture
  • Provides an efficient, versatile VLA policy with only ~1B parameters

Model Performance

This checkpoint contains weights for the LIBERO Object challenge and achieves these results:

  • avg_seq_len success rate: 0.9940705299377441
  • pick_up_the_alphabet_soup_and_place_it_in_the_basket: 1.0
  • pick_up_the_cream_cheese_and_place_it_in_the_basket: 0.9407051282051282
  • pick_up_the_salad_dressing_and_place_it_in_the_basket: 1.0
  • pick_up_the_bbq_sauce_and_place_it_in_the_basket: 1.0
  • pick_up_the_ketchup_and_place_it_in_the_basket: 1.0
  • pick_up_the_tomato_sauce_and_place_it_in_the_basket: 1.0
  • pick_up_the_butter_and_place_it_in_the_basket: 1.0
  • pick_up_the_milk_and_place_it_in_the_basket: 1.0
  • pick_up_the_chocolate_pudding_and_place_it_in_the_basket: 1.0
  • pick_up_the_orange_juice_and_place_it_in_the_basket: 1.0
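The reported average can be cross-checked against the per-task success rates above. The following is a minimal sketch using only the Python standard library; the dictionary name `task_success` is chosen here for illustration and is not part of the released code.

```python
from statistics import fmean

# Per-task success rates copied from the results above.
task_success = {
    "pick_up_the_alphabet_soup_and_place_it_in_the_basket": 1.0,
    "pick_up_the_cream_cheese_and_place_it_in_the_basket": 0.9407051282051282,
    "pick_up_the_salad_dressing_and_place_it_in_the_basket": 1.0,
    "pick_up_the_bbq_sauce_and_place_it_in_the_basket": 1.0,
    "pick_up_the_ketchup_and_place_it_in_the_basket": 1.0,
    "pick_up_the_tomato_sauce_and_place_it_in_the_basket": 1.0,
    "pick_up_the_butter_and_place_it_in_the_basket": 1.0,
    "pick_up_the_milk_and_place_it_in_the_basket": 1.0,
    "pick_up_the_chocolate_pudding_and_place_it_in_the_basket": 1.0,
    "pick_up_the_orange_juice_and_place_it_in_the_basket": 1.0,
}

# Unweighted mean over the ten tasks.
mean_success = fmean(task_success.values())
print(f"{mean_success:.6f}")  # prints 0.994071
```

The unweighted mean agrees with the reported 0.9940705299377441 to about seven decimal places; the small remaining difference is likely a floating-point precision effect in the original evaluation.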

Input/Output Specifications

Inputs

  • RGB Static Camera: (B, T, 3, H, W) tensor
  • RGB Gripper Camera: (B, T, 3, H, W) tensor
  • Language Instructions: Text strings

Outputs

  • Action Space: (B, T, 7) tensor representing delta end-effector (EEF) actions
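The shape conventions above can be checked before calling the policy. The sketch below works on plain shape tuples so it does not depend on any tensor library; the helper names `check_camera_shape` and `check_action_shape` are hypothetical and not part of the FlowerVLA code base, and the example batch size, sequence length, and 224x224 resolution are assumed values.

```python
from typing import Tuple

ACTION_DIM = 7  # per the card: each delta EEF action has 7 components


def check_camera_shape(name: str, shape: Tuple[int, ...]) -> Tuple[int, int]:
    """Validate a (B, T, 3, H, W) camera shape and return (B, T)."""
    if len(shape) != 5:
        raise ValueError(f"{name}: expected 5 dims (B, T, 3, H, W), got {len(shape)}")
    batch, steps, channels, height, width = shape
    if channels != 3:
        raise ValueError(f"{name}: expected 3 RGB channels, got {channels}")
    if min(batch, steps, height, width) < 1:
        raise ValueError(f"{name}: every dimension must be positive")
    return batch, steps


def check_action_shape(shape: Tuple[int, ...]) -> None:
    """Validate a (B, T, 7) action shape."""
    if len(shape) != 3 or shape[-1] != ACTION_DIM:
        raise ValueError(f"actions: expected (B, T, {ACTION_DIM}), got {shape}")


# Example shapes; B=2, T=1 and the 224x224 resolution are assumed values.
static_shape = (2, 1, 3, 224, 224)
gripper_shape = (2, 1, 3, 224, 224)

batch_and_steps = check_camera_shape("rgb_static", static_shape)
# Assumption: both cameras share the same batch size and sequence length.
assert batch_and_steps == check_camera_shape("rgb_gripper", gripper_shape)
check_action_shape((2, 1, 7))
```

With real tensors, the same helpers can be called on `tuple(tensor.shape)`.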

Usage

Check out our full model implementation on GitHub (link todo) and follow the instructions in the README to test the model on one of the environments.

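# `model` is a loaded FlowerVLA policy; the image tensors follow the Inputs spec above.
obs = {
    "rgb_obs": {
        "rgb_static": static_image,    # (B, T, 3, H, W) static camera tensor
        "rgb_gripper": gripper_image,  # (B, T, 3, H, W) gripper camera tensor
    }
}
goal = {"lang_text": "pick up the blue cube"}  # language instruction
action = model.step(obs, goal)  # predicted delta EEF action(s), see Outputs above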

Training Details

Configuration

  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Weight Decay: 0.05
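To illustrate what these hyperparameters control, here is a minimal scalar sketch of one AdamW update with the learning rate and weight decay listed above. It is not the training code of this model: the function `adamw_step`, the `AdamWState` container, and the values `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are assumptions (common defaults) that the card does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamWState:
    m: float = 0.0  # running mean of gradients (first moment)
    v: float = 0.0  # running mean of squared gradients (second moment)
    step: int = 0   # number of updates applied so far


def adamw_step(param: float, grad: float, state: AdamWState,
               lr: float = 2e-5, weight_decay: float = 0.05,
               beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8) -> float:
    """Apply one AdamW update to a scalar parameter and return the new value."""
    state.step += 1
    # Decoupled weight decay: shrink the parameter directly,
    # independently of the gradient-based update.
    param = param * (1.0 - lr * weight_decay)
    # Update the biased moment estimates.
    state.m = beta1 * state.m + (1.0 - beta1) * grad
    state.v = beta2 * state.v + (1.0 - beta2) * grad * grad
    # Bias-correct the moments.
    m_hat = state.m / (1.0 - beta1 ** state.step)
    v_hat = state.v / (1.0 - beta2 ** state.step)
    # Adam step scaled by the learning rate.
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = AdamWState()
new_param = adamw_step(param=1.0, grad=0.5, state=state)
```

On the first step the bias-corrected update has magnitude close to 1, so the parameter moves by roughly the learning rate (2e-5), plus a shrinkage of `lr * weight_decay` (1e-6) from the decay term.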

Citation

@inproceedings{reuss2025flower,
  # Add citation when available
}

License

This model is released under the MIT license.
