FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO 10

This is a FlowerVLA checkpoint for robotic manipulation, finetuned on the LIBERO 10 dataset. Flower is an efficient Vision-Language-Action flow policy for robot learning with only about 1B parameters.

Model Description

FlowerVLA is a novel architecture that:

  • Uses half of Florence-2 for multi-modal vision-language encoding
  • Employs a novel transformer-based flow matching architecture
  • Provides an efficient, versatile VLA policy with only ~1B parameters

Model Performance

This checkpoint contains weights for the LIBERO 10 challenge and achieves these results:

  • eval_lh/avg_seq_len (average success rate): 0.9440705180168152
  • eval_lh/sr_LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket: 0.9791666666666666
  • eval_lh/sr_LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket: 1.0
  • eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it: 0.9791666666666666
  • eval_lh/sr_KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it: 1.0
  • eval_lh/sr_LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate: 0.9407051282051282
  • eval_lh/sr_STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy: 1.0
  • eval_lh/sr_LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate: 0.8990384615384616
  • eval_lh/sr_LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket: 1.0
  • eval_lh/sr_KITCHEN_SCENE8_put_both_moka_pots_on_the_stove: 0.7403846153846154
  • eval_lh/sr_KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it: 0.9022435897435898
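
The reported eval_lh/avg_seq_len value appears to be the plain mean of the ten per-task success rates. The sketch below recomputes it from the numbers listed above and picks out the lowest-scoring task. The variable names are illustrative and not part of the FlowerVLA codebase.

```python
# Per-task success rates copied from the results listed above (LIBERO 10).
task_success = {
    "LIVING_ROOM_SCENE2_put_both_the_alphabet_soup_and_the_tomato_sauce_in_the_basket": 0.9791666666666666,
    "LIVING_ROOM_SCENE2_put_both_the_cream_cheese_box_and_the_butter_in_the_basket": 1.0,
    "KITCHEN_SCENE3_turn_on_the_stove_and_put_the_moka_pot_on_it": 0.9791666666666666,
    "KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet_and_close_it": 1.0,
    "LIVING_ROOM_SCENE5_put_the_white_mug_on_the_left_plate_and_put_the_yellow_and_white_mug_on_the_right_plate": 0.9407051282051282,
    "STUDY_SCENE1_pick_up_the_book_and_place_it_in_the_back_compartment_of_the_caddy": 1.0,
    "LIVING_ROOM_SCENE6_put_the_white_mug_on_the_plate_and_put_the_chocolate_pudding_to_the_right_of_the_plate": 0.8990384615384616,
    "LIVING_ROOM_SCENE1_put_both_the_alphabet_soup_and_the_cream_cheese_box_in_the_basket": 1.0,
    "KITCHEN_SCENE8_put_both_moka_pots_on_the_stove": 0.7403846153846154,
    "KITCHEN_SCENE6_put_the_yellow_and_white_mug_in_the_microwave_and_close_it": 0.9022435897435898,
}

# Unweighted mean over all tasks.
average = sum(task_success.values()) / len(task_success)

# Task with the lowest success rate.
hardest_task = min(task_success, key=task_success.get)

print(f"mean success rate over {len(task_success)} tasks: {average:.4f}")
print(f"lowest-scoring task: {hardest_task} ({task_success[hardest_task]:.4f})")
```

The recomputed mean agrees with the reported average to about six decimal places, and the two-moka-pot task is the clear outlier.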

Input/Output Specifications

Inputs

  • RGB Static Camera: (B, T, 3, H, W) tensor
  • RGB Gripper Camera: (B, T, 3, H, W) tensor
  • Language Instructions: Text strings

Outputs

  • Action Space: (B, T, 7) tensor representing delta EEF actions
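
As a rough illustration of the layout above, the following sketch validates plain shape tuples before they are handed to a policy. The function name and the example sizes (224x224 images, an action horizon of 10) are hypothetical. The card also does not say whether the action time dimension equals the observation time dimension, so the check does not assume it.

```python
def check_io_shapes(rgb_static, rgb_gripper, actions):
    """Validate shape tuples against the layout listed above.

    rgb_static, rgb_gripper: expected (B, T, 3, H, W)
    actions:                 expected (B, T, 7), delta end-effector actions
    """
    for name, shape in (("rgb_static", rgb_static), ("rgb_gripper", rgb_gripper)):
        if len(shape) != 5:
            raise ValueError(f"{name}: expected 5 dims (B, T, 3, H, W), got {shape}")
        if shape[2] != 3:
            raise ValueError(f"{name}: expected 3 RGB channels, got {shape[2]}")

    # Both cameras are assumed to share batch size and time dimension.
    if rgb_static[:2] != rgb_gripper[:2]:
        raise ValueError("cameras must agree on batch size and time dimension")

    if len(actions) != 3 or actions[2] != 7:
        raise ValueError(f"actions: expected (B, T, 7), got {actions}")

    # Only the batch size is compared; the action horizon may differ (assumption).
    if actions[0] != rgb_static[0]:
        raise ValueError("actions and images must agree on batch size")
    return True


# Hypothetical sizes, for illustration only.
ok = check_io_shapes((1, 1, 3, 224, 224), (1, 1, 3, 224, 224), (1, 10, 7))
print(ok)
```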

Usage

Check out our full model implementation on GitHub (link: TODO) and follow the instructions in the README to test the model in one of the environments.

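# Observation dictionary holding both camera streams
obs = {
    "rgb_obs": {
        "rgb_static": static_image,    # static camera, (B, T, 3, H, W)
        "rgb_gripper": gripper_image,  # gripper camera, (B, T, 3, H, W)
    }
}
# Language instruction for the task
goal = {"lang_text": "pick up the blue cube"}
# Query the policy for the next action
action = model.step(obs, goal)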
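
To show how the step call above might sit inside a control loop, here is a minimal sketch using a stand-in policy. Everything except the step(obs, goal) call shape is hypothetical: DummyPolicy, rollout, and the two callbacks are placeholders. The assumption that each call returns a single 7-D action is not confirmed by this card.

```python
class DummyPolicy:
    """Hypothetical stand-in exposing the same step(obs, goal) call as above."""

    def step(self, obs, goal):
        # ASSUMPTION: one 7-D delta end-effector action per call.
        return [0.0] * 7


def rollout(model, get_obs, apply_action, goal, max_steps):
    """Query the policy once per control step and forward each action."""
    actions = []
    for _ in range(max_steps):
        obs = get_obs()                 # fetch the latest camera images
        action = model.step(obs, goal)  # same call as in the usage snippet
        apply_action(action)            # send the action to the robot or simulator
        actions.append(action)
    return actions


# Placeholder callbacks; a real setup would read cameras and drive an environment.
def fake_obs():
    return {"rgb_obs": {"rgb_static": None, "rgb_gripper": None}}


executed = []
goal = {"lang_text": "pick up the blue cube"}
actions = rollout(DummyPolicy(), fake_obs, executed.append, goal, max_steps=5)
print(len(actions), len(actions[0]))
```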

Training Details

Configuration

  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Weight Decay: 0.05
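
For intuition about what these hyperparameters do, the sketch below applies one AdamW update to a single scalar parameter in plain Python. The learning rate and weight decay come from the list above. The beta and epsilon values are assumptions (commonly used defaults) because the card does not state them, and this is not the training code used for the checkpoint.

```python
import math

LEARNING_RATE = 2e-5  # from the configuration above
WEIGHT_DECAY = 0.05   # from the configuration above
# ASSUMPTION: common default values, not stated in this model card.
BETA1, BETA2, EPS = 0.9, 0.999, 1e-8


def adamw_step(param, grad, m, v, t):
    """One AdamW update for a single scalar parameter at step t (1-indexed)."""
    # Decoupled weight decay: shrink the parameter directly,
    # independently of the gradient-based update.
    param = param * (1.0 - LEARNING_RATE * WEIGHT_DECAY)

    # Exponential moving averages of the gradient and squared gradient.
    m = BETA1 * m + (1.0 - BETA1) * grad
    v = BETA2 * v + (1.0 - BETA2) * grad * grad

    # Bias correction for the zero-initialised moments.
    m_hat = m / (1.0 - BETA1 ** t)
    v_hat = v / (1.0 - BETA2 ** t)

    param = param - LEARNING_RATE * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v


new_param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(new_param)
```

With these values the decay term alone shrinks each weight by a factor of 1 - 2e-5 * 0.05 = 1 - 1e-6 per step, which is small compared with the gradient step of roughly 2e-5.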

Citation

@inproceedings{reuss2025flower,
  # Add citation when available
}

License

This model is released under the MIT license.
