FlowerVLA - Vision-Language-Action Flow Model finetuned on LIBERO 90

This is a FlowerVLA model for robotic manipulation, finetuned on the LIBERO 90 dataset. Flower is an efficient Vision-Language-Action flow policy for robot learning with only about 1B parameters.

Model Description

FlowerVLA is a novel architecture that:

Uses half of Florence-2 for multi-modal vision-language encoding
Employs a novel transformer-based flow matching architecture
Provides an efficient, versatile VLA policy with only ~1B parameters

Model Performance

This checkpoint contains weights for the LIBERO 90 challenge and achieves these results:

eval_lh/sr_KITCHEN_SCENE10_close_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE10_close_the_top_drawer_of_the_cabinet_and_put_the_black_bowl_on_top_of_it with success 0.9807692307692308
eval_lh/sr_KITCHEN_SCENE10_put_the_black_bowl_in_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE10_put_the_butter_at_the_back_in_the_top_drawer_of_the_cabinet_and_close_it with success 0.9791666666666666
eval_lh/sr_KITCHEN_SCENE10_put_the_butter_at_the_front_in_the_top_drawer_of_the_cabinet_and_close_it with success 1.0
eval_lh/sr_KITCHEN_SCENE10_put_the_chocolate_pudding_in_the_top_drawer_of_the_cabinet_and_close_it with success 1.0
eval_lh/sr_KITCHEN_SCENE1_open_the_bottom_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE1_open_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE1_open_the_top_drawer_of_the_cabinet_and_put_the_bowl_in_it with success 1.0
eval_lh/sr_KITCHEN_SCENE1_put_the_black_bowl_on_the_plate with success 1.0
eval_lh/sr_KITCHEN_SCENE1_put_the_black_bowl_on_top_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE2_open_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE2_put_the_black_bowl_at_the_back_on_the_plate with success 0.9599358974358975
eval_lh/sr_KITCHEN_SCENE2_put_the_black_bowl_at_the_front_on_the_plate with success 0.9791666666666666
eval_lh/sr_KITCHEN_SCENE2_put_the_middle_black_bowl_on_the_plate with success 0.9407051282051282
eval_lh/sr_KITCHEN_SCENE2_put_the_middle_black_bowl_on_top_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE2_stack_the_black_bowl_at_the_front_on_the_black_bowl_in_the_middle with success 0.7628205128205129
eval_lh/sr_KITCHEN_SCENE2_stack_the_middle_black_bowl_on_the_back_black_bowl with success 0.6778846153846154
eval_lh/sr_KITCHEN_SCENE3_put_the_frying_pan_on_the_stove with success 0.9391025641025641
eval_lh/sr_KITCHEN_SCENE3_put_the_moka_pot_on_the_stove with success 0.9182692307692307
eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove with success 1.0
eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove_and_put_the_frying_pan_on_it with success 0.9583333333333333
eval_lh/sr_KITCHEN_SCENE4_close_the_bottom_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE4_close_the_bottom_drawer_of_the_cabinet_and_open_the_top_drawer with success 0.592948717948718
eval_lh/sr_KITCHEN_SCENE4_put_the_black_bowl_in_the_bottom_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE4_put_the_black_bowl_on_top_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE4_put_the_wine_bottle_in_the_bottom_drawer_of_the_cabinet with success 0.7788461538461539
eval_lh/sr_KITCHEN_SCENE4_put_the_wine_bottle_on_the_wine_rack with success 0.8012820512820512
eval_lh/sr_KITCHEN_SCENE5_close_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE5_put_the_black_bowl_in_the_top_drawer_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE5_put_the_black_bowl_on_the_plate with success 1.0
eval_lh/sr_KITCHEN_SCENE5_put_the_black_bowl_on_top_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE5_put_the_ketchup_in_the_top_drawer_of_the_cabinet with success 0.8044871794871794
eval_lh/sr_KITCHEN_SCENE6_close_the_microwave with success 0.8990384615384616
eval_lh/sr_KITCHEN_SCENE6_put_the_yellow_and_white_mug_to_the_front_of_the_white_mug with success 0.9599358974358975
eval_lh/sr_KITCHEN_SCENE7_open_the_microwave with success 0.9599358974358975
eval_lh/sr_KITCHEN_SCENE7_put_the_white_bowl_on_the_plate with success 0.9182692307692307
eval_lh/sr_KITCHEN_SCENE7_put_the_white_bowl_to_the_right_of_the_plate with success 0.7612179487179487
eval_lh/sr_KITCHEN_SCENE8_put_the_right_moka_pot_on_the_stove with success 0.9583333333333333
eval_lh/sr_KITCHEN_SCENE8_turn_off_the_stove with success 0.9198717948717949
eval_lh/sr_KITCHEN_SCENE9_put_the_frying_pan_on_the_cabinet_shelf with success 1.0
eval_lh/sr_KITCHEN_SCENE9_put_the_frying_pan_on_top_of_the_cabinet with success 1.0
eval_lh/sr_KITCHEN_SCENE9_put_the_frying_pan_under_the_cabinet_shelf with success 1.0
eval_lh/sr_KITCHEN_SCENE9_put_the_white_bowl_on_top_of_the_cabinet with success 0.9375
eval_lh/sr_KITCHEN_SCENE9_turn_on_the_stove with success 1.0
eval_lh/sr_KITCHEN_SCENE9_turn_on_the_stove_and_put_the_frying_pan_on_it with success 1.0
eval_lh/sr_LIVING_ROOM_SCENE1_pick_up_the_alphabet_soup_and_put_it_in_the_basket with success 1.0
eval_lh/sr_LIVING_ROOM_SCENE1_pick_up_the_cream_cheese_box_and_put_it_in_the_basket with success 1.0
eval_lh/sr_LIVING_ROOM_SCENE1_pick_up_the_ketchup_and_put_it_in_the_basket with success 0.9375
eval_lh/sr_LIVING_ROOM_SCENE1_pick_up_the_tomato_sauce_and_put_it_in_the_basket with success 1.0
...
eval_lh/sr_STUDY_SCENE4_pick_up_the_book_on_the_right_and_place_it_on_the_cabinet_shelf with success 1.0
eval_lh/sr_STUDY_SCENE4_pick_up_the_book_on_the_right_and_place_it_under_the_cabinet_shelf with success 0.9407051282051282
eval_lh/avg_seq_len success rate 0.9587072730064392

See the training.log file for the full results.
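
The per-task entries above follow a regular "eval_lh/sr_<task> with success <value>" pattern, so they can be collected into a dictionary for quick inspection. The following is a minimal sketch, assuming training.log contains entries in the same form (this has not been verified against the actual file). The helper names parse_success_rates and tasks_below are hypothetical and not part of the FlowerVLA code base.

```python
import re
from typing import Dict, List

# Matches entries such as
# "eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove with success 1.0".
# \s+ is used between tokens so an entry split across two lines still matches.
ENTRY = re.compile(r"eval_lh/sr_(\S+)\s+with\s+success\s+([0-9]*\.?[0-9]+)")


def parse_success_rates(text: str) -> Dict[str, float]:
    """Map each task name (without the eval_lh/sr_ prefix) to its success rate."""
    return {task: float(rate) for task, rate in ENTRY.findall(text)}


def tasks_below(rates: Dict[str, float], threshold: float) -> List[str]:
    """Return the tasks whose success rate is below threshold, worst first."""
    weak = [task for task, rate in rates.items() if rate < threshold]
    return sorted(weak, key=lambda task: rates[task])


if __name__ == "__main__":
    # Three entries copied from the results listed in this card.
    sample = (
        "eval_lh/sr_KITCHEN_SCENE3_turn_on_the_stove with success 1.0 "
        "eval_lh/sr_KITCHEN_SCENE4_put_the_wine_bottle_on_the_wine_rack "
        "with success 0.8012820512820512 "
        "eval_lh/sr_KITCHEN_SCENE2_stack_the_middle_black_bowl_on_the_back_black_bowl "
        "with success 0.6778846153846154"
    )
    rates = parse_success_rates(sample)
    for task in tasks_below(rates, 0.9):
        print(f"{rates[task]:.3f}  {task}")
```

The aggregate line (eval_lh/avg_seq_len) does not carry the sr_ prefix, so this pattern skips it by design.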

Input/Output Specifications

Inputs

RGB Static Camera: (B, T, 3, H, W) tensor
RGB Gripper Camera: (B, T, 3, H, W) tensor
Language Instructions: Text strings

Outputs

Action Space: (B, T, 7) tensor representing delta end-effector (EEF) actions
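
The shapes listed above can be checked before calling the model. The following is a minimal sketch that works on plain shape tuples, so it does not depend on any tensor library. It rests on two assumptions the card does not state: that both cameras share the same B and T (H and W may differ per camera), and that there is one instruction string per batch element. The function names are hypothetical, and the image resolutions in the usage example are illustrative only.

```python
from typing import Sequence, Tuple

ACTION_DIM = 7  # last axis of the (B, T, 7) action tensor listed above


def check_rgb_shape(name: str, shape: Sequence[int]) -> Tuple[int, int]:
    """Validate a (B, T, 3, H, W) image shape and return (B, T)."""
    if len(shape) != 5:
        raise ValueError(f"{name}: expected (B, T, 3, H, W), got {tuple(shape)}")
    b, t, c, h, w = shape
    if c != 3:
        raise ValueError(f"{name}: expected 3 RGB channels, got {c}")
    if min(b, t, h, w) <= 0:
        raise ValueError(f"{name}: dimensions must be positive, got {tuple(shape)}")
    return b, t


def check_inputs(
    static_shape: Sequence[int],
    gripper_shape: Sequence[int],
    instructions: Sequence[str],
) -> Tuple[int, int]:
    """Check both camera shapes and the instruction count; return (B, T)."""
    b, t = check_rgb_shape("rgb_static", static_shape)
    # Assumption: both cameras share B and T; H and W may differ per camera.
    if check_rgb_shape("rgb_gripper", gripper_shape) != (b, t):
        raise ValueError("rgb_static and rgb_gripper disagree on (B, T)")
    # Assumption: one language instruction per batch element.
    if len(instructions) != b:
        raise ValueError(f"expected {b} instructions, got {len(instructions)}")
    return b, t


def check_action_shape(shape: Sequence[int]) -> None:
    """Check that an action shape has three axes and a trailing axis of 7."""
    if len(shape) != 3 or shape[-1] != ACTION_DIM:
        raise ValueError(f"expected (B, T, {ACTION_DIM}), got {tuple(shape)}")


if __name__ == "__main__":
    # Illustrative resolutions; the card does not specify H and W.
    print(check_inputs((2, 1, 3, 224, 224), (2, 1, 3, 112, 112),
                       ["pick up the blue cube", "pick up the blue cube"]))
    check_action_shape((2, 10, 7))
```

With a tensor library, the same checks apply to the tensor's shape attribute, for example check_rgb_shape("rgb_static", tuple(static_image.shape)).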

Usage

Check out our full model implementation on GitHub (link: TODO) and follow the instructions in the README to test the model in one of the environments.

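# static_image, gripper_image, and model are assumed to be defined already.
# Observation dictionary holding the two camera images.
obs = {
    "rgb_obs": {
        "rgb_static": static_image,
        "rgb_gripper": gripper_image
    }
}
# Language goal for the task.
goal = {"lang_text": "pick up the blue cube"}
# Predict the next action from the observation and the goal.
action = model.step(obs, goal)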

Training Details

Configuration

Optimizer: AdamW
Learning Rate: 2e-5
Weight Decay: 0.05

Citation

@inproceedings{reuss2025flower,
  # Add citation when available
}

License

This model is released under the MIT license.
