---
license: mit
tags:
  - RLinf
language:
  - en
metrics:
  - accuracy
base_model:
  - Haozhan72/Openvla-oft-SFT-libero10-traj1
pipeline_tag: reinforcement-learning
model-index:
  - name: RLinf-OpenVLAOFT-GRPO-LIBERO-10
    results:
      - task:
          type: VLA
        dataset:
          type: libero_10
          name: libero_10
        metrics:
          - type: accuracy
            value: 94.35
---
*(RLinf logo)*

# RLinf: Reinforcement Learning Infrastructure for Agentic AI

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

*(RLinf overview)*

## Model Description

The RLinf-openvlaoft-libero series is trained from the Haozhan72/Openvla-oft-SFT-libero-xxx-traj1 checkpoints (covering libero-10, libero-object, libero-goal, and libero-spatial), using the same base models and training datasets as verl. Training with RLinf yields state-of-the-art performance.

We apply a mask so that the loss is computed only over valid action tokens, and we compute a token-level loss based on the Group Relative Policy Optimization (GRPO) advantage function. This improves the model's performance on spatial reasoning, object generalization, instruction generalization, and long-horizon tasks.
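For intuition, the sketch below shows what a masked, token-level GRPO-style objective can look like. It is a minimal illustration, not RLinf's actual implementation: the grouping scheme, the per-token log-probabilities, the clipping constant, and the `action_mask` layout are all assumptions made for the example.

```python
# Minimal sketch of a masked, token-level GRPO-style loss (illustrative only;
# not RLinf's actual implementation). Assumes per-token log-probs from the
# current and old policies, one scalar reward per rollout, and a mask that is
# 1 on valid action tokens and 0 elsewhere.
import torch

def grpo_token_loss(logp_new, logp_old, rewards, action_mask,
                    group_size, clip_eps=0.2):
    """logp_new / logp_old / action_mask: [batch, seq_len]; rewards: [batch].
    Rollouts are assumed to be stored in consecutive chunks of `group_size`
    that share the same task prompt."""
    # Group-relative advantage: normalize each reward within its group.
    r = rewards.view(-1, group_size)
    adv = (r - r.mean(dim=1, keepdim=True)) / (r.std(dim=1, keepdim=True) + 1e-8)
    adv = adv.view(-1, 1)                      # broadcast over tokens

    # PPO-style clipped ratio, computed per token.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    per_token = -torch.min(unclipped, clipped)

    # Token-level loss, averaged over valid action tokens only.
    return (per_token * action_mask).sum() / action_mask.sum().clamp(min=1)

# Example shapes: 8 rollouts in groups of 4, with 16 action tokens each.
loss = grpo_token_loss(torch.randn(8, 16), torch.randn(8, 16),
                       torch.randn(8), torch.ones(8, 16), group_size=4)
```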

## Evaluation and Results

We trained and evaluated four models using RLinf:

### Benchmark Results

All SFT models are from SimpleVLA-RL.

- Recommended sampling settings for evaluation: LIBERO seed = 0; number of episodes = 500; `do_sample=False`.
| Model              | Object | Spatial | Goal  | Long  | Average |
|--------------------|--------|---------|-------|-------|---------|
| SFT models         | 25.60  | 56.45   | 45.59 | 9.68  | 34.33   |
| Trained with RLinf | 98.99  | 98.99   | 98.99 | 94.35 | 97.83   |
*(LIBERO benchmark results figure)*

## How to Use

Please integrate the provided model with the RLinf codebase by modifying the following parameters in the configuration file `examples/embodiment/config/libero_10_grpo_openvlaoft.yaml`:

- Set `actor.checkpoint_load_path`, `actor.tokenizer.tokenizer_model`, and `rollout.model_dir` to the path of the model checkpoint.

Note: If you intend to evaluate the model directly, make sure to set `actor.model.is_lora` to `false`.
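As a rough guide, the relevant fields might look like the excerpt below. This is a sketch only: the local path is a placeholder, and all other keys in the file are omitted.

```yaml
# Excerpt of examples/embodiment/config/libero_10_grpo_openvlaoft.yaml (sketch).
# Replace the placeholder path with wherever you downloaded the checkpoint.
actor:
  checkpoint_load_path: /path/to/RLinf-OpenVLAOFT-GRPO-LIBERO-10
  tokenizer:
    tokenizer_model: /path/to/RLinf-OpenVLAOFT-GRPO-LIBERO-10
  model:
    is_lora: false        # set to false when evaluating this checkpoint directly
rollout:
  model_dir: /path/to/RLinf-OpenVLAOFT-GRPO-LIBERO-10
```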

## License

This code repository and the model weights are licensed under the MIT License.