RLinf: Reinforcement Learning Infrastructure for Agentic AI
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

Model Description
The RLinf-openvlaoft-libero series is trained from the Haozhan72/Openvla-oft-SFT-libero-xxx-traj1 checkpoints (covering libero10, libero-object, libero-goal, and libero-spatial), using the same base models and training datasets as verl. Training with RLinf yields state-of-the-art performance.
We apply a mask so that the loss covers only valid action tokens, and compute a token-level loss based on the Group Relative Policy Optimization (GRPO) advantage function, in order to improve the model's performance on spatial reasoning, object generalization, instruction generalization, and long-horizon tasks.
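The objective can be pictured with the PyTorch sketch below of a masked, token-level GRPO loss; the tensor shapes, PPO-style clipping, and per-group normalization constant are illustrative assumptions, not RLinf's exact implementation.

```python
import torch

def grpo_token_loss(logprobs, old_logprobs, action_mask, rewards, group_size, clip_eps=0.2):
    """Illustrative masked token-level GRPO loss (not RLinf's actual code).

    logprobs, old_logprobs: [batch, seq_len] per-token log-probs under the current
    and rollout policies; action_mask: [batch, seq_len] 0/1 float mask over valid
    action tokens; rewards: [batch] scalar episode rewards, with `group_size`
    rollouts per prompt laid out contiguously in the batch.
    """
    # Group-relative advantage: normalize rewards within each rollout group.
    grouped = rewards.view(-1, group_size)
    adv = (grouped - grouped.mean(dim=1, keepdim=True)) / (grouped.std(dim=1, keepdim=True) + 1e-8)
    adv = adv.view(-1, 1)  # the same advantage is broadcast to every token

    # PPO-style clipped policy-gradient term, computed per token.
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    per_token = -torch.minimum(unclipped, clipped)

    # The mask restricts the loss to valid action tokens only.
    return (per_token * action_mask).sum() / action_mask.sum().clamp(min=1)
```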
Evaluation and Results
We trained and evaluated four models using RLinf, each with the recommended sampling settings listed below (see the sampling sketch after this list):

RLinf-openvlaoft-libero-object Model (based on Haozhan72/Openvla-oft-SFT-libero-object-traj1)
- Recommended sampling settings: temperature = 1.6, top_p = 1.0

RLinf-openvlaoft-libero-spatial Model (based on Haozhan72/Openvla-oft-SFT-libero-spatial-traj1)
- Recommended sampling settings: temperature = 1.6, top_p = 1.0

RLinf-openvlaoft-libero-goal Model (based on Haozhan72/Openvla-oft-SFT-libero-goal-traj1)
- Recommended sampling settings: temperature = 1.6, top_p = 1.0

RLinf-openvlaoft-libero10 Model (based on Haozhan72/Openvla-oft-SFT-libero10-traj1)
- Recommended sampling settings: temperature = 1.6, top_p = 1.0
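As a rough illustration of what these settings do, the sketch below applies temperature scaling and nucleus (top-p) filtering when sampling a single action token. It is standalone example code, not RLinf's rollout implementation; with top_p = 1.0 the nucleus filter is effectively a no-op, so only the temperature reshapes the distribution.

```python
import torch

def sample_action_token(logits, temperature=1.6, top_p=1.0):
    """Temperature + nucleus (top-p) sampling for one token (illustrative only)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    keep = cumulative - sorted_probs < top_p
    filtered = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    filtered = filtered / filtered.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(filtered, num_samples=1)
    return sorted_idx.gather(-1, choice)
```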
Benchmark Results
All SFT baseline models are from SimpleVLA-RL.
- Recommended evaluation settings: LIBERO seed = 0; number of episodes = 500; do_sample = False
| Model | Object | Spatial | Goal | Long | Average |
|---|---|---|---|---|---|
| SFT models | 25.60 | 56.45 | 45.59 | 9.68 | 34.33 |
| Trained with RLinf | 98.99 | 98.99 | 98.99 | 94.35 | 97.83 |
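For reference, the evaluation protocol above (fixed seed, 500 episodes per suite, greedy decoding with do_sample = False) amounts to a loop like the sketch below; `policy` and `env` are placeholders for an OpenVLA-OFT policy and a LIBERO environment, and the Gym-style interface and success bookkeeping are assumptions for illustration.

```python
import random
import numpy as np
import torch

def evaluate_success_rate(policy, env, num_episodes=500, seed=0):
    """Greedy evaluation loop (illustrative; `policy` and `env` are placeholders)."""
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed)
    successes = 0
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs, do_sample=False)  # greedy decoding, no sampling
            obs, reward, done, info = env.step(action)
        successes += int(info.get("success", reward > 0))
    return 100.0 * successes / num_episodes  # success rate in percent
```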
How to Use
Please integrate the provided model with the RLinf codebase. To do so, modify the following parameters in the configuration file `examples/embodiment/config/libero_10_grpo_openvlaoft.yaml`:
- Set `actor.checkpoint_load_path`, `actor.tokenizer.tokenizer_model`, and `rollout.model_dir` to the path of the model checkpoint.

Note: if you intend to evaluate the model directly, make sure to set `actor.model.is_lora` to `false`. A sketch of these edits is shown below.
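The following is a minimal sketch of these edits using PyYAML, assuming the dotted keys above correspond to nested sections of the YAML file; the checkpoint path is a placeholder for your local download location.

```python
import yaml

CONFIG = "examples/embodiment/config/libero_10_grpo_openvlaoft.yaml"
CKPT = "/path/to/RLinf-openvlaoft-libero10"  # placeholder: local checkpoint directory

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

# Point the actor and rollout workers at the downloaded checkpoint.
cfg["actor"]["checkpoint_load_path"] = CKPT
cfg["actor"]["tokenizer"]["tokenizer_model"] = CKPT
cfg["rollout"]["model_dir"] = CKPT

# For direct evaluation of the released weights, disable LoRA loading.
cfg["actor"]["model"]["is_lora"] = False

with open(CONFIG, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```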
License
This code repository and the model weights are licensed under the MIT License.
Model tree for RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-long
- Base model: Haozhan72/Openvla-oft-SFT-libero10-traj1