---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- Haozhan72/Openvla-oft-SFT-libero-goal-trajall
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvlaoft-maniskill3-grpo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
    - type: accuracy
      value: 84.6
  - task:
      type: VLA
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
    - type: accuracy
      value: 51.6
  - task:
      type: VLA
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
    - type: accuracy
      value: 42.9
---
<div align="center">
<img src="logo.svg" alt="RLinf-logo" width="500"/>
</div>
<div align="center">
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> -->
</div>
<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
<div align="center">
<img src="overview.png" alt="RLinf-overview" width="600"/>
</div>
## Model Description
This OpenVLA-OFT model starts from ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` with an additional LoRA SFT checkpoint and is fine-tuned with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
## Full OOD Evaluation and Results
### Overall OOD Eval Results
Note: rl4vla refers to the paper [VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study](https://arxiv.org/abs/2505.19789). The __GRPO-openvlaoft__ column corresponds to the model released in this repository.
| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
|-------------|--------|---------------------|----------------|-------------|--------------|
| Avg results | 76.08 | 61.48 | 64.53 | **82.21** | 75.47 |
### OOD Eval on Vision
| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
|-------------|--------|---------------------|----------------|-------------|--------------|
| vision avg | 76.56 | 84.69 | 80.55 | **82.03** | 74.69 |
| unseen table | 84.40 | 91.41 | 94.53 | **95.70** | 89.84 |
| dynamic texture (weak) | 83.30 | **91.02** | 82.42 | 85.55 | 78.91 |
| dynamic texture (strong) | 63.00 | **77.34** | 62.50 | 72.27 | 65.62 |
| dynamic noise (weak) | 85.40 | 89.45 | **89.84** | 87.11 | 79.69 |
| dynamic noise (strong) | 66.70 | **74.22** | 73.44 | 69.53 | 59.38 |
### OOD Eval on Semantic
| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
|-------------|--------|---------------------|----------------|-------------|--------------|
| object avg | 75.40 | 51.61 | 56.64 | **80.57** | 74.41 |
| train setting | 93.80 | 94.14 | 91.80 | **96.09** | 84.38 |
| unseen objects | 71.40 | 80.47 | 77.73 | **81.64** | 76.56 |
| unseen receptacles | 75.00 | 74.22 | 78.12 | **81.25** | 73.44 |
| unseen instructions | 89.10 | 67.97 | 68.36 | **94.53** | 89.06 |
| multi-object (both seen) | 75.00 | 35.16 | 42.97 | **84.38** | 75.78 |
| multi-object (both unseen) | 57.80 | 30.47 | 38.67 | **62.89** | 57.81 |
| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |
### OOD Eval on Position
| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
|-------------|--------|---------------------|----------------|-------------|--------------|
| position avg | 77.60 | 42.97 | 56.05 | **89.26** | 81.64 |
| unseen position (object & receptacle) | 80.70 | 40.23 | 50.39 | **86.33** | 75.00 |
| mid-episode object reposition | 74.50 | 45.70 | 61.72 | **92.19** | 88.28 |
## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``; a sketch of the resulting config overrides is shown below.
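As a rough guide, the snippet below shows what these overrides might look like in ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``. The nesting is inferred from the dotted parameter names above rather than copied from the actual file, and ``/path/to/RLinf-openvlaoft-maniskill3-grpo`` is a placeholder for your local checkpoint directory.

```yaml
# Sketch of the edited fields in examples/embodiment/config/maniskill_grpo_openvlaoft.yaml.
# The key nesting is inferred from the dotted parameter names above; the real file may
# group these keys differently. Replace the placeholder path with your local checkpoint.
actor:
  checkpoint_load_path: /path/to/RLinf-openvlaoft-maniskill3-grpo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvlaoft-maniskill3-grpo
  model:
    is_lora: false   # set to false when evaluating the released checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvlaoft-maniskill3-grpo
```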
## License
This code repository and the model weights are licensed under the MIT License.