RLinf/RLinf-OpenVLAOFT-GRPO-ManiSkill3-25ood

@@ -66,40 +66,45 @@ This openvla-oft model is trained on ``Haozhan72/Openvla-oft-SFT-libero10-trajal
 ## Full OOD Evaluation and Results
 ### Overall OOD Eval Results
-Note: rl4vla refers to the paper VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study.
-| Description	| rl4vla 	| __GRPO-openvlaoft__ |	PPO-openvlaoft | PPO-openvla | 	GRPO-openvla |
-|---------------|-----------|-----------------|----------------|-------------|---------------|
-| Avg results	| 0.7608	| 0.61484375	  | 0.6453125	   | **0.822135417** | 0.7546875     |
 ### OOD Eval on Vision
-| Description	| rl4vla 	| __GRPO-openvlaoft__ |	PPO-openvlaoft | PPO-openvla | 	GRPO-openvla |
-|---------------|-----------|-----------------|----------------|-------------|---------------|
-| vision avg	| 0.7656	| 0.846875	      | 0.80546875	   | **0.8203125**	 | 0.746875      |
-| unseen table	| 0.844	    | 0.9140625	      | 0.9453125	   | **0.95703125**	 | 0.8984375     |
-| dynamic texture (weak) | 0.833	| **0.91015625**	| 0.82421875	| 0.85546875	| 0.7890625 |
-| dynamic texture (strong)	| 0.63	| **0.7734375**	| 0.625	| 0.72265625	| 0.65625 |
-| dynamic noise (weak)	| 0.854	| 0.89453125	| **0.8984375**	| 0.87109375	| 0.796875|
-| dynamic noise (strong)	| 0.667	| **0.7421875**	| 0.734375	| 0.6953125	| 0.59375|
 ### OOD Eval on Semantic
-| Description	| rl4vla 	| __GRPO-openvlaoft__ |	PPO-openvlaoft | PPO-openvla | 	GRPO-openvla |
-|---------------|-----------|-----------------|----------------|-------------|---------------|
-| object avg	| 0.754	| 0.516113281	| 0.56640625	| **0.805664063**	| 0.744140625|
-| train setting	| 0.938	| 0.94140625	| 0.91796875	| **0.9609375**	| 0.84375|
-| unseen objects	| 0.714	| 0.8046875	| 0.77734375	| **0.81640625**	| 0.765625|
-| unseen receptacles	| 0.75	| 0.7421875	| 0.78125	| **0.8125**	| 0.734375|
-| unseen instructions	| 0.891	| 0.6796875	| 0.68359375	| **0.9453125**	| 0.890625|
-| multi-object (both seen)	| 0.75	| 0.3515625	| 0.4296875	| **0.84375**	| 0.7578125|
-| multi-object (both unseen)	| 0.578	| 0.3046875	| 0.38671875	| **0.62890625**	| 0.578125|
-| distractive receptacle	| 0.812	| 0.1875	| 0.31640625	| **0.828125**	| 0.78125|
-| multi-receptacle (both unseen)	| 0.599	| 0.1171875	| 0.23828125	| **0.609375**	| 0.6015625|
 ### OOD Eval on Position
-| Description	| rl4vla 	| __GRPO-openvlaoft__ |	PPO-openvlaoft | PPO-openvla | 	GRPO-openvla |
-|---------------|-----------|-----------------|----------------|-------------|---------------|
-| position avg	| 0.776	| 0.4296875	| 0.560546875	| **0.892578125**	| 0.81640625|
-| unseen position (object & receptacle)	| 0.807	| 0.40234375	| 0.50390625	| **0.86328125**	| 0.75|
-| mid-episode object reposition	| 0.745	| 0.45703125	| 0.6171875	| **0.921875**	| 0.8828125|
 ## How to Use
 Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:

 ## Full OOD Evaluation and Results
 ### Overall OOD Eval Results
+Note: rl4vla refers to the paper [VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study](https://arxiv.org/abs/2505.19789). All values in the tables below are percentages.
+| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
+|-------------|--------|---------------------|----------------|-------------|--------------|
+| Avg results | 76.08 | 61.48 | 64.53 | **82.21** | 75.47 |
 ### OOD Eval on Vision
+| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
+|-------------|--------|---------------------|----------------|-------------|--------------|
+| vision avg | 76.56 | **84.69** | 80.55 | 82.03 | 74.69 |
+| unseen table | 84.40 | 91.41 | 94.53 | **95.70** | 89.84 |
+| dynamic texture (weak) | 83.30 | **91.02** | 82.42 | 85.55 | 78.91 |
+| dynamic texture (strong) | 63.00 | **77.34** | 62.50 | 72.27 | 65.62 |
+| dynamic noise (weak) | 85.40 | 89.45 | **89.84** | 87.11 | 79.69 |
+| dynamic noise (strong) | 66.70 | **74.22** | 73.44 | 69.53 | 59.38 |
 ### OOD Eval on Semantic
+| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
+|-------------|--------|---------------------|----------------|-------------|--------------|
+| object avg | 75.40 | 51.61 | 56.64 | **80.57** | 74.41 |
+| train setting | 93.80 | 94.14 | 91.80 | **96.09** | 84.38 |
+| unseen objects | 71.40 | 80.47 | 77.73 | **81.64** | 76.56 |
+| unseen receptacles | 75.00 | 74.22 | 78.12 | **81.25** | 73.44 |
+| unseen instructions | 89.10 | 67.97 | 68.36 | **94.53** | 89.06 |
+| multi-object (both seen) | 75.00 | 35.16 | 42.97 | **84.38** | 75.78 |
+| multi-object (both unseen) | 57.80 | 30.47 | 38.67 | **62.89** | 57.81 |
+| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
+| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |
 ### OOD Eval on Position
+| Description | rl4vla | __GRPO-openvlaoft__ | PPO-openvlaoft | PPO-openvla | GRPO-openvla |
+|-------------|--------|---------------------|----------------|-------------|--------------|
+| position avg | 77.60 | 42.97 | 56.05 | **89.26** | 81.64 |
+| unseen position (object & receptacle) | 80.70 | 40.23 | 50.39 | **86.33** | 75.00 |
+| mid-episode object reposition | 74.50 | 45.70 | 61.72 | **92.19** | 88.28 |
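
The summary rows in the tables above appear to be unweighted means of the per-task rows: each category average over its own rows (with `train setting` counted in `object avg`), and `Avg results` over all 15 rows. The source does not state this; it is an observation from the numbers. The sketch below checks it for the GRPO-openvlaoft column, using the rounded percentages copied from the tables, so small rounding differences are expected.

```python
from statistics import mean

# GRPO-openvlaoft column, copied from the tables above (percent, rounded).
vision = [91.41, 91.02, 77.34, 89.45, 74.22]
semantic = [94.14, 80.47, 74.22, 67.97, 35.16, 30.47, 18.75, 11.72]
position = [40.23, 45.70]

# Summary values as reported in the tables.
reported = {
    "vision avg": 84.69,
    "object avg": 51.61,
    "position avg": 42.97,
    "Avg results": 61.48,
}

# Assumption: every summary row is a plain arithmetic mean of its task rows.
computed = {
    "vision avg": mean(vision),
    "object avg": mean(semantic),
    "position avg": mean(position),
    "Avg results": mean(vision + semantic + position),
}

for name, value in computed.items():
    print(f"{name}: computed {value:.2f}, reported {reported[name]:.2f}")
```

Because `Avg results` weights all 15 tasks equally, the eight semantic tasks dominate it, which is why a model with the best vision average can still have a low overall average.
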
 ## How to Use
 Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``: