---
base_model:
  - mair-lab/thinking-sft-simple
---

# EARL - RL Fine-tuned (S + C) thinking (8B)

- **Model Name:** mair-lab/earl-thinking-sft-simple.rl-simple-n-complex
- **Model Size:** 8B parameters
- **Base Checkpoint:** mair-lab/sft-think-simple
- **Training Method:** Supervised Fine-Tuning (SFT think (S)) → Reinforcement Learning (RL) on Simple + Complex Edits
- **Datasets:** Simple Edit (S), Complex Edit (C)

This model is part of the EARL benchmark study:
📄 EARL: The Promise of RL for Autoregressive Image Editing

## Model Summary

This model starts from the reasoning-based SFT checkpoint (sft-think-simple) and is further optimized with reinforcement learning on both simple and complex edit instructions. Although it incorporates chain-of-thought supervision, it still trails the non-reasoning RL model in overall benchmark performance.

โžก๏ธ Inference instructions: GitHub Repo

## Benchmark Results

| Model Description | OmniEdit | EmuEdit | AURORA | MB | VisMin | I2EBench | AVG |
|---|---|---|---|---|---|---|---|
| SFT think (S) | 4.34 | 3.76 | 2.88 | 3.36 | 3.46 | 3.21 | 3.50 |
| EARL SFT think (S) → RL (S+C) | 4.65 | 3.78 | 3.23 | 3.67 | 3.39 | 3.36 | 3.68 |

📉 Note: The RL version improves modestly over the SFT think (S) baseline but does not match the performance of the non-reasoning SFT (S) RL model, indicating current limitations of reasoning-guided supervision in image editing.

## Use Cases

- Research on visual reasoning in RL
- Multi-instruction and compositional image editing
- Comparative analysis between reasoning and non-reasoning RL approaches