---
base_model:
  - mair-lab/thinking-sft-simple
---

# EARL - RL Fine-tuned (S + C) thinking (8B)

- **Model Name:** mair-lab/earl-thinking-sft-simple.rl-simple-n-complex
- **Model Size:** 8B parameters
- **Base Checkpoint:** mair-lab/sft-think-simple
- **Training Method:** Supervised Fine-Tuning (SFT think (S)) → Reinforcement Learning (RL) on Simple + Complex Edits
- **Datasets:** Simple Edit (S), Complex Edit (C)

This model is part of the EARL benchmark study:
📄 EARL: The Promise of RL for Autoregressive Image Editing

## Model Summary

This model starts from the reasoning-based SFT checkpoint (sft-think-simple) and is further optimized with reinforcement learning on both simple and complex edit instructions. Although it incorporates chain-of-thought supervision, it still trails the non-reasoning RL model in overall benchmark performance.

โžก๏ธ Inference instructions: GitHub Repo

## Benchmark Results

| Model Description | OmniEdit | EmuEdit | AURORA | MB | VisMin | I2EBench | AVG |
|---|---|---|---|---|---|---|---|
| SFT think (S) | 4.34 | 3.76 | 2.88 | 3.36 | 3.46 | 3.21 | 3.50 |
| EARL SFT think (S) → RL (S+C) | 4.65 | 3.78 | 3.23 | 3.67 | 3.39 | 3.36 | 3.68 |

📉 Note: The RL version improves modestly over the SFT think (S) baseline but does not match the performance of the non-reasoning SFT (S) RL model, indicating current limitations of reasoning-guided supervision in image editing.

## Use Cases

- Research on visual reasoning in RL
- Multi-instruction and compositional image editing
- Comparative analysis between reasoning and non-reasoning RL approaches