BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published 27 days ago • 93
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published Apr 17 • 51
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 122