SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 1 day ago • 36
Optimizing Large Language Model Training Using FP4 Quantization • Paper • 2501.17116 • Published 1 day ago • 18
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation • Paper • 2501.16764 • Published 2 days ago • 13
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling • Paper • 2501.16975 • Published 2 days ago • 12
iFormer: Integrating ConvNet and Transformer for Mobile Application • Paper • 2501.15369 • Published 4 days ago • 9
Are Vision Language Models Texture or Shape Biased and Can We Steer Them? • Paper • 2403.09193 • Published Mar 14, 2024 • 8
CodeMonkeys: Scaling Test-Time Compute for Software Engineering • Paper • 2501.14723 • Published 6 days ago • 6
Return of the Encoder: Maximizing Parameter Efficiency for SLMs • Paper • 2501.16273 • Published 3 days ago • 3
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas • Paper • 2501.15427 • Published 4 days ago • 4
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models • Paper • 2501.12370 • Published 9 days ago • 8
Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation • Paper • 2501.15907 • Published 3 days ago • 14
Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 3 days ago • 19
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing • Paper • 2501.13925 • Published 7 days ago • 5