I-Con: A Unifying Framework for Representation Learning Paper • 2504.16929 • Published 18 days ago • 30
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Paper • 2504.01137 • Published Apr 1 • 21
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3 • 28
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 36
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31 • 62
Single Image Iterative Subject-driven Generation and Editing Paper • 2503.16025 • Published Mar 20 • 14
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published Mar 13 • 25
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling Paper • 2503.09601 • Published Mar 12 • 15
Slam Collection • All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day": the tokeniser, LM, and datasets • 6 items • Updated Feb 25 • 13
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 70
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13 • 35
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 38
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models Paper • 2501.03711 • Published Jan 7 • 1
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Paper • 2501.03059 • Published Jan 6 • 22
Continuous Speech Synthesis using per-token Latent Diffusion Paper • 2410.16048 • Published Oct 21, 2024 • 30