STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published 4 days ago • 18
Leveraging Self-Attention for Input-Dependent Soft Prompting in LLMs Paper • 2506.05629 • Published 5 days ago • 32
TinySwallow Collection Compact Japanese models trained with "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" • 5 items • Updated Jan 30 • 17
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published 5 days ago • 24
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published 9 days ago • 29
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design Paper • 2506.04734 • Published 6 days ago • 18
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published 8 days ago • 31
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers Paper • 2506.03065 • Published 7 days ago • 27
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published 9 days ago • 26
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks Paper • 2504.19854 • Published Apr 28 • 7
LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks Paper • 2506.00411 • Published 11 days ago • 30
Taming LLMs by Scaling Learning Rates with Gradient Grouping Paper • 2506.01049 • Published 9 days ago • 36
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated about 4 hours ago • 43
view article Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • 8 days ago • 62
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated May 1 • 569
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published 11 days ago • 90
Hymba: A Hybrid-head Architecture for Small Language Models Paper • 2411.13676 • Published Nov 20, 2024 • 46