One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation Paper • 2503.13358 • Published 7 days ago • 86
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published 4 days ago • 21
MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated 3 days ago • 14
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Paper • 2405.15306 • Published May 24, 2024 • 5
DeTikZify Collection Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ • 12 items • Updated 5 days ago • 20
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 4 days ago • 38
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning Paper • 2503.16252 • Published 4 days ago • 23
Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts Paper • 2503.16057 • Published 4 days ago • 12
Manify: A Python Library for Learning Non-Euclidean Representations Paper • 2503.09576 • Published 12 days ago • 1
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 6 days ago • 100
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 10 days ago • 72
view article Article Assisted Generation: a new direction toward low-latency text generation May 11, 2023 • 51
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search Paper • 2503.10582 • Published 11 days ago • 20