Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published 7 days ago • 18 • 7
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries Paper • 2502.20475 • Published 14 days ago • 2 • 4
IHEval: Evaluating Language Models on Following the Instruction Hierarchy Paper • 2502.08745 • Published 29 days ago • 18
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published about 1 month ago • 30
Diverse Inference and Verification for Advanced Reasoning Paper • 2502.09955 • Published 28 days ago • 17
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 26 days ago • 142
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning Paper • 2502.11271 • Published 25 days ago • 16
Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options Paper • 2502.12929 • Published 24 days ago • 7
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading Paper • 2502.12574 • Published 24 days ago • 11
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published 24 days ago • 67
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs Paper • 2502.10454 • Published about 1 month ago • 7