Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching Paper • 2503.05179 • Published 6 days ago • 42
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model Paper • 2502.13449 • Published 22 days ago • 42
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published 25 days ago • 142
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published 22 days ago • 25
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Paper • 2502.12464 • Published 23 days ago • 27
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 30 days ago • 47
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published 28 days ago • 143 • 6
HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning Paper • 2406.09827 • Published Jun 14, 2024 • 2
Mastering Long Contexts in LLMs with KVPress Article • By nvidia and 1 other • Jan 23 • 64
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published Jan 10 • 68
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding Paper • 2412.02186 • Published Dec 3, 2024 • 22