NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 3 days ago • 24
Cyber-Zero: Training Cybersecurity Agents without Runtime Paper • 2508.00910 • Published 24 days ago • 8
Personalized Safety Alignment for Text-to-Image Diffusion Models Paper • 2508.01151 • Published 21 days ago • 8
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe Paper • 2508.01691 • Published 19 days ago • 9
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published 20 days ago • 13
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo Paper • 2508.02317 • Published 18 days ago • 16
Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report Paper • 2508.01059 • Published 21 days ago • 32
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models Paper • 2504.05262 • Published Apr 7 • 11
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8 • 111
Generative Evaluation of Complex Reasoning in Large Language Models Paper • 2504.02810 • Published Apr 3 • 14
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries Paper • 2502.20475 • Published Feb 27 • 3
IHEval: Evaluating Language Models on Following the Instruction Hierarchy Paper • 2502.08745 • Published Feb 12 • 20
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12 • 37
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning Paper • 2502.11271 • Published Feb 16 • 18