DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 265
CodeRAG-Bench: Can Retrieval Augment Code Generation? Paper • 2406.14497 • Published Jun 20, 2024 • 1
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 50
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 42
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 42
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20, 2024 • 15
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15, 2024 • 6
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 114
Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases Paper • 2011.07743 • Published Nov 16, 2020
KoLA: Carefully Benchmarking World Knowledge of Large Language Models Paper • 2306.09296 • Published Jun 15, 2023 • 19
A Systematic Investigation of KB-Text Embedding Alignment at Scale Paper • 2106.01586 • Published Jun 3, 2021
Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs Paper • 2401.00608 • Published Dec 31, 2023 • 1
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments Paper • 2402.14672 • Published Feb 22, 2024 • 1
Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments Paper • 2212.09736 • Published Dec 19, 2022
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models Paper • 2405.14831 • Published May 23, 2024 • 3
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents Paper • 2408.06327 • Published Aug 12, 2024 • 16
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published Oct 9, 2024 • 7
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 61