WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning Paper • 2411.02337 • Published Nov 4, 2024 • 35
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published Nov 7, 2024 • 51
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level Paper • 2411.03562 • Published Nov 5, 2024 • 66
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published Oct 11, 2024 • 48
Game-theoretic LLM: Agent Workflow for Negotiation Games Paper • 2411.05990 • Published Nov 8, 2024 • 8
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices Paper • 2411.10640 • Published Nov 16, 2024 • 45
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published Dec 6, 2024 • 51
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 58
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation Paper • 2412.02592 • Published Dec 3, 2024 • 22
RL Zero: Zero-Shot Language to Behaviors without any Supervision Paper • 2412.05718 • Published Dec 7, 2024 • 5
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation Paper • 2412.10704 • Published Dec 14, 2024 • 15
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment Paper • 2412.13746 • Published Dec 18, 2024 • 9
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture Paper • 2412.11834 • Published Dec 16, 2024 • 7
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents Paper • 2412.13194 • Published Dec 17, 2024 • 12
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024 • 16
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Paper • 2412.15797 • Published Dec 20, 2024 • 18
Progressive Multimodal Reasoning via Active Retrieval Paper • 2412.14835 • Published Dec 19, 2024 • 73
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published Dec 19, 2024 • 14
ericsonwillians/distilbert-base-uncased-steam-sentiment Text Classification • Updated Dec 12, 2024 • 16
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
Personalized Graph-Based Retrieval for Large Language Models Paper • 2501.02157 • Published Jan 4 • 29
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 97
Multi-task retriever fine-tuning for domain-specific and efficient RAG Paper • 2501.04652 • Published Jan 8 • 10
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation Paper • 2501.02576 • Published Jan 5 • 15
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published Jan 4 • 92
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published Jan 6 • 41
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published Jan 16 • 37
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation Paper • 2501.08617 • Published Jan 15 • 10
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 92
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published Jan 11 • 9
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 50
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15
Control LLM: Controlled Evolution for Intelligence Retention in LLM Paper • 2501.10979 • Published Jan 19 • 6
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective Paper • 2501.11110 • Published Jan 19 • 2
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning Paper • 2412.09078 • Published Dec 12, 2024
LLM2: Let Large Language Models Harness System 2 Reasoning Paper • 2412.20372 • Published Dec 29, 2024
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection Paper • 2412.08024 • Published Dec 11, 2024 • 1
Table as Thought: Exploring Structured Thoughts in LLM Reasoning Paper • 2501.02152 • Published Jan 4
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22 • 341
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30 • 25
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 39
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 112
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper • 2502.01081 • Published Feb 3 • 14
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging Paper • 2502.05664 • Published 29 days ago • 23
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning Paper • 2502.06060 • Published 28 days ago • 34
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 27 days ago • 60
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 27 days ago • 142
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference Paper • 2502.04416 • Published Feb 6 • 12
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published about 1 month ago • 95
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published 14 days ago • 24
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published 11 days ago • 26
STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning Paper • 2502.10177 • Published 23 days ago • 6
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving Paper • 2502.07640 • Published 26 days ago • 8