Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14, 2024 • 55
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19, 2024 • 25
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 69
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14, 2024 • 76
Simple and Scalable Strategies to Continually Pre-train Large Language Models Paper • 2403.08763 • Published Mar 13, 2024 • 50
Language models scale reliably with over-training and on downstream tasks Paper • 2403.08540 • Published Mar 13, 2024 • 15
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8, 2024 • 63
TnT-LLM: Text Mining at Scale with Large Language Models Paper • 2403.12173 • Published Mar 18, 2024 • 20
Larimar: Large Language Models with Episodic Memory Control Paper • 2403.11901 • Published Mar 18, 2024 • 33
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21, 2024 • 52
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions Paper • 2403.15246 • Published Mar 22, 2024 • 10
The Unreasonable Ineffectiveness of the Deeper Layers Paper • 2403.17887 • Published Mar 26, 2024 • 79
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text Paper • 2403.18421 • Published Mar 27, 2024 • 23
Long-context LLMs Struggle with Long In-context Learning Paper • 2404.02060 • Published Apr 2, 2024 • 36
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models Paper • 2404.03543 • Published Apr 4, 2024 • 16
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3, 2024 • 49
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 55
Flamingo: a Visual Language Model for Few-Shot Learning Paper • 2204.14198 • Published Apr 29, 2022 • 15
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published Jun 6, 2024 • 29
Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning Paper • 2406.06469 • Published Jun 10, 2024 • 25
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27, 2024 • 62
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024 • 86
Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages Paper • 2407.03321 • Published Jul 3, 2024 • 16
Training Task Experts through Retrieval Based Distillation Paper • 2407.05463 • Published Jul 7, 2024 • 8
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct Paper • 2407.05700 • Published Jul 8, 2024 • 12
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? Paper • 2407.15711 • Published Jul 22, 2024 • 9
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29, 2024 • 42
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13, 2024 • 16
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20, 2024 • 42
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 125
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23, 2024 • 26
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18, 2024 • 44
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 37
Attention Prompting on Image for Large Vision-Language Models Paper • 2409.17143 • Published Sep 25, 2024 • 7
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 54
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond Paper • 2411.03590 • Published Nov 6, 2024 • 10
Stronger Models are NOT Stronger Teachers for Instruction Tuning Paper • 2411.07133 • Published Nov 11, 2024 • 36
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding Paper • 2501.05452 • Published Jan 9 • 15
Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published Jan 9 • 11
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 82
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 25
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 25
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong Paper • 2501.09775 • Published Jan 16 • 29
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published Jan 16 • 25
CodeMonkeys: Scaling Test-Time Compute for Software Engineering Paper • 2501.14723 • Published 28 days ago • 8
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper • 2502.01941 • Published 18 days ago • 13
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles Paper • 2502.01081 • Published 19 days ago • 14
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published 15 days ago • 29
Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models Paper • 2502.06755 • Published 11 days ago • 7
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon Paper • 2502.07445 • Published 11 days ago • 11
NoLiMa: Long-Context Evaluation Beyond Literal Matching Paper • 2502.05167 • Published 14 days ago • 15
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models Paper • 2502.14802 • Published 1 day ago • 6
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962 • Published 2 days ago • 25
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published 5 days ago • 12
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training Paper • 2502.11196 • Published 5 days ago • 20
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation Paper • 2502.08826 • Published 9 days ago • 13
Intuitive physics understanding emerges from self-supervised pretraining on natural videos Paper • 2502.11831 • Published 4 days ago • 12
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking Paper • 2502.09083 • Published 9 days ago • 4
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published 1 day ago • 122
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 10 days ago • 52