Felix Tuma
floom
AI & ML interests
NLP
Recent Activity
- Upvoted a paper 3 days ago: EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
- Updated a collection 3 days ago: ShowAndTell
- Updated a collection 3 days ago: PotentialApplication
Organizations
None yet
PotentialApplication
- Let LLMs Break Free from Overthinking via Self-Braking Tuning
  Paper • 2505.14604 • Published • 23
- AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios
  Paper • 2505.16944 • Published • 8
- Training Step-Level Reasoning Verifiers with Formal Verification Tools
  Paper • 2505.15960 • Published • 7
- The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
  Paper • 2505.15134 • Published • 6
ShowAndTell-2025-01-30
- Atla Selene Mini: A General Purpose Evaluation Model
  Paper • 2501.17195 • Published • 36
- DeepSeek-V3 Technical Report
  Paper • 2412.19437 • Published • 65
- Optimizing Large Language Model Training Using FP4 Quantization
  Paper • 2501.17116 • Published • 38
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 123
ShowAndTell
- SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
  Paper • 2412.11605 • Published • 18
- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 104
- Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
  Paper • 2412.17739 • Published • 42
- SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval
  Paper • 2412.15443 • Published • 10
ShowAndTell-2024-12-03
- Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
  Paper • 2411.18478 • Published • 38
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 45
- A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models
  Paper • 2411.19477 • Published • 6
- Reverse Thinking Makes LLMs Stronger Reasoners
  Paper • 2411.19865 • Published • 23
Coding
- ReGAL: Refactoring Programs to Discover Generalizable Abstractions
  Paper • 2401.16467 • Published • 10
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 147
- OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
  Paper • 2402.14658 • Published • 84
- Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming
  Paper • 2402.14261 • Published • 11
Reasoning
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 116
- Customizing Language Model Responses with Contrastive In-Context Learning
  Paper • 2401.17390 • Published
- InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
  Paper • 2402.06332 • Published • 20
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 109
ICL
RL
- Diffusion World Model
  Paper • 2402.03570 • Published • 8
- Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
  Paper • 2401.16335 • Published • 1
- Towards Efficient and Exact Optimization of Language Model Alignment
  Paper • 2402.00856 • Published
- ODIN: Disentangled Reward Mitigates Hacking in RLHF
  Paper • 2402.07319 • Published • 14
Model Training
- Rethinking Optimization and Architecture for Tiny Language Models
  Paper • 2402.02791 • Published • 13
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 56
- Scaling Laws for Forgetting When Fine-Tuning Large Language Models
  Paper • 2401.05605 • Published
- Aligning Large Language Models with Counterfactual DPO
  Paper • 2401.09566 • Published • 2
Agents
- An Interactive Agent Foundation Model
  Paper • 2402.05929 • Published • 30
- From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
  Paper • 2401.02777 • Published • 1
- AgentScope: A Flexible yet Robust Multi-Agent Platform
  Paper • 2402.14034 • Published • 14
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
  Paper • 2403.04746 • Published • 26
NLU
- Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
  Paper • 2402.05140 • Published • 24
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 8
- Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task
  Paper • 2310.06504 • Published • 1
- Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
  Paper • 2402.10466 • Published • 19
Training data
- AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts
  Paper • 2402.07625 • Published • 15
- Rethinking Data Selection for Supervised Fine-Tuning
  Paper • 2402.06094 • Published • 1
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 49
- TnT-LLM: Text Mining at Scale with Large Language Models
  Paper • 2403.12173 • Published • 21
RAG
- Generative Representational Instruction Tuning
  Paper • 2402.09906 • Published • 55
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
  Paper • 2406.15319 • Published • 65
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
  Paper • 2407.12883 • Published • 10
- mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval
  Paper • 2407.19669 • Published • 24
Data Efficient Approaches
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 43
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 28
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
  Paper • 2403.03194 • Published • 15
- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 26
Long-context
- InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory
  Paper • 2402.04617 • Published • 4
- BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences
  Paper • 2403.09347 • Published • 23
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 25
- Training-Free Long-Context Scaling of Large Language Models
  Paper • 2402.17463 • Published • 25
Personalization
- User-LLM: Efficient LLM Contextualization with User Embeddings
  Paper • 2402.13598 • Published • 20
- Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
  Paper • 2403.05185 • Published • 26
- SPAR: Personalized Content-Based Recommendation via Long Engagement Attention
  Paper • 2402.10555 • Published • 36
sentence-transformer-models
Tool Use & more
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error
  Paper • 2403.04746 • Published • 26
- API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs
  Paper • 2402.15491 • Published • 16
- Large Language Models as Zero-shot Dialogue State Tracker through Function Calling
  Paper • 2402.10466 • Published • 19
Feedback Analysis
Model Safety
Webscraping
Timeseries
Evaluation
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
  Paper • 2403.04132 • Published • 41
- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 20
- The FinBen: An Holistic Financial Benchmark for Large Language Models
  Paper • 2402.12659 • Published • 22
- TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
  Paper • 2402.13249 • Published • 13
Memory
SSM
TabularData
Efficient Serving/Inference
Synthetic Data Generation
Hallucination
Frontier research ideas