Nifty
• LLMs + Persona-Plug = Personalized LLMs (arXiv:2409.11901, 35 upvotes)
• To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning (arXiv:2409.12183, 39 upvotes)
• Chain of Thought Empowers Transformers to Solve Inherently Serial Problems (arXiv:2402.12875, 13 upvotes)
• TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices (arXiv:2410.00531, 33 upvotes)
• DiaSynth -- Synthetic Dialogue Generation Framework (arXiv:2409.19020, 20 upvotes)
• ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs (arXiv:2410.12405, 13 upvotes)
• What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective (arXiv:2410.23743, 64 upvotes)
• SelfCodeAlign: Self-Alignment for Code Generation (arXiv:2410.24198, 24 upvotes)
• BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments (arXiv:2410.23918, 21 upvotes)
• ATM: Improving Model Merging by Alternating Tuning and Merging (arXiv:2411.03055, 1 upvote)
• OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (arXiv:2411.04905, 127 upvotes)
• Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model (arXiv:2411.04496, 22 upvotes)
• Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding (arXiv:2411.04282, 37 upvotes)
• Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens (arXiv:2411.17691, 13 upvotes)
• Qwen2.5-Coder Technical Report (arXiv:2409.12186, 153 upvotes)
• MALT: Improving Reasoning with Multi-Agent LLM Training (arXiv:2412.01928, 45 upvotes)
• Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture (arXiv:2412.11834, 8 upvotes)
• Offline Reinforcement Learning for LLM Multi-Step Reasoning (arXiv:2412.16145, 38 upvotes)
• ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing (arXiv:2412.14711, 16 upvotes)
• RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response (arXiv:2412.14922, 88 upvotes)
• Analyze Feature Flow to Enhance Interpretation and Steering in Language Models (arXiv:2502.03032, 60 upvotes)
• Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (arXiv:2502.05171, 152 upvotes)
• InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU (arXiv:2502.08910, 148 upvotes)
• CoT-Valve: Length-Compressible Chain-of-Thought Tuning (arXiv:2502.09601, 14 upvotes)
• SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models (arXiv:2502.09390, 16 upvotes)
• Dyve: Thinking Fast and Slow for Dynamic Process Verification (arXiv:2502.11157, 7 upvotes)
• Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity (arXiv:2502.11901, 6 upvotes)
• FoNE: Precise Single-Token Number Embeddings via Fourier Features (arXiv:2502.09741, 15 upvotes)
• Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment (arXiv:2502.16894, 32 upvotes)
• MeshPad: Interactive Sketch Conditioned Artistic-designed Mesh Generation and Editing (arXiv:2503.01425, 14 upvotes)
• I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders (arXiv:2503.18878, 119 upvotes)
• LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation (arXiv:2503.19950, 12 upvotes)
• TTRL: Test-Time Reinforcement Learning (arXiv:2504.16084, 120 upvotes)
• Taming the Titans: A Survey of Efficient LLM Inference Serving (arXiv:2504.19720, 12 upvotes)
• RM-R1: Reward Modeling as Reasoning (arXiv:2505.02387, 81 upvotes)
• ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations (arXiv:2505.02819, 26 upvotes)
• Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction (arXiv:2505.11254, 48 upvotes)
• Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective (arXiv:2505.15045, 55 upvotes)
• Shifting AI Efficiency From Model-Centric to Data-Centric Compression (arXiv:2505.19147, 145 upvotes)
• Scaling Laws for Optimal Data Mixtures (arXiv:2507.09404, 37 upvotes)
• FLEXITOKENS: Flexible Tokenization for Evolving Language Models (arXiv:2507.12720, 10 upvotes)
• Weak-Driven Learning: How Weak Agents Make Strong Agents Stronger (arXiv:2602.08222, 268 upvotes)