Peter
Tempo14
AI & ML interests
None yet
Recent Activity
updated a collection 1 day ago: 3D
upvoted an article 18 days ago: Understanding Gemma 3n: How MatFormer Gives You Many Models in One
upvoted an article 18 days ago: Transformers Are Getting Old: Variants and Alternatives Exist!
Organizations
Encoder
Diffusion
-
Large Language Diffusion Models
Paper • 2502.09992 • Published • 121 -
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 72 -
MMaDA: Multimodal Large Diffusion Language Models
Paper • 2505.15809 • Published • 93 -
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 55
self critic
latent reasoning
-
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper • 2502.03275 • Published • 18 -
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 144 -
LLM Pretraining with Continuous Concepts
Paper • 2502.08524 • Published • 29
RWKV
video
Tools
Attention
-
Selective Attention Improves Transformer
Paper • 2410.02703 • Published • 24 -
Differential Transformer
Paper • 2410.05258 • Published • 180 -
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
Paper • 2410.05076 • Published • 8 -
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
Paper • 2410.13276 • Published • 30
Summary
QA
-
Baichuan 2: Open Large-scale Language Models
Paper • 2309.10305 • Published • 20 -
LMDX: Language Model-based Document Information Extraction and Localization
Paper • 2309.10952 • Published • 66 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 78 -
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding
Paper • 2411.01106 • Published • 4
small models
-
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Paper • 2310.10837 • Published • 11 -
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 103 -
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models
Paper • 2310.16795 • Published • 27 -
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Paper • 2310.16836 • Published • 14
Code
-
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper • 2310.17680 • Published • 73 -
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning
Paper • 2311.02303 • Published • 11 -
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code
Paper • 2311.07989 • Published • 25 -
Magicoder: Source Code Is All You Need
Paper • 2312.02120 • Published • 82
cpu inference
-
Efficient LLM Inference on CPUs
Paper • 2311.00502 • Published • 7 -
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 119 -
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper • 2312.12742 • Published • 14 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 257
Mixture of Experts
-
Memory Augmented Language Models through Mixture of Word Experts
Paper • 2311.10768 • Published • 18 -
Mixtral of Experts
Paper • 2401.04088 • Published • 160 -
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Paper • 2401.04081 • Published • 72 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 55
chain of thought
-
Training Chain-of-Thought via Latent-Variable Inference
Paper • 2312.02179 • Published • 11 -
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Paper • 2312.04474 • Published • 33 -
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper • 2309.04269 • Published • 33 -
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
Paper • 2305.14160 • Published • 1
new architecture
-
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 -
Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 25 -
BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 26
RLHF
fast
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 59 -
SliceGPT: Compress Large Language Models by Deleting Rows and Columns
Paper • 2401.15024 • Published • 74 -
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Paper • 2402.11131 • Published • 44 -
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Paper • 2405.11582 • Published • 18
efficient inference
-
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 20 -
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting
Paper • 2402.13720 • Published • 7 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 34 -
Your Transformer is Secretly Linear
Paper • 2405.12250 • Published • 159
quantization
-
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Paper • 2402.04291 • Published • 51 -
OneBit: Towards Extremely Low-bit Large Language Models
Paper • 2402.11295 • Published • 25 -
A Survey on Transformer Compression
Paper • 2402.05964 • Published • 1 -
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Paper • 2402.08958 • Published • 6
agents
-
More Agents Is All You Need
Paper • 2402.05120 • Published • 56 -
UFO: A UI-Focused Agent for Windows OS Interaction
Paper • 2402.07939 • Published • 17 -
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Paper • 2406.04151 • Published • 23 -
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Paper • 2407.04363 • Published • 34
mamba
-
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 33 -
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Paper • 2401.17574 • Published • 17 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 27 -
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper • 2408.12570 • Published • 34
reinforcement learning
-
In deep reinforcement learning, a pruned network is a good network
Paper • 2402.12479 • Published • 19 -
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Paper • 2403.03950 • Published • 16 -
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 72 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 39
Self Improvement
Training
Linear
Math
RAG
-
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Paper • 2405.19893 • Published • 32 -
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries
Paper • 2406.12824 • Published • 21 -
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Paper • 2406.15319 • Published • 65 -
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems
Paper • 2406.14972 • Published • 7
In-Context
Molecular
Pre-Training
-
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper • 2406.14491 • Published • 95 -
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper • 2410.23743 • Published • 64 -
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Paper • 2410.20650 • Published • 17
Tokenizer
Spaces
Edit Pictures
Music
Interpretability
Transformer
-
You Do Not Fully Utilize Transformer's Representation Capacity
Paper • 2502.09245 • Published • 38 -
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
Paper • 2502.15007 • Published • 175 -
Transformers without Normalization
Paper • 2503.10622 • Published • 167 -
Forgetting Transformer: Softmax Attention with a Forget Gate
Paper • 2503.02130 • Published • 32
scaling
layer
images
-
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Paper • 2501.18427 • Published • 20 -
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Paper • 2502.20388 • Published • 16 -
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Paper • 2503.09641 • Published • 40 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
Autoregressive Image Generation
World Model
-
Cosmos World Foundation Model Platform for Physical AI
Paper • 2501.03575 • Published • 80 -
Intuitive physics understanding emerges from self-supervised pretraining on natural videos
Paper • 2502.11831 • Published • 20 -
PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
Paper • 2503.05333 • Published • 8 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 51
Reasoning
-
On Memorization of Large Language Models in Logical Reasoning
Paper • 2410.23123 • Published • 18 -
LLMs Do Not Think Step-by-step In Implicit Reasoning
Paper • 2411.15862 • Published • 10 -
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 88 -
Deliberation in Latent Space via Differentiable Cache Augmentation
Paper • 2412.17747 • Published • 33
interesting
-
XGen-7B Technical Report
Paper • 2309.03450 • Published • 8 -
FLM-101B: An Open LLM and How to Train It with $100K Budget
Paper • 2309.03852 • Published • 44 -
Robotic Table Tennis: A Case Study into a High Speed Learning System
Paper • 2309.03315 • Published • 7 -
Improving Text Embeddings with Large Language Models
Paper • 2401.00368 • Published • 82
Long Context
-
FLM-101B: An Open LLM and How to Train It with $100K Budget
Paper • 2309.03852 • Published • 44 -
Extending LLMs' Context Window with 100 Samples
Paper • 2401.07004 • Published • 16 -
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 18 -
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Paper • 2401.07872 • Published • 2
hallucination
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 39 -
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
Paper • 2410.15460 • Published • 1 -
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations
Paper • 2410.18860 • Published • 11 -
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Paper • 2411.14257 • Published • 13
Traffic
-
Minimalist Traffic Prediction: Linear Layer Is All You Need
Paper • 2308.10276 • Published -
Traffic Light Control with Reinforcement Learning
Paper • 2308.14295 • Published -
Deep Reinforcement Learning for the Joint Control of Traffic Light Signaling and Vehicle Speed Advice
Paper • 2309.09881 • Published • 1 -
STT: Stateful Tracking with Transformers for Autonomous Driving
Paper • 2405.00236 • Published • 9
Fine-Tuning
-
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Paper • 2310.17752 • Published • 14 -
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 32 -
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Paper • 2311.06243 • Published • 22 -
Fine-tuning Language Models for Factuality
Paper • 2311.08401 • Published • 30
Prompt Engineering
motion
robotic
-
Foundation Models in Robotics: Applications, Challenges, and the Future
Paper • 2312.07843 • Published • 18 -
Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 5 -
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
Paper • 2410.22325 • Published • 10 -
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Paper • 2410.21845 • Published • 16
outperform gpt-4
german model
mobile device
-
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Paper • 2401.16158 • Published • 21 -
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Paper • 2402.14905 • Published • 132 -
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper • 2405.12107 • Published • 30 -
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
Paper • 2406.01014 • Published • 35
alignment
-
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 33 -
Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 11 -
Reformatted Alignment
Paper • 2402.12219 • Published • 18 -
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28
practical
Synthetic Dataset
Instruction Tuning
-
A Survey on Data Selection for LLM Instruction Tuning
Paper • 2402.05123 • Published • 3 -
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation
Paper • 2312.14187 • Published • 52 -
Generative Representational Instruction Tuning
Paper • 2402.09906 • Published • 55 -
Instruction-tuned Language Models are Better Knowledge Learners
Paper • 2402.12847 • Published • 27
compress
Inpaint
vision
-
What matters when building vision-language models?
Paper • 2405.02246 • Published • 104 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Paper • 2405.19707 • Published • 7 -
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations
Paper • 2410.08049 • Published • 8
3D
-
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner
Paper • 2405.14979 • Published • 20 -
PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
Paper • 2405.19957 • Published • 10 -
GECO: Generative Image-to-3D within a SECOnd
Paper • 2405.20327 • Published • 10 -
gsplat: An Open-Source Library for Gaussian Splatting
Paper • 2409.06765 • Published • 17
Embedding
Stable Diffusion
-
Zero-shot Image Editing with Reference Imitation
Paper • 2406.07547 • Published • 34 -
Scalable Autoregressive Image Generation with Mamba
Paper • 2408.12245 • Published • 27 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 32 -
Training-free Regional Prompting for Diffusion Transformers
Paper • 2411.02395 • Published • 26
comparison
Merging
Unlearning
Memory
Multimodal
Yolo
Brain
-
Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI
Paper • 2505.14556 • Published • 1 -
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Paper • 2505.10176 • Published • 3 -
Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex
Paper • 2505.15813 • Published • 3 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 22