ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion Paper • 2403.18818 • Published Mar 27, 2024 • 26
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published 8 days ago • 265
view article Article PEFT: Parameter-Efficient Fine-Tuning Methods for LLMs By samuellimabraz • 6 days ago • 9
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25, 2024 • 77
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published Oct 15, 2024 • 21
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 16 days ago • 270
TinyHelen's First Curriculum: Training and Evaluating Tiny Language Models in a Simpler Language Environment Paper • 2501.00522 • Published 30 days ago • 1
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 27 days ago • 31
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 27 days ago • 42
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 6
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated 24 days ago • 22
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published Dec 23, 2024 • 30
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? Paper • 2307.14023 • Published Jul 26, 2023 • 1