Contrastive Learning for Many-to-many Multilingual Neural Machine Translation Paper • 2105.09501 • Published May 20, 2021
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning Paper • 2212.10240 • Published Dec 20, 2022 • 1
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises Paper • 2302.10025 • Published Feb 20, 2023
PolyVoice: Language Models for Speech to Speech Translation Paper • 2306.02982 • Published Jun 5, 2023 • 4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 37
MagicEdit: High-Fidelity and Temporally Coherent Video Editing Paper • 2308.14749 • Published Aug 28, 2023 • 1
SALMONN: Towards Generic Hearing Abilities for Large Language Models Paper • 2310.13289 • Published Oct 20, 2023 • 17
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model Paper • 2311.16498 • Published Nov 27, 2023 • 1
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens Paper • 2312.08870 • Published Dec 12, 2023 • 1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos Paper • 2312.10300 • Published Dec 16, 2023 • 1
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 62
Magic-Me: Identity-Specific Video Customized Diffusion Paper • 2402.09368 • Published Feb 14, 2024 • 30
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21, 2024 • 28
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23, 2024 • 39
You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs Paper • 2403.12931 • Published Mar 19, 2024 • 1
Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion Paper • 2404.06429 • Published Apr 9, 2024 • 7
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15, 2024 • 13
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21, 2024 • 29
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning Paper • 2404.16994 • Published Apr 25, 2024 • 37
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2, 2024 • 57
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator Paper • 2405.07510 • Published May 13, 2024 • 2
Unveiling the Tapestry of Consistency in Large Vision-Language Models Paper • 2405.14156 • Published May 23, 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance Paper • 2405.17532 • Published May 27, 2024 • 1
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting Paper • 2405.18424 • Published May 28, 2024 • 9
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment Paper • 2405.17871 • Published May 28, 2024 • 1
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4, 2024 • 37
Towards Semantic Equivalence of Tokenization in Multimodal LLM Paper • 2406.05127 • Published Jun 7, 2024
An Image is Worth 32 Tokens for Reconstruction and Generation Paper • 2406.07550 • Published Jun 11, 2024 • 60
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words Paper • 2406.13340 • Published Jun 19, 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition Paper • 2407.04675 • Published Jul 5, 2024
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model Paper • 2407.07577 • Published Jul 10, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024 • 43
ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development Paper • 2407.20143 • Published Jul 29, 2024
An X-ray Significantly Variable, Luminous, Type 2 Quasar at z = 2.99 with a Massive Host Galaxy Paper • 2409.01960 • Published Sep 3, 2024 • 1
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 55
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering Paper • 2409.16167 • Published Sep 24, 2024
MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published Sep 24, 2024 • 17
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 38
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks Paper • 2410.06526 • Published Oct 9, 2024 • 1
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs Paper • 2410.08067 • Published Oct 10, 2024 • 2
Why Does the Effective Context Length of LLMs Fall Short? Paper • 2410.18745 • Published Oct 24, 2024 • 18
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4, 2024 • 36
Classification Done Right for Vision-Language Pre-Training Paper • 2411.03313 • Published Nov 5, 2024
Multi-Reward as Condition for Instruction-based Image Editing Paper • 2411.04713 • Published Nov 6, 2024 • 1
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models Paper • 2411.03884 • Published Nov 6, 2024 • 29
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing Paper • 2411.08446 • Published Nov 13, 2024
Understanding Chain-of-Thought in LLMs through Information Theory Paper • 2411.11984 • Published Nov 18, 2024 • 3
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs Paper • 2411.13611 • Published Nov 20, 2024
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model Paper • 2412.07298 • Published Dec 10, 2024
Diffusion Adversarial Post-Training for One-Step Video Generation Paper • 2501.08316 • Published Jan 14 • 34
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos Paper • 2501.09781 • Published Jan 16 • 29
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published Jan 21 • 59
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving Paper • 2502.03438 • Published Feb 5 • 2
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion Paper • 2502.04235 • Published Feb 6 • 22
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts Paper • 2502.19811 • Published Feb 27
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 103
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference Paper • 2502.20766 • Published Feb 28
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Paper • 2503.07703 • Published Mar 10 • 36
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning Paper • 2503.07906 • Published Mar 10 • 4
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis Paper • 2503.13265 • Published Mar 17 • 15
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 128
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? Paper • 2504.00509 • Published Apr 1 • 21
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published Apr 3 • 47
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 60
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection Paper • 2505.07293 • Published 10 days ago • 25
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production Paper • 2505.11432 • Published 6 days ago • 1
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published 5 days ago • 50
Model Merging in Pre-training of Large Language Models Paper • 2505.12082 • Published 5 days ago • 31
MMaDA: Multimodal Large Diffusion Language Models Paper • 2505.15809 • Published about 15 hours ago • 34