ByteDance Papers

Presidentlin's Collections

updated 35 minutes ago

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises

Paper • 2302.10025 • Published Feb 20, 2023
Efficient Neural Music Generation

Paper • 2305.15719 • Published May 25, 2023 • 2
PolyVoice: Language Models for Speech to Speech Translation

Paper • 2306.02982 • Published Jun 5, 2023 • 4
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

Paper • 2308.05734 • Published Aug 10, 2023 • 37
MagicEdit: High-Fidelity and Temporally Coherent Video Editing

Paper • 2308.14749 • Published Aug 28, 2023 • 1
SALMONN: Towards Generic Hearing Abilities for Large Language Models

Paper • 2310.13289 • Published Oct 20, 2023 • 17
Make Pixels Dance: High-Dynamic Video Generation

Paper • 2311.10982 • Published Nov 18, 2023 • 69
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

Paper • 2311.16498 • Published Nov 27, 2023 • 1
PixelLM: Pixel Reasoning with Large Multimodal Model

Paper • 2312.02228 • Published Dec 4, 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

Paper • 2312.08870 • Published Dec 12, 2023 • 1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

Paper • 2312.10300 • Published Dec 16, 2023 • 1
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 62
Magic-Me: Identity-Specific Video Customized Diffusion

Paper • 2402.09368 • Published Feb 14, 2024 • 31
SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21, 2024 • 28
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 39
You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs

Paper • 2403.12931 • Published Mar 19, 2024 • 1
Magic-Boost: Boost 3D Generation with Mutli-View Conditioned Diffusion

Paper • 2404.06429 • Published Apr 9, 2024 • 7
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 13
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21, 2024 • 29
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Paper • 2404.16994 • Published Apr 25, 2024 • 37
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2, 2024 • 57
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

Paper • 2405.07510 • Published May 13, 2024 • 2
Unveiling the Tapestry of Consistency in Large Vision-Language Models

Paper • 2405.14156 • Published May 23, 2024
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Paper • 2405.17532 • Published May 27, 2024 • 1
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Paper • 2405.18424 • Published May 28, 2024 • 9
Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment

Paper • 2405.17871 • Published May 28, 2024 • 1
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 39
Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024
An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 60
Autoregressive Pretraining with Mamba in Vision

Paper • 2406.07537 • Published Jun 11, 2024
Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 104
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Paper • 2406.13340 • Published Jun 19, 2024
Let the Code LLM Edit Itself When You Edit the Code

Paper • 2407.03157 • Published Jul 3, 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

Paper • 2407.04675 • Published Jul 5, 2024
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Paper • 2407.07577 • Published Jul 10, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Paper • 2407.07895 • Published Jul 10, 2024 • 43
ByteCheckpoint: A Unified Checkpointing System for Large Foundation Model Development

Paper • 2407.20143 • Published Jul 29, 2024
An X-ray Significantly Variable, Luminous, Type 2 Quasar at z = 2.99 with a Massive Host Galaxy

Paper • 2409.01960 • Published Sep 3, 2024 • 1
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published Sep 13, 2024 • 54
HybridFlow: A Flexible and Efficient RLHF Framework

Paper • 2409.19256 • Published Sep 28, 2024
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering

Paper • 2409.16167 • Published Sep 24, 2024
MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24, 2024 • 17
Hyper-Connections

Paper • 2409.19606 • Published Sep 29, 2024 • 23
Video Instruction Tuning With Synthetic Data

Paper • 2410.02713 • Published Oct 3, 2024 • 40
Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Paper • 2410.02757 • Published Oct 3, 2024 • 37
FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 28
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks

Paper • 2410.06526 • Published Oct 9, 2024 • 1
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Paper • 2410.08067 • Published Oct 10, 2024 • 2
Why Does the Effective Context Length of LLMs Fall Short?

Paper • 2410.18745 • Published Oct 24, 2024 • 18
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Paper • 2410.20424 • Published Oct 27, 2024 • 41
How Far is Video Generation from World Model: A Physical Law Perspective

Paper • 2411.02385 • Published Nov 4, 2024 • 35
Classification Done Right for Vision-Language Pre-Training

Paper • 2411.03313 • Published Nov 5, 2024
Multi-Reward as Condition for Instruction-based Image Editing

Paper • 2411.04713 • Published Nov 6, 2024 • 1
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Paper • 2411.03884 • Published Nov 6, 2024 • 29
SeedEdit: Align Image Re-Generation to Image Editing

Paper • 2411.06686 • Published Nov 11, 2024 • 1
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing

Paper • 2411.08446 • Published Nov 13, 2024
Understanding Chain-of-Thought in LLMs through Information Theory

Paper • 2411.11984 • Published Nov 18, 2024 • 3
Ultra-Sparse Memory Network

Paper • 2411.12364 • Published Nov 19, 2024 • 24
DSTC: Direct Preference Learning with Only Self-Generated Tests and Code to Improve Code LMs

Paper • 2411.13611 • Published Nov 20, 2024
FullStack Bench: Evaluating LLMs as Full Stack Coder

Paper • 2412.00535 • Published Nov 30, 2024
The Rise and Down of Babel Tower: Investigating the Evolution Process of Multilingual Code Large Language Model

Paper • 2412.07298 • Published Dec 10, 2024
Diffusion Adversarial Post-Training for One-Step Video Generation

Paper • 2501.08316 • Published Jan 14 • 36
VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

Paper • 2501.09781 • Published Jan 16 • 29
UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 64
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

Paper • 2502.03438 • Published Feb 5 • 2
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion

Paper • 2502.04235 • Published Feb 6 • 22
MagicArticulate: Make Your 3D Models Articulation-Ready

Paper • 2502.12135 • Published Feb 17 • 8
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

Paper • 2502.19811 • Published Feb 27
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 106
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Paper • 2502.20766 • Published Feb 28 • 1
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

Paper • 2503.07703 • Published Mar 10 • 36
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning

Paper • 2503.07906 • Published Mar 10 • 4
FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis

Paper • 2503.13265 • Published Mar 17 • 15
DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 137
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?

Paper • 2504.00509 • Published Apr 1 • 23
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Paper • 2504.02605 • Published Apr 3 • 48
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

Paper • 2504.13914 • Published Apr 10 • 4
Seedream 3.0 Technical Report

Paper • 2504.11346 • Published Apr 15 • 67
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 61
Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 150
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection

Paper • 2505.07293 • Published May 12 • 27
DanceGRPO: Unleashing GRPO on Visual Generation

Paper • 2505.07818 • Published May 12 • 31
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

Paper • 2505.11432 • Published May 16 • 1
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning

Paper • 2505.11896 • Published May 17 • 58
Model Merging in Pre-training of Large Language Models

Paper • 2505.12082 • Published May 17 • 39
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 133
Scaling Law for Quantization-Aware Training

Paper • 2505.14302 • Published May 20 • 76
Scaling Diffusion Transformers Efficiently via μP

Paper • 2505.15270 • Published May 21 • 34
MMaDA: Multimodal Large Diffusion Language Models

Paper • 2505.15809 • Published May 21 • 95
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26 • 44
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

Paper • 2505.21473 • Published May 27 • 16
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

Paper • 2506.03136 • Published Jun 3 • 24
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Paper • 2506.05573 • Published Jun 5 • 78
Seedance 1.0: Exploring the Boundaries of Video Generation Models

Paper • 2506.09113 • Published Jun 10 • 102
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Paper • 2506.15211 • Published Jun 18 • 36
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Paper • 2506.18896 • Published Jun 23 • 28
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Paper • 2506.18898 • Published Jun 23 • 33
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 41
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published 25 days ago • 108
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published 21 days ago • 129
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 11 days ago • 25
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published 5 days ago • 76
