Seed X
💻 A powerful multilingual translation language model
- Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
  Paper • 2507.13618 • Published • 16
- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 78
- ByteDance-Seed/Seed-X-PPO-7B
  Translation • Updated • 18.5k • 244
Collections including paper arxiv:2508.14460

- Seed-Coder: Let the Code Model Curate Data for Itself
  Paper • 2506.03524 • Published • 6
- Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
  Paper • 2504.13914 • Published • 4
- FlowTok: Flowing Seamlessly Across Text and Image Tokens
  Paper • 2503.10772 • Published • 19
- UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
  Paper • 2503.09949 • Published • 5

- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 63
- CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
  Paper • 2508.02091 • Published • 13
- DINOv3
  Paper • 2508.10104 • Published • 218
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 88

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 406 • 95
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 35
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 89

- Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
  Paper • 2501.18585 • Published • 62
- RWKV-7 "Goose" with Expressive Dynamic State Evolution
  Paper • 2503.14456 • Published • 153
- DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
  Paper • 2503.15265 • Published • 47
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 51

- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 78
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published

- Scaling Test-time Compute for LLM Agents
  Paper • 2506.12928 • Published • 61
- AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
  Paper • 2507.08616 • Published • 13
- ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
  Paper • 2507.21990 • Published • 24
- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 78

- Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
  Paper • 2105.09501 • Published
- Cross-modal Contrastive Learning for Speech Translation
  Paper • 2205.02444 • Published
- ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
  Paper • 2210.03052 • Published
- Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
  Paper • 2212.10240 • Published • 1

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 5