Longxu Dou

dreamerdeo

https://longxudou.github.io/

AI & ML interests

Natural Language Processing

Recent Activity

upvoted a paper about 2 months ago

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

updated a Space about 2 months ago

sailor2/README

upvoted a paper about 2 months ago

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

Paper • 2506.20512 • Published Jun 25 • 46

upvoted 5 papers 3 months ago

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27 • 26

Fostering Video Reasoning via Next-Event Prediction

Paper • 2505.22457 • Published May 28 • 29

Lifelong Safety Alignment for Language Models

Paper • 2505.20259 • Published May 26 • 24

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19 • 36

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19 • 46

upvoted an article 4 months ago

Accelerating LLM Inference: Fast Sampling with Gumbel-Max Trick

Article • Oct 24, 2024 • 12

upvoted 2 papers 4 months ago

Could Thinking Multilingually Empower LLM Reasoning?

Paper • 2504.11833 • Published Apr 16 • 29

FlowReasoner: Reinforcing Query-Level Meta-Agents

Paper • 2504.15257 • Published Apr 21 • 47

upvoted a collection 4 months ago

NoisyRollout

8 items • Updated May 20 • 6

upvoted a paper 4 months ago

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Paper • 2504.13055 • Published Apr 17 • 19

upvoted a collection 4 months ago

🚀 Active PRM

Efficient Process Reward Model Training via Active Learning. • 4 items • Updated Apr 16 • 3

upvoted a paper 4 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 57

upvoted a collection 4 months ago

🌾Oat-Zero: Understanding R1-Zero-Like Training

5 items • Updated Apr 10 • 7

upvoted a paper 4 months ago

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14 • 13

upvoted 2 articles 6 months ago

DualPipe (dual-stream parallelism) would be better without the dual streams

Article • Feb 28 • 7

DualPipe could be better without the Dual

Article • Feb 28 • 17

upvoted a paper 6 months ago

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Paper • 2502.12982 • Published Feb 18 • 18

upvoted a paper 7 months ago

Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models

Paper • 2412.05939 • Published Dec 8, 2024 • 16

upvoted a collection 9 months ago

Sailor2 Benchmarks

1 item • Updated Dec 3, 2024 • 2