SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? • arXiv:2507.12415 • Published Jul 16, 2025
ZeCO: Zero Communication Overhead Sequence Parallelism for Linear Attention • arXiv:2507.01004 • Published Jul 1, 2025
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning • arXiv:2506.24119 • Published Jun 30, 2025
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling • arXiv:2506.20512 • Published Jun 25, 2025
Revisiting Reinforcement Learning for LLM Reasoning from a Cross-Domain Perspective • arXiv:2506.14965 • Published Jun 17, 2025
Optimizing Anytime Reasoning via Budget Relative Policy Optimization • arXiv:2505.13438 • Published May 19, 2025
General-Reasoner: Advancing LLM Reasoning Across All Domains • arXiv:2505.14652 • Published May 20, 2025
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation • arXiv:2504.13055 • Published Apr 17, 2025
SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types • arXiv:2412.11757 • Published Dec 16, 2024
Efficient Process Reward Model Training via Active Learning • arXiv:2504.10559 • Published Apr 14, 2025
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values • arXiv:2504.05535 • Published Apr 7, 2025
Understanding R1-Zero-Like Training: A Critical Perspective • arXiv:2503.20783 • Published Mar 26, 2025
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows • arXiv:2411.07763 • Published Nov 12, 2024
When Attention Sink Emerges in Language Models: An Empirical View • arXiv:2410.10781 • Published Oct 14, 2024
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs • arXiv:2502.12982 • Published Feb 18, 2025