jzwong
AI & ML interests
None yet
Recent Activity
updated a collection 10 days ago: Novel
updated a collection 3 months ago: SYS
updated a collection 3 months ago: LLM-RL
Organizations
None yet
MLLM
LLM-RL
- Kimi k1.5: Scaling Reinforcement Learning with LLMs
  Paper • 2501.12599 • Published • 123
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 137
- Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
  Paper • 2503.16219 • Published • 51
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 63
Novel
Survey
- Taming the Titans: A Survey of Efficient LLM Inference Serving
  Paper • 2504.19720 • Published • 12
- Thus Spake Long-Context Large Language Model
  Paper • 2502.17129 • Published • 73
- A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More
  Paper • 2407.16216 • Published
- 100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
  Paper • 2505.00551 • Published • 37
Sparse
LLM
- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 100
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 298
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 64
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 73
Agent-RL
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 61
- ToolRL: Reward is All Tool Learning Needs
  Paper • 2504.13958 • Published • 46
- OTC: Optimal Tool Calls via Reinforcement Learning
  Paper • 2504.14870 • Published • 33
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
  Paper • 2504.21776 • Published • 59
SYS
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 89
- Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
  Paper • 2501.11873 • Published • 67
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
  Paper • 2502.11089 • Published • 165
- MoBA: Mixture of Block Attention for Long-Context LLMs
  Paper • 2502.13189 • Published • 17
AIGC