Xiusi Chen

XtremSup

https://xiusic.github.io/

XtremSup's activity

upvoted a paper 4 days ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published 7 days ago • 15

upvoted a paper 8 days ago

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published 9 days ago • 8

upvoted a paper 13 days ago

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published 21 days ago • 14

upvoted a collection about 1 month ago

RM-R1

RM-R1: Reward Modeling as Reasoning • 16 items • Updated 9 days ago • 7

upvoted 2 papers about 1 month ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published May 5 • 24

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5 • 75

upvoted 2 papers about 2 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21 • 33

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16 • 44

upvoted an article 3 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Published Feb 11 • 41

upvoted a paper 7 months ago

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Paper • 2410.14745 • Published Oct 17, 2024 • 48