Shuo Xing
shuoxing
AI & ML interests
MLLMs, LLMs
Recent Activity
updated a model 1 day ago: shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
published a model 1 day ago: shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
updated a model 1 day ago: shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
MLLM Reasoning, Rewarding, and Understanding
Papers on the reasoning, rewarding, and understanding of MLLMs and LLMs
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 279
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Paper • 2506.02096 • Published • 52
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Paper • 2506.02397 • Published • 36
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 146
LLM4Math
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Paper • 2510.04721 • Published
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
Paper • 2505.02735 • Published • 33
PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts
Paper • 2504.18428 • Published
MathConstruct: Challenging LLM Reasoning with Constructive Proofs
Paper • 2502.10197 • Published
models 227
shuoxing/llama3-8b-full-pretrain-wash-c4-4-5m-bs4
8B • Updated • 10
shuoxing/llama3-8b-full-pretrain-wash-c4-4-2m-bs4
Text Generation • 8B • Updated • 111
shuoxing/llama3-8b-full-pretrain-wash-c4-3-9m-bs4
Text Generation • 8B • Updated • 144
shuoxing/llama3-8b-full-pretrain-wash-c4-3-6m-bs4
Text Generation • 8B • Updated • 280
shuoxing/llama3-8b-full-pretrain-wash-c4-3-3m-bs4
Text Generation • 8B • Updated • 168
shuoxing/llama3-8b-full-pretrain-wash-c4-3-0m-bs4
Text Generation • 8B • Updated • 264
shuoxing/llama3-8b-full-pretrain-wash-c4-2-7m-bs4
Text Generation • 8B • Updated • 171
shuoxing/llama3-8b-full-pretrain-wash-c4-2-4m-sft-bs64
Text Generation • 8B • Updated • 182
shuoxing/llama3-8b-full-pretrain-wash-c4-2-1m-sft-bs64
Text Generation • 8B • Updated • 198
shuoxing/llama3-8b-full-pretrain-wash-c4-1-8m-sft-bs64
Text Generation • 8B • Updated • 214
datasets 7
shuoxing/yt_ugc_public
Updated • 1.37k
shuoxing/AutoTrust
Updated • 5
shuoxing/KoNViD_1k_videos
Viewer • Updated • 1.2k • 70
shuoxing/Tweet_demo
Viewer • Updated • 100 • 10
shuoxing/MapBench_VQA
Viewer • Updated • 96 • 43 • 1
shuoxing/MapBench
Viewer • Updated • 97 • 9
shuoxing/tweet-scholar
Viewer • Updated • 95 • 9