Mingzhe Du

Elfsong

https://mingzhe.space

AI & ML interests

Code Generation / Preference Alignment / Bias Mitigation

Recent Activity

updated a dataset about 16 hours ago

Elfsong/Arabic_Dialect_DPO

published a dataset about 16 hours ago

Elfsong/Arabic_Dialect_DPO

liked a dataset 1 day ago

taidng/UIT-ViQuAD2.0

upvoted a paper 1 day ago

VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models

Paper • 2305.12199 • Published May 20, 2023 • 1

upvoted 2 papers about 2 months ago

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Paper • 2511.09515 • Published Nov 12, 2025 • 18

Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads

Paper • 2511.06209 • Published Nov 9, 2025 • 18

upvoted 3 papers 3 months ago

Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs

Paper • 2507.09477 • Published Jul 13, 2025 • 86

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published Oct 1, 2025 • 106

upvoted 3 papers 4 months ago

Reverse-Engineered Reasoning for Open-Ended Generation

Paper • 2509.06160 • Published Sep 7, 2025 • 150

SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

Paper • 2502.12115 • Published Feb 17, 2025 • 46

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19, 2025 • 118

upvoted a paper 5 months ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6, 2025 • 129

upvoted an article 5 months ago

Introducing smolagents: simple agents that write actions in code.

Article • Dec 31, 2024 • 1.16k

upvoted 4 papers 5 months ago

upvoted a paper 6 months ago

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Paper • 2507.12415 • Published Jul 16, 2025 • 42

upvoted an article 6 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7, 2025 • 267

upvoted 3 papers 7 months ago

BENCHAGENTS: Automated Benchmark Creation with Agent Interaction

Paper • 2410.22584 • Published Oct 29, 2024 • 1

Charting and Navigating Hugging Face's Model Atlas

Paper • 2503.10633 • Published Mar 13, 2025 • 92

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 77