36 37

Zeyu Qin

qqqzzzyyy

https://alan-qin.github.io/

Alan-Qin

AI & ML interests

Scalable Oversight, AI safety

Organizations

None yet

Recent Activity

upvoted a collection 2 days ago

hahah

1 item • Updated 2 days ago • 1

upvoted an article about 1 month ago

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

Article • 9 authors • Published Jun 18, 2024 • 49

upvoted an article about 2 months ago

Open R1: Update #3

Article • 10 authors • Published Mar 11 • 293

upvoted a collection 3 months ago

Cognitive Behaviors

4 items • Updated Mar 19 • 2

upvoted 2 collections 4 months ago

DeepSeek-R1

10 items • Updated about 1 month ago • 734

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated Feb 10 • 78

upvoted 2 papers 4 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 104

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

Paper • 2502.12215 • Published Feb 17 • 16

upvoted 3 papers 5 months ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 50

s1: Simple test-time scaling

Paper • 2501.19393 • Published Jan 31 • 126

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 48

upvoted a paper 6 months ago

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 84

upvoted 2 papers 8 months ago

AgentInstruct: Toward Generative Teaching with Agentic Flows

Paper • 2407.03502 • Published Jul 3, 2024 • 51

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 79

upvoted a collection 9 months ago

Qwen2.5-Math

Math-specific model series based on Qwen2.5 • 11 items • Updated Apr 28 • 82

upvoted a paper 10 months ago

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30, 2024 • 50

upvoted 2 articles 11 months ago

Let's talk about LLM evaluation

Article • Published May 23, 2024 • 177

SmolLM - blazingly fast and remarkably powerful

Article • 3 authors • Published Jul 16, 2024 • 381

upvoted 2 collections 12 months ago

Qwen1.5

Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated Apr 28 • 209

Coding Instruction datasets

4 items • Updated Nov 25, 2024 • 1