36 37

Zeyu Qin

qqqzzzyyy

https://alan-qin.github.io/

Alan-Qin

AI & ML interests

Scalable Oversight, AI safety

Organizations

None yet

upvoted a collection 2 days ago

hahah

1 item • Updated 2 days ago • 1

upvoted an article about 1 month ago

Article

BigCodeBench: Benchmarking Large Language Models on Solving Practical and Challenging Programming Tasks

By [author name missing] and 8 others • Jun 18, 2024 • 49

upvoted an article about 2 months ago

Article

Open R1: Update #3

By [author name missing] and 9 others • Mar 11 • 293

updated a model 3 months ago

qqqzzzyyy/qwen2.5-1.5b-simple-rl-math3to5-adaptive_s4

Updated Apr 14 • 1

published a model 3 months ago

qqqzzzyyy/qwen2.5-1.5b-simple-rl-math3to5-adaptive_s4

Updated Apr 14 • 1

liked 4 datasets 3 months ago

agentica-org/DeepCoder-Preview-Dataset

Viewer • Updated Apr 9 • 25k • 2.33k • 80

AndrewZeng/math_level1to5_qwen_prompt

Viewer • Updated Apr 2 • 12k • 38 • 1

Goedel-LM/Goedel-Pset-v1

Viewer • Updated Apr 18 • 1.73M • 596 • 8

obiwan96/owm-cog-behaviors

Viewer • Updated Mar 19 • 28.3k • 94 • 2

upvoted a collection 3 months ago

Cognitive Behaviors

4 items • Updated Mar 19 • 2

liked a dataset 4 months ago

allenai/reward-bench

Viewer • Updated Sep 9, 2024 • 8.11k • 5.49k • 95

upvoted a collection 4 months ago

DeepSeek-R1

10 items • Updated about 1 month ago • 734

liked a model 4 months ago

TIGER-Lab/Qwen2.5-Math-7B-CFT

Text Generation • 8B • Updated Feb 2 • 109 • 8

liked a dataset 4 months ago

TIGER-Lab/WebInstruct-CFT

Viewer • Updated Feb 2 • 654k • 221 • 52

upvoted a collection 4 months ago

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated Feb 10 • 78

liked a Space 4 months ago

README

liked a model 4 months ago

meta-llama/Llama-3.2-3B

Text Generation • 3B • Updated Oct 24, 2024 • 251k • 590

upvoted 2 papers 4 months ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20 • 104

Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

Paper • 2502.12215 • Published Feb 17 • 16

upvoted a paper 5 months ago

CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction

Paper • 2502.07316 • Published Feb 11 • 50