Yi Cui

onekq

https://onekq.ai

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

posted an update about 19 hours ago

Okay, Qwen3 Coder does much better than Qwen3 (a coding model for coding), but GPT OSS still holds SOTA among open-source models. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

updated a Space about 19 hours ago

onekq-ai/WebApp1K-models-leaderboard

posted an update 3 days ago

Kimi K2 is a bit disappointing relative to my expectations. It is on par with Codex mini. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard

authored a paper 6 months ago

Tests as Prompt: A Test-Driven-Development Benchmark for LLM Code Generation

Paper • 2505.09027 • Published May 13

authored 3 papers about 1 year ago

A Case Study of Web App Coding with OpenAI Reasoning Models

Paper • 2409.13773 • Published Sep 19, 2024 • 6

WebApp1K: A Practical Code-Generation Benchmark for Web App Development

Paper • 2408.00019 • Published Jul 30, 2024 • 1

Insights from Benchmarking Frontier Language Models on Web App Code Generation

Paper • 2409.05177 • Published Sep 8, 2024 • 7