Yi Cui
AI & ML interests
Recent Activity
Organizations
onekq's activity


I don't expect it to break SOTA. In fact, it will be a win if it beats the old R1, which already stands very high on the leaderboard.
onekq-ai/WebApp1K-models-leaderboard
IMO the world needs a better vanilla LLM, e.g. DeepSeek v4 or v3.5, which we will use in daily life. That's the direction Gemini Flash took, which I praised.

Just pick a React, Svelte, or Vue template when you create your Space, or add
app_build_command: npm run build
and
app_file: build/index.html
to your README's YAML block. Or follow this link: https://huggingface.co/new-space?sdk=static
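Putting those two fields together, a minimal README front-matter sketch for a static Space might look like this (the title is a placeholder; only the sdk, app_build_command, and app_file lines are the point):

```yaml
---
title: My Web App          # placeholder
sdk: static
app_build_command: npm run build
app_file: build/index.html
---
```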
Let's build!

Claude 4 Opus!!
7 months!!
I thought the day would never come. But here it is.
onekq-ai/WebApp1K-models-leaderboard
Cost me quite a bit of money, but it was all worth it.
Enjoy, and make as much of it as you can!

onekq-ai/WebApp1K-models-leaderboard
Reasoning is good for coding, but not mandatory.


codex-mini is a finetuned version of o4-mini, but on my leaderboard it performs worse than its base model.
onekq-ai/WebApp1K-models-leaderboard

https://huggingface.co/papers?q=2505.09027
The central argument here is that test-driven development is a natural fit for LLMs, which scale better than humans do. I bet the future will see thousands of such leaderboards (and many more proprietary ones), each dominated by a specialized model.


The causal link is quite fascinating and worthy of a few blogposts or deep research queries, but I won't have more time for this (I really wish I did), so here goes.
* AI workloads love GPUs because they allocate more transistors to compute than CPUs do, and pair them with high-bandwidth memory
* more compute in the same small physical space -> more power draw and more heat dissipation
* more heat dissipation -> liquid cooling
* new cooling and heavier power draw -> bigger racks (heavier and taller)
* bigger racks -> (re)building data centers
* new data centers with higher power demand (peak and stability) -> grid upgrades and nuclear power
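The power-draw step in the chain above can be sketched with back-of-the-envelope arithmetic. The wattages and rack density below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope rack power comparison (all figures are assumptions).
cpu_server_watts = 500      # assumed typical dual-socket CPU server
gpu_server_watts = 10_200   # assumed 8-GPU server (~700 W per GPU plus host overhead)
servers_per_rack = 8        # assumed rack density

cpu_rack_kw = cpu_server_watts * servers_per_rack / 1000
gpu_rack_kw = gpu_server_watts * servers_per_rack / 1000

print(f"CPU rack: {cpu_rack_kw:.1f} kW")  # CPU rack: 4.0 kW
print(f"GPU rack: {gpu_rack_kw:.1f} kW")  # GPU rack: 81.6 kW
```

An order-of-magnitude jump in per-rack power is exactly why air cooling stops being enough and the rest of the chain (liquid cooling, bigger racks, new builds, grid upgrades) follows.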

onekq-ai/WebApp1K-models-leaderboard

onekq-ai/WebApp1K-models-leaderboard
The biggest pain point is still inference providers. Even decent labs like Ai2 or THUDM have to lobby for coverage. My leaderboard is for web developers, but I can only evaluate the most visible models with token API support. https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard
Maybe some players have GPUs but keep the results to themselves. We can only hope they will reciprocate the benefit they get from this community.

onekq-ai/WebApp1K-models-leaderboard

yes yes.
Maybe you can run a leaderboard of models indexed by freedom

onekq-ai/WebApp1K-models-leaderboard

I doubted there would be a Qwen3-coder, but the direction changed. Alibaba is a corporation; you can imagine the number of executive sponsors for this release. Stock performance is at stake now. The price of success.

Do you mean the non-thinking mode? If so, add /no_think to your prompt
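A tiny sketch of what that looks like in practice — the helper name is mine, not part of any API; the only real piece is the /no_think soft switch appended to the prompt:

```python
def no_think(prompt: str) -> str:
    """Append Qwen3's /no_think soft switch, which disables the thinking mode."""
    return f"{prompt.rstrip()} /no_think"

print(no_think("Summarize this README."))
# → Summarize this README. /no_think
```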