Yi Cui

onekq

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

Organizations

MLX Community's profile picture ONEKQ AI's profile picture

onekq's activity

posted an update 6 days ago
reacted to AtAndDev's post with πŸ€— 7 days ago
view post
Post
2667
deepseek-ai/DeepSeek-R1-0528

This is the end
  • 1 reply
Β·
posted an update 8 days ago
view post
Post
329
I'm now testing the new πŸ‹DeepSeekπŸ‹ R1 and like all reasoning models, it's awfully slow. 🐒🐒

I don't expect it to break SOTA. In fact, it will be a win if it beats the old R1, which already stands very high in the leaderboard.

onekq-ai/WebApp1K-models-leaderboard

IMO the world needs a better vanilla LLM, e.g. πŸ‹DeepSeekπŸ‹ v4 or v3.5, which we will use in daily life. That's the direction Gemini Flash took which I praised.
reacted to clem's post with πŸ€— 12 days ago
view post
Post
3219
It's just become easier to share your apps on the biggest AI app store (aka HF spaces) for unlimited storage, more visibility and community interactions.

Just pick a React, Svelte, or Vue template when you create your space or add app_build_command: npm run build in your README's YAML and app_file: build/index.html in your README's YAML block.

Or follow this link: https://huggingface.co/new-space?sdk=static

Let's build!
  • 1 reply
Β·
posted an update 14 days ago
view post
Post
2193
πŸŽ‰πŸ₯³ SOTA!!! πŸš€πŸ‘‘

πŸ₯‡ Claude 4 Opus !!πŸ₯‡

7 months!! βŒ›βŒ›

I thought the day would never come. But here it is.

onekq-ai/WebApp1K-models-leaderboard

Cost me quite a bit of πŸ’΅money πŸ’΅ but it is all worth it.

Enjoy and make out of this as much as you can!
  • 4 replies
Β·
posted an update 16 days ago
view post
Post
2192
Highly recommend the latest Gemini Flash. My favorite Google I/O gift. It ranks behind reasoning models but runs a lot faster than them. It beats DeepSeek v3.

onekq-ai/WebApp1K-models-leaderboard

Reasoning is good for coding, but not mandatory.
  • 1 reply
Β·
reacted to ProCreations's post with πŸ€— 19 days ago
view post
Post
3180
Eyyy thank you guys for 40 followers!
posted an update 20 days ago
posted an update 23 days ago
view post
Post
943
This paper introduced the notion of "Tests as Prompt". It compiled results and findings of WebApp1K published in previous three papers.

https://huggingface.co/papers?q=2505.09027

The central argument here is that test-driven development is a natural fit to LLMs, which scale better than humans. I bet the future will see thousands of such leaderboards (many more proprietary ones), each dominated by a specialized model.
reacted to clem's post with πŸ”₯ 24 days ago
view post
Post
3132
Very cool to see pytorch contributing on Hugging Face. Time to follow them to see what they're cooking!
  • 2 replies
Β·
posted an update 25 days ago
view post
Post
460
If you also tuned into Altman's second congress hearing (first in 2023) along with other AI executives, my takeaway is two words: New Deal (by FDR almost a century ago).

The causal link is quite fascinating and worthy of a few blogposts or deep research queries, but I won't have more time for this (I really wish so), so here goes.

* AI workload loves GPUs because they allocate more transistors than CPUs for computing, and pack them by high-bandwidth memory
* More computing in the small physical space -> more power draw and more heat dissipation
* more heat dissipation -> liquid cooling
* new cooling and heavier power draw -> bigger racks (heavier and taller)
* bigger racks -> (re)building data centers
* new data centers with higher power demand (peak and stability) -> grid update and nuclear power
posted an update 28 days ago
view post
Post
2279
The new Mistral medium model is very impressive for its size. Will it be open sourced given the history of Mistral? Does anyone have insights?

onekq-ai/WebApp1K-models-leaderboard
posted an update 29 days ago
view post
Post
3279
This time Gemini is very quick with API support on its 2.5 pro May release. The performance is impressive too, now it is among top contenders like o4, R1, and Claude.

onekq-ai/WebApp1K-models-leaderboard
replied to clem's post about 1 month ago
reacted to clem's post with ❀️ about 1 month ago
view post
Post
4061
What are you using to evaluate models or AI systems? So far we're building lighteval & leaderboards on the hub but still feels early & a lot more to build. What would be useful to you?
Β·
posted an update about 1 month ago
replied to their post about 1 month ago
view reply

yes yes.

Maybe you can run a leaderboard of models indexed by freedom πŸ€—

posted an update about 1 month ago
view post
Post
1753
I didn't noticed that Gemini 2.5 (pro and flash) has been silently launched for API preview. Their performance is solid, but below QwQ 32B and the latest DeepSeek v3.

onekq-ai/WebApp1K-models-leaderboard
  • 2 replies
Β·
replied to their post about 1 month ago
view reply

I doubted there will be a Qwen3-coder. The direction changed. Alibaba is a corporation. You can imagine the number of executive sponsors for this release. Stock performance is at stake now. Price of success.

replied to their post about 1 month ago
view reply

You meant the non-thinking mode? If so, add /no_think in your prompt