Yi Cui

onekq

AI & ML interests

Benchmark, Code Generation Model

Recent Activity

Organizations

MLX Community's profile picture ONEKQ AI's profile picture

onekq's activity

posted an update 1 day ago
replied to their post 3 days ago
view reply

yes yes.

Maybe you can run a leaderboard of models indexed by freedom ๐Ÿค—

posted an update 3 days ago
view post
Post
1601
I didn't noticed that Gemini 2.5 (pro and flash) has been silently launched for API preview. Their performance is solid, but below QwQ 32B and the latest DeepSeek v3.

onekq-ai/WebApp1K-models-leaderboard
  • 2 replies
ยท
replied to their post 4 days ago
view reply

I doubted there will be a Qwen3-coder. The direction changed. Alibaba is a corporation. You can imagine the number of executive sponsors for this release. Stock performance is at stake now. Price of success.

replied to their post 4 days ago
view reply

You meant the non-thinking mode? If so, add /no_think in your prompt

replied to their post 4 days ago
view reply

Noted. It thinks too long which is the problem. R1 and QwQ also took longer but are acceptable.

When I tested Qwen3, the difference of two modes is between an hour and a day (maybe longer)

replied to their post 4 days ago
posted an update 5 days ago
view post
Post
1740
I tested Qwen3 235b and 32b and they are both worse than Qwen2.5 32b.
onekq-ai/WebApp1K-models-leaderboard

I used non-thinking mode because the thinking mode is too slow ๐Ÿข๐Ÿข๐Ÿข to be usable in any way.

Sigh ...
ยท
reacted to anakin87's post with ๐Ÿ‘ 5 days ago
view post
Post
3215
๐—œ ๐˜๐—ฟ๐—ฎ๐—ถ๐—ป๐—ฒ๐—ฑ ๐—ฎ ๐—Ÿ๐—ฎ๐—ป๐—ด๐˜‚๐—ฎ๐—ด๐—ฒ ๐— ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜๐—ผ ๐˜€๐—ฐ๐—ต๐—ฒ๐—ฑ๐˜‚๐—น๐—ฒ ๐—ฒ๐˜ƒ๐—ฒ๐—ป๐˜๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐—š๐—ฅ๐—ฃ๐—ข! ๐Ÿ‘‘ ๐Ÿ—“๏ธ

โœ๏ธ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo

I experimented with GRPO lately.

I am fascinated by models learning from prompts and rewards - no example answers needed like in Supervised Fine-Tuning.

After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...

I wanted a different challenge, like ๐˜๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ถ๐—ป๐—ด ๐—ฎ ๐—บ๐—ผ๐—ฑ๐—ฒ๐—น ๐˜๐—ผ ๐—ฐ๐—ฟ๐—ฒ๐—ฎ๐˜๐—ฒ ๐—ฎ ๐˜€๐—ฐ๐—ต๐—ฒ๐—ฑ๐˜‚๐—น๐—ฒ ๐—ณ๐—ฟ๐—ผ๐—บ ๐—ฎ ๐—น๐—ถ๐˜€๐˜ ๐—ผ๐—ณ ๐—ฒ๐˜ƒ๐—ฒ๐—ป๐˜๐˜€ ๐—ฎ๐—ป๐—ฑ ๐—ฝ๐—ฟ๐—ถ๐—ผ๐—ฟ๐—ถ๐˜๐—ถ๐—ฒ๐˜€.

Choosing an original problem forced me to:
๐Ÿค” Think about the problem setting
๐Ÿงฌ Generate data
๐Ÿค Choose the right base model
๐Ÿ† Design reward functions (and experiencing reward hacking)
๐Ÿ”„ Run multiple rounds of training, hoping that my model would learn something.

A fun and rewarding ๐Ÿ˜„ experience.


I learned a lot of things, that I want to share with you. ๐Ÿ‘‡
โœ๏ธ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
๐Ÿ’ป Code: https://github.com/anakin87/qwen-scheduler-grpo
๐Ÿค— Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837
  • 2 replies
ยท
replied to their post 5 days ago
posted an update 6 days ago
view post
Post
470
The Qwen3 235B (MoE) is awfully slow ๐Ÿข๐Ÿข๐Ÿข.

I heard it is able to switch between reasoning and non-reasoning, but for my question, it always goes straight to the reasoning mode without an override switch. I tried Fireworks, DeepInfra, and OpenRouter, and they are all the same.

What is your experience with Qwen3?
  • 2 replies
ยท
reacted to ZennyKenny's post with ๐Ÿ‘ 6 days ago
view post
Post
2695
I've created a new dataset using the Algorithm of Thoughts architecture proposed by Sel et al. (2023) in a reasoning context. (paper: https://arxiv.org/pdf/2308.10379)

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models which are able to make complex, subjective financial decisions based on different criteria.

The generation process encourages recursive problem-solving in increasingly complex prompts to encourage models to assess and reevaluate the conclusions and generated opinions of upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.

Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
posted an update 7 days ago
replied to CadenHolman's post 11 days ago
reacted to CadenHolman's post with ๐Ÿ‘€ 11 days ago
view post
Post
1801
Weโ€™re excited to launch CodeDebugger.ai, a free, privacy-first tool that helps developers debug code instantly using AI.

What it does:

Paste your code (PHP, JavaScript, HTML, SQL, and more)

Get AI-generated bug reports and improvement suggestions

No sign-up, no tracking โ€” each result link expires in 24 hours

Why we built it: Every developer hits walls. Whether you're stuck on a syntax bug or need another set of eyes, CodeDebugger.ai offers instant feedback powered by OpenAI models โ€” all without compromising your privacy.

Privacy-first by design:

No login required

Code is deleted after 24 hours

No analytics, no tracking, no cookies

Try it now:
https://CodeDebugger.ai
  • 2 replies
ยท
replied to clem's post 12 days ago
posted an update 12 days ago
view post
Post
2003
I've recently attended a panel on AI applications. The panelists are managers/directors of Fortune 500 companies. These people make things happen and own results, so their stories and pain points are fresh.

(1) Models are used EVERYWHERE, customer facing and internal support, etc.
(2) A successful application must improve one of the following: revenue (๐Ÿ’ต๐Ÿ’ต), cost (๐Ÿ’ต๐Ÿ’ต), CSAT (still ๐Ÿ’ต๐Ÿ’ต)
(3) They proactively search on ๐Ÿค—HF๐Ÿค— for models and use them. Open source models (especially small ones) can flexibly fit into their existing workflows/infras, which enable them to deliver, and fast.
(4) The main barrier for adoption is license. A director told me they picked a model and finetuned it, then learned they would have to share enhancements. As a result, they dropped this model and the million dollar impact went to another model.

So to fellow model builders:
(1) celebrate that our work is useful and generate lots of values
(2) make your license permissive if you want maximum impact
  • 1 reply
ยท