
onekq-ai/WebApp1K-models-leaderboard

Yes, yes.
Maybe you can run a leaderboard of models indexed by freedom 🤗

onekq-ai/WebApp1K-models-leaderboard

I doubted there would be a Qwen3-coder, but the direction changed. Alibaba is a corporation; you can imagine the number of executive sponsors for this release. Stock performance is at stake now. The price of success.

Do you mean the non-thinking mode? If so, add /no_think to your prompt.
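For example, a minimal sketch assuming an OpenAI-compatible endpoint serving Qwen3 (the base_url, api_key, and model id below are placeholders):

```python
# Minimal sketch, assuming an OpenAI-compatible endpoint that serves Qwen3;
# base_url, api_key, and the exact model id are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://your-provider.example/v1", api_key="...")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-30B-A3B",
    messages=[
        # The trailing /no_think soft switch asks Qwen3 to skip the thinking phase.
        {"role": "user", "content": "Summarize GRPO in two sentences. /no_think"},
    ],
)
print(resp.choices[0].message.content)
```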

Noted. The problem is that it thinks too long. R1 and QwQ also took longer, but within acceptable limits.
When I tested Qwen3, the difference between the two modes was between an hour and a day (maybe longer).

+1

onekq-ai/WebApp1K-models-leaderboard
I used non-thinking mode because the thinking mode is too slow 😢😢😢 to be usable in any way.
Sigh ...

I experimented with GRPO lately.
I am fascinated by models learning from prompts and rewards, with no example answers needed, unlike in Supervised Fine-Tuning.
After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...
I wanted a different challenge, like teaching a model to create a schedule from a list of events and priorities.
Choosing an original problem forced me to:
- Think about the problem setting
- Generate data
- Choose the right base model
- Design reward functions (and experience reward hacking; see the sketch after this post)
- Run multiple rounds of training, hoping that my model would learn something.
A fun and rewarding experience.
I learned a lot that I want to share with you.
✍️ Blog post: https://huggingface.co/blog/anakin87/qwen-scheduler-grpo
💻 Code: https://github.com/anakin87/qwen-scheduler-grpo
🤗 Hugging Face collection (dataset and model): anakin87/qwen-scheduler-grpo-680bcc583e817390525a8837
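Not from the blog post itself, but for readers new to GRPO: a rough sketch of what a custom reward function plugged into TRL's GRPOTrainer can look like. The format-based reward rule, model id, and toy dataset are placeholders for illustration, not the actual setup from the post:

```python
# A rough sketch, assuming TRL's GRPOTrainer; the reward rule, model id, and tiny
# dataset below are illustrative placeholders, not the blog post's actual setup.
import re
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset (GRPO only needs prompts; rewards replace reference answers).
train_dataset = Dataset.from_dict(
    {"prompt": ["Create a schedule from these events: gym (prio 2), meeting (prio 5)."]}
)

def format_reward(completions, **kwargs):
    # With a plain-text prompt dataset, each completion is a string.
    # Reward completions that wrap their answer in <schedule>...</schedule> tags.
    return [1.0 if re.search(r"<schedule>.*?</schedule>", c, re.DOTALL) else 0.0
            for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",   # placeholder base model
    reward_funcs=[format_reward],        # several reward functions can be combined
    args=GRPOConfig(output_dir="grpo-scheduler", max_completion_length=256),
    train_dataset=train_dataset,
)
trainer.train()
```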

Ah, thanks! This works.

I heard it can switch between reasoning and non-reasoning, but for my questions it always goes straight into reasoning mode, with no override switch. I tried Fireworks, DeepInfra, and OpenRouter, and they all behave the same.
What is your experience with Qwen3?

The dataset simulates the discovery phase of a fictitious VC firm called Reasoned Capital and, once expanded, can be used to create models that can make complex, subjective financial decisions based on different criteria.
The generation process encourages recursive problem-solving through increasingly complex prompts, pushing models to assess and reevaluate the conclusions and opinions generated by upstream models. Pretty neat stuff, and I'm not aware of this architecture being used in a reasoning context anywhere else.
Check it out: ZennyKenny/synthetic_vc_financial_decisions_reasoning_dataset
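As a rough illustration of the recursive idea described above (each round's opinion is fed back into a more demanding prompt), a chained generation loop might look like the sketch below; the function and prompt wording are hypothetical, not the dataset's actual pipeline:

```python
# Illustration only: the "upstream opinions feed increasingly complex prompts" idea,
# not the dataset's actual pipeline. generate() is a stand-in for any chat model call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def recursive_opinions(company_brief: str, rounds: int = 3) -> list[str]:
    opinions: list[str] = []
    context = company_brief
    for i in range(rounds):
        prompt = (
            f"Round {i + 1}. You are a VC analyst at Reasoned Capital.\n"
            f"Material so far:\n{context}\n\n"
            "Critique the conclusions above, then give your own funding recommendation."
        )
        opinion = generate(prompt)          # upstream output becomes downstream input
        opinions.append(opinion)
        context += f"\n\n--- Analyst opinion {i + 1} ---\n{opinion}"
    return opinions
```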

Qwen/Qwen3-235B-A22B
Qwen/Qwen3-30B-A3B


Nice. What model is behind it?

What it does:
Paste your code (PHP, JavaScript, HTML, SQL, and more)
Get AI-generated bug reports and improvement suggestions
No sign-up, no tracking; each result link expires in 24 hours
Why we built it: Every developer hits walls. Whether you're stuck on a syntax bug or need another set of eyes, CodeDebugger.ai offers instant feedback powered by OpenAI models, all without compromising your privacy.
Privacy-first by design:
No login required
Code is deleted after 24 hours
No analytics, no tracking, no cookies
Try it now:
https://CodeDebugger.ai
That's a great start. Also see my post https://huggingface.co/posts/onekq/992154552707771
The point is that models are already embedded and inferenced everywhere. If you standardize energy consumption as an inference output field, your impact is multiplied instantly because the groundwork is already there.
In theory, this works for closed-source models too.
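To make this concrete, here's a hypothetical sketch of how an inference response could carry a standardized energy field next to the usual token counts; the field names and units are invented for illustration, not an existing spec:

```python
# Hypothetical sketch: a standardized energy field inside the usual usage block of
# an inference response. Field names and units are invented for illustration.
response = {
    "model": "any-model",
    "choices": [{"message": {"role": "assistant", "content": "..."}}],
    "usage": {
        "prompt_tokens": 120,
        "completion_tokens": 256,
        "energy": {
            "joules": 41.7,            # estimated energy spent on this request
            "method": "gpu-counters",  # how the estimate was produced
        },
    },
}

# Downstream tooling could then aggregate energy the same way it aggregates tokens.
print(response["usage"]["energy"]["joules"])
```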

(1) Models are used EVERYWHERE: customer-facing, internal support, etc.
(2) A successful application must improve one of the following: revenue (💵💵), cost (💵💵), CSAT (still 💵💵)
(3) They proactively search on 🤗HF🤗 for models and use them. Open-source models (especially small ones) fit flexibly into their existing workflows/infra, which enables them to deliver, and fast.
(4) The main barrier to adoption is licensing. A director told me they picked a model and fine-tuned it, then learned they would have to share their enhancements. As a result, they dropped that model, and the million-dollar impact went to another model.
So to fellow model builders:
(1) celebrate that our work is useful and generates lots of value
(2) make your license permissive if you want maximum impact