https://lukaspetersson.com/blog/2025/bitter-vertical/
Yi Cui

Yes! I'm looking forward to R2

I tested and ranked every model drop for my leaderboard https://huggingface.co/spaces/onekq-ai/WebApp1K-models-leaderboard but this time I gave up.
Whatever questions this model aims to solve, they are out of my league.

You can infer on CPU, but it will be very, very slow.

Like all LLMs, coding models hallucinate too. The wrong answers they give are only inches away from the right ones. In the case of SQL, the code is not only presentable but also executable, hence returning the wrong rows.
I'm clueless, and curious how users will deal with this.
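To make that failure mode concrete, here is a minimal sketch (toy schema and queries of my own, not actual model output) of a hallucinated query that is perfectly executable yet returns the wrong rows:

```python
import sqlite3

# Toy database: orders with a status column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "shipped", 10.0), (2, "pending", 20.0), (3, "shipped", 30.0)],
)

# Question: "What is the total amount of shipped orders?"
right = "SELECT SUM(amount) FROM orders WHERE status = 'shipped'"
# Hallucinated query: syntactically valid, runs fine, filters on the wrong value.
wrong = "SELECT SUM(amount) FROM orders WHERE status = 'pending'"

print(conn.execute(right).fetchone()[0])  # 40.0
print(conn.execute(wrong).fetchone()[0])  # 20.0
```

Both queries execute without error; only the returned rows reveal that one is wrong, and you need ground truth to tell which.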

The model family includes 7B and 32B,
onekq-ai/onesql-v01-qwen-67d8e3eb1611c5532bb90c5f
and can also be found on Ollama (https://ollama.com/onekq/OneSQL-v0.1-Qwen).
My goal is to make OneSQL the most usable open-weights model for text-to-SQL. I'm currently working on best practices to help users use this model the right way and avoid pitfalls. After that, I plan to train the next version to push for a higher EX (execution accuracy) score.
Enjoy this model, and feel free to share comments and questions!
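One example of the kind of pitfall I have in mind: letting the model guess the schema instead of pinning it down in the prompt. Below is a hypothetical prompt-builder sketch (my own helper, not OneSQL's official API or prompt format) that supplies the exact DDL so the model cannot invent table or column names:

```python
# Hypothetical helper for a text-to-SQL model: include the exact schema
# in the prompt so the model grounds its query in real tables and columns.
def build_prompt(schema: str, question: str) -> str:
    return (
        "### Schema\n"
        f"{schema}\n"
        "### Question\n"
        f"{question}\n"
        "### SQL\n"
    )

prompt = build_prompt(
    "CREATE TABLE orders (id INTEGER, status TEXT, amount REAL);",
    "What is the total amount of shipped orders?",
)
print(prompt)
```

The exact section headers are illustrative; check the model card for the prompt format the model was actually trained with.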

Welcome to my world!

Question for the Llama team: what will be your play?

Not implying anything. I'd like to know what the base is. If QwQ and the DeepSeek distill use the same base, then it becomes more puzzling why the performance differs so much.

Ah I see. Thanks!
Still, the blog post didn't mention what the base model is (if any).

Cool! I will check it out.
What I meant by switching is this. Sometimes I'm not satisfied with ChatGPT answer, and realized it needs to think harder. So I switched to o1 and asked again, and most of the times the answer gets better. Then I asked a simple follow-up question which o1 overanalyzed. Then I had to switch back to gpt-4o. I don't actually have the foresight which model fits my question the best. I only know it after I read the answer which is too late.
Now imagine a conversation with a human expert. A human can do such switching remarkably well, hence a cool conversation. This can be actually a metric to read the mileage of an applicant.

This makes it particularly mysterious what went into QwQ-32B. Why did it work so well? Was it trained from scratch? Does anyone have insights about this?
onekq-ai/WebApp1K-models-leaderboard

We now have powerful models capable of either System 1 thinking or System 2 thinking, but not both, much less switching between the two. Humans, however, can do this quite easily.
ChatGPT and others push the burden onto users to switch between models. I guess this is the best we have for now.
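A crude sketch of what automatic switching might look like, with a keyword heuristic standing in for a real difficulty classifier (the hints and model names here are purely illustrative, not how any vendor actually routes):

```python
# Illustrative router: pick a deliberate "System 2" model only when the
# question looks like it needs careful reasoning; otherwise stay fast.
REASONING_HINTS = ("prove", "step by step", "why", "optimize", "debug")

def route(question: str) -> str:
    q = question.lower()
    if any(hint in q for hint in REASONING_HINTS):
        return "o1"      # slow, deliberate (System 2)
    return "gpt-4o"      # fast, intuitive (System 1)

print(route("What's the capital of France?"))          # gpt-4o
print(route("Prove that this algorithm terminates."))  # o1
```

Of course, this sketch has the exact flaw described above: you often only know a question needed more thought after you've seen the shallow answer, which is why this is a hard problem and not a keyword match.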