Yi Cui

onekq

AI & ML interests

Benchmark, Code Generation Model

Organizations

MLX Community, ONEKQ AI

onekq's activity

posted an update 1 day ago
posted an update 2 days ago
posted an update 4 days ago
posted an update 6 days ago
Folks, let's get ready. 🥳 We will be busy soon. 😅🤗 https://github.com/huggingface/transformers/pull/36878
replied to their post 6 days ago
replied to their post 7 days ago
posted an update 7 days ago
I'd like to benchmark 💵o1-pro💵 but it is way too expensive for me 🤦‍♂️
replied to their post 8 days ago

You can infer on CPU, but it will be very, very slow 😕
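
For example, a rough CPU-only sketch with llama-cpp-python (the GGUF filename is a placeholder; download the actual 7B file from the repo first):

```python
from llama_cpp import Llama

# Placeholder filename -- grab the actual 7B GGUF file from the repo first.
llm = Llama(
    model_path="OneSQL-v0.1-Qwen-7B.Q4_K_M.gguf",
    n_gpu_layers=0,  # keep everything on the CPU
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "Schema: users(id, name, active). How many users are active?",
    }],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```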

posted an update 8 days ago
The majority of OneSQL downloads went to the lowest end (7B-GGUF). I didn't expect this at all. The accuracy of this variant is the lowest, a tradeoff for its small size.

Like all LLMs, coding models hallucinate too. The wrong answers they give are only inches away from the right ones. In the case of SQL, the code is not only presentable but also executable, so it runs and returns the wrong rows.
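
To make that concrete, here is a minimal sketch of an execution-match check in the spirit of BIRD's EX metric (the schema and queries are made up):

```python
import sqlite3

def execution_match(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    """True if the predicted query returns the same rows as the gold query."""
    conn = sqlite3.connect(db_path)
    try:
        pred_rows = set(conn.execute(pred_sql).fetchall())
        gold_rows = set(conn.execute(gold_sql).fetchall())
    except sqlite3.Error:
        return False  # the predicted query didn't even run
    finally:
        conn.close()
    return pred_rows == gold_rows

# A query can be perfectly executable and still wrong (made-up example):
#   pred: SELECT name FROM users WHERE active = 1
#   gold: SELECT name FROM users WHERE active = 1 AND deleted = 0
```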

I'm clueless, and curious how users will deal with this.
  • 3 replies
posted an update 10 days ago
Introducing 🎉 OneSQL-v0.1 🥳, our first text-to-SQL model based on Qwen2.5-Coder. This model has achieved an EX score of 63.33 on the BIRD leaderboard (https://bird-bench.github.io/).

The model family includes 7B and 32B variants:
onekq-ai/onesql-v01-qwen-67d8e3eb1611c5532bb90c5f
They can also be found on Ollama (https://ollama.com/onekq/OneSQL-v0.1-Qwen).

My goal is to make OneSQL the most usable open-weights model for text-to-SQL. I'm currently working on best practices to help users use this model the right way and avoid pitfalls. After that, I plan to train the next version to push for a higher EX score.
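
As a starting point, here is a minimal inference sketch with Transformers (the repo name and prompt layout below are assumptions; check the collection and model card for the real ones):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo name -- see the collection above for the actual model IDs.
model_id = "onekq-ai/OneSQL-v0.1-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed prompt layout: schema first, then the question (check the model card).
schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);"
question = "How many active users are there?"
messages = [{"role": "user", "content": f"{schema}\n\n{question}"}]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```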

Enjoy this model and feel free to share comments/questions 🤗
  • 1 reply
replied to their post 10 days ago
replied to their post 12 days ago

Question for the Llama team: what will be your play? 😅

posted an update 12 days ago
Common formula to DIY an LLM:

Post-train a Qwen model with a dataset distilled from DeepSeek 😂
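
Taken literally, the recipe is only a few lines with TRL's SFTTrainer (a sketch only; the dataset name is a placeholder for whichever DeepSeek-distilled set you pick):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset name -- substitute any DeepSeek-R1-distilled SFT set.
dataset = load_dataset("your-org/deepseek-r1-distilled-sft", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # the Qwen base to post-train
    train_dataset=dataset,
    args=SFTConfig(output_dir="diy-llm"),
)
trainer.train()
```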

  • 2 replies
posted an update 13 days ago
replied to their post 13 days ago

Not implying anything. I'd like to know what the base is. If QwQ and the DeepSeek distill use the same base, then it becomes even more puzzling why their performance differs so much.

posted an update 14 days ago
Qwen made good students, DeepSeek made a genius.

This is my summary of how they differentiate. I don't think these two players are coordinated, but they both have clear goals: one is to build an ecosystem and the other is to push AGI.

And IMO they are both doing really well.
  • 2 replies
replied to their post 14 days ago

Ah I see. Thanks!

Still, the blog post didn't mention what the base model is (if any).

replied to their post 15 days ago

Cool! I will check it out.

What I meant by switching is this. Sometimes I'm not satisfied with a ChatGPT answer and realize it needs to think harder, so I switch to o1 and ask again, and most of the time the answer gets better. Then I ask a simple follow-up question, which o1 overanalyzes, and I have to switch back to gpt-4o. I don't have the foresight to know which model fits my question best; I only know after I read the answer, which is too late.
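
In code, the manual dance looks roughly like this (a sketch assuming the OpenAI Python client; the "satisfied" check is really my own judgment after reading the answer, not a string match):

```python
from openai import OpenAI

client = OpenAI()

def ask(model: str, question: str) -> str:
    """One-shot question to whichever model I picked."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

def satisfied(answer: str) -> bool:
    # Stand-in for my own judgment after reading the answer.
    return "i'm not sure" not in answer.lower()

question = "Why does my query planner ignore this index?"
answer = ask("gpt-4o", question)   # fast model first
if not satisfied(answer):          # only now do I know it needed more thought
    answer = ask("o1", question)   # escalate to the reasoning model
```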

Now imagine a conversation with a human expert. A human can do such switching remarkably well, hence a good conversation. This could actually be a metric to gauge an applicant's mileage.

posted an update 16 days ago
The performance of deepseek-r1-distill-qwen-32b is abysmal. I know Qwen instruct (not coder) is quite poor at coding. As such, I have low expectations for other R1 reproduction works also based on Qwen instruct. onekq-ai/r1-reproduction-works-67a93f2fb8b21202c9eedf0b

This makes it particularly mysterious what went into QwQ-32B. Why does it work so well? Was it trained from scratch? Does anyone have insights about this?
onekq-ai/WebApp1K-models-leaderboard
  • 5 replies
posted an update 17 days ago
A bigger and harder pain point for reasoning models is switching modes.

We now have powerful models capable of either System 1 thinking or System 2 thinking, but not both, much less switching between the two. Humans, however, can do this quite easily.

ChatGPT and others push the burden of switching between models onto users. I guess this is the best we have for now.
  • 2 replies