Model collapse after SFT
#14 opened 2 days ago
by
Banjiuyufen

Vocab missing tool-related strings in chat template, poor performance with tools
#13 opened 3 days ago
by
mattjcly
Can you please release how you post-trained Qwen3 on DeepSeek?
2
#12 opened 7 days ago
by
ZeroWw
Tried it, but not as good as expected.
3
#11 opened 7 days ago
by
kk3dmax
Does the /no_think tag no longer work?
4
#10 opened 7 days ago
by
loong
Any plans for a Qwen3-32B model?
👍
13
7
#9 opened 7 days ago
by
wanghf
BTW, for programmers, the `Gemma` series is best for helping you write comments, docstrings, and documentation.
#8 opened 7 days ago
by
DOFOFFICIAL

DeepSeek-R1-Lite
❤️
🚀
19
7
#6 opened 7 days ago
by
Dampfinchen
generation_config.json is missing
👍
👀
2
#5 opened 7 days ago
by
Doctor-Chad-PhD

Model broken
👍
3
8
#4 opened 7 days ago
by
sm54
Any plans on gemma series? ;-;
❤️
4
4
#2 opened 8 days ago
by
Nakdesu

Any plans for a 30B-A3B model?
🔥
30
7
#1 opened 8 days ago
by
xxx777xxxASD
