Model collapse after SFT
#14 opened about 3 hours ago
by
Banjiuyufen

Vocab missing tool-related strings in chat template, poor performance with tools
#13 opened about 19 hours ago
by
mattjcly
Can you please release how you post-train Qwen3 on DeepSeek?
2
#12 opened 5 days ago
by
ZeroWw
Tried it, but not as good as expected.
3
#11 opened 5 days ago
by
kk3dmax
Does the /no_think tag no longer work?
4
#10 opened 5 days ago
by
loong
Any plans for a Qwen3-32B model?
👍
11
7
#9 opened 5 days ago
by
wanghf
BTW, for programmers, the `Gemma` series is the best at helping you write comments, docstrings, and documentation.
#8 opened 5 days ago
by
DOFOFFICIAL

DeepSeek-R1-Lite
❤️
🔥
18
6
#6 opened 6 days ago
by
Dampfinchen
generation_config.json is missing
👀
1
#5 opened 6 days ago
by
Doctor-Chad-PhD

Model broken
👍
3
8
#4 opened 6 days ago
by
sm54
Any plans for the Gemma series? ;-;
❤️
4
4
#2 opened 6 days ago
by
Nakdesu

Any plans for a 30B-A3B model?
🔥
29
7
#1 opened 6 days ago
by
xxx777xxxASD
