Thanks for not grossly overfitting this model.
1
#4 opened 11 days ago by phil111
Hi, could you consider training a 34B model using the RWKV architecture and comparing it with Transformers + Mamba?
1
#3 opened 12 days ago by win10

Could you please tell me which 18 languages you mainly support?
1
#2 opened 15 days ago by FantastyZhou
Was Qwen 3 tested with thinking on or off?
2
#1 opened 15 days ago by drmcbride