Any plans for the Gemma series? ;-;

#2
by Nakdesu - opened

Gemma has better multilingual and vision capabilities, plus it is generally more creative.

Or better yet: release the distillation recipe so anyone can reproduce it on other open models.

Gemma has a bad license. More like the worst.

You are not going to deploy this in a commercial environment anyway, so you might as well get the best. Qwen is overfried on STEM and math, and has zero capabilities outside those two domains.

IMO they should make their own architecture again. There's really no need to use Qwen or Gemma: DeepSeek has an incredible architecture, and they just have to scale it down. They did that with DeepSeek-V2-Lite, and I'm not sure why they abandoned this approach. A scaled-down model would also support MLA and could be trained on the same data as the big R1 instead of just on its outputs.
