Will this have better performance?
Compute has gotten much better since you did this. My idea is:
Merge copies of DeepSeek R1 0528 together into a ~20-24T-parameter model, run it on a Cerebras CS-3 (once they give me access), then fine-tune it on a couple million rows so it ends up better than the original DeepSeek. Will that work, or will it be a big fat mess? Have you run this yourself and gotten better results?
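For a sense of scale, here is a rough back-of-envelope sketch of what a ~20-24T target implies. It assumes DeepSeek-R1 has about 671B total parameters (the publicly reported figure) and that a frankenmerge grows linearly with the number of stacked copies. The function names are hypothetical, made up for this illustration. It counts only bf16 weights, not gradients, optimizer state, or KV cache.

```python
# Back-of-envelope sizing for a hypothetical frankenmerge built from copies of DeepSeek-R1.
# Assumptions (not from the thread): ~671B total params for the base model, and the
# merged model's size scales linearly with the number of stacked copies.

BASE_PARAMS = 671e9        # approximate DeepSeek-R1 total parameter count
BYTES_PER_PARAM_BF16 = 2   # bf16/fp16 storage for weights only


def copies_needed(target_params: float, base_params: float = BASE_PARAMS) -> float:
    """Number of full base-model copies that add up to target_params (hypothetical helper)."""
    return target_params / base_params


def weight_memory_tb(params: float, bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Raw weight storage in terabytes, 1 TB = 1e12 bytes (hypothetical helper)."""
    return params * bytes_per_param / 1e12


if __name__ == "__main__":
    for target in (20e12, 24e12):
        print(f"{target / 1e12:.0f}T params: "
              f"~{copies_needed(target):.1f} copies of R1, "
              f"~{weight_memory_tb(target):.0f} TB of bf16 weights")
```

Under these assumptions, the target works out to roughly 30-36 copies of R1 and about 40-48 TB of weights alone. Full fine-tuning with an Adam-style optimizer needs several times more memory than the weights, because gradients and optimizer states are stored as well.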
That's a good question. For now it's an experimental frankenmerge, but with better hardware that would be an interesting experiment =)
I tried to run it, but the only place I could run it was a server at school, so I couldn't really run it properly (RIP, they took it away and now it's just sitting there). Maybe later.
So you can't really know for sure until you try it... Waiting to get access to that CS-3!