Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner. arXiv:2504.08247, published Apr 11, 2025.
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer. arXiv:2501.15570, published Jan 26, 2025.