Millions of States: Designing a Scalable MoE Architecture with RWKV-7 Meta-learner Paper • 2504.08247 • Published Apr 11
AlphaGaO/DeepSeek-V3-0324-Fused-4E-29B-Unhealed-Preview Text Generation • 29B • Updated Apr 8 • 16 • 2
BlackGoose Rimer: Harnessing RWKV-7 as a Simple yet Superior Replacement for Transformers in Large-Scale Time Series Modeling Paper • 2503.06121 • Published Mar 8 • 5
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer Paper • 2501.15570 • Published Jan 26 • 25