Abstract
We present RWKV-7 "Goose", a new sequence modeling architecture, along with pre-trained language models that establish a new state-of-the-art in downstream performance at the 3 billion parameter scale on multilingual tasks, and match current SoTA English language performance despite being trained on dramatically fewer tokens than other top 3B models. Nevertheless, RWKV-7 models require only constant memory usage and constant inference time per token. RWKV-7 introduces a newly generalized formulation of the delta rule with vector-valued gating and in-context learning rates, as well as a relaxed value replacement rule. We show that RWKV-7 can perform state tracking and recognize all regular languages, while retaining parallelizability of training. This exceeds the capabilities of Transformers under standard complexity conjectures, which are limited to TC^0. To demonstrate RWKV-7's language modeling capability, we also present an extended open source 3.1 trillion token multilingual corpus, and train four RWKV-7 models ranging from 0.19 billion to 2.9 billion parameters on this dataset. To foster openness, reproduction, and adoption, we release our models and dataset component listing at https://huggingface.co/RWKV, and our training and inference code at https://github.com/RWKV/RWKV-LM all under the Apache 2.0 License.
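As a rough intuition for the generalized delta rule mentioned in the abstract, below is a minimal NumPy sketch of a delta-rule-style state update with vector-valued (per-channel) decay and in-context learning rate, which keeps memory and per-token compute constant. The variable names (`w`, `a`, `k`, `v`, `r`) and the exact ordering of decay, removal, and insertion are illustrative assumptions, not the paper's exact kernel.

```python
import numpy as np

def delta_rule_step(state, w, a, k, v, r):
    """One token of a delta-rule-style recurrence (illustrative sketch only).

    state : (d, d) matrix memory carried between tokens (constant size)
    w     : (d,)   per-channel decay / gating in (0, 1)
    a     : (d,)   per-channel in-context learning rate in [0, 1]
    k, v  : (d,)   key and value for the current token
    r     : (d,)   receptance (query) used to read the state
    """
    k_hat = k / (np.linalg.norm(k) + 1e-8)               # normalized removal key
    state = state * w                                    # vector-valued decay/gating
    state = state - np.outer(state @ k_hat, a * k_hat)   # delta-rule removal, scaled by a
    state = state + np.outer(v, k)                       # write the new key-value association
    out = state @ r                                      # constant-time read-out for this token
    return state, out

# Any number of tokens can be streamed with O(d^2) memory per head/layer.
d = 8
rng = np.random.default_rng(0)
state = np.zeros((d, d))
for _ in range(1000):
    w = rng.uniform(0.9, 1.0, d)
    a = rng.uniform(0.0, 1.0, d)
    k, v, r = rng.standard_normal((3, d))
    state, out = delta_rule_step(state, w, a, k, v, r)
```

Because the state is a fixed-size matrix per head, both memory usage and per-token inference time stay constant regardless of sequence length, matching the claim in the abstract.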
Community
RWKV-7 paper is finally out!
Parallelizable training +
Learns while inferring (constant inference speed and memory) +
Infinite context +
Infinite CoT +
I'd like to know what "infinite" concretely means here, e.g. in actual applications, infinite context? Infinite CoT?
The claim about "infinite context" and "infinite CoT" is a slight overclaim. It isn't from the authors; it most likely came from a product manager.
However, RWKV-7, as an RNN, can accept an "indefinite" context length: it can take arbitrarily long input, but it may still suffer from forgetting. This is different from "infinite", since an RNN may still lose certain information from a very lengthy context.
So "infinite" here really means indefinite.
For example, RWKV-7 models trained with a 4k context length can retain perfect memory until around 8k-16k tokens; beyond that, they may drastically forget earlier information. If necessary, you can fine-tune RWKV-7 models with an arbitrarily long context length. Please test your specific use cases to be sure.
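To make the forgetting point concrete, here is a tiny illustrative calculation (purely a sketch; the decay value 0.999 is a made-up stand-in, not a measured RWKV-7 parameter) showing how a write into a fixed-size recurrent state fades under repeated per-channel decay:

```python
# Illustrative only: assume an average per-channel decay of w = 0.999 per token
# (a hypothetical value). A value written into the fixed-size state at step 0
# survives with weight roughly w**t after t further tokens.
w = 0.999
for t in (1_000, 4_000, 16_000, 64_000):
    print(f"after {t:>6} tokens, remaining weight ~ {w ** t:.2e}")
```

Memory and per-token cost stay constant no matter how long the input is, but the contribution of very old tokens shrinks geometrically, which is why "indefinite" is more accurate than "infinite".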
New paper just dropped 👌👌
Transformers are toast this time for real
The RWKV community features diverse backgrounds and expertise. This time, most of our authors come from fields outside traditional AI, including physics, mathematics, game development, materials science, biochemistry, and cybernetics.
Join our community at https://discord.com/invite/bDSBUMeFpc and have a nice journey!