Warning: this "model" is only a trained state for the RWKV-6-7B-v2.1 model, so it cannot be run independently.

Please download the corresponding model at: https://huggingface.co/BlinkDL/rwkv-6-world/blob/main/RWKV-x060-World-7B-v2.1-20240507-ctx4096.pth

To understand how RWKV states work, see https://zhuanlan.zhihu.com/p/695005541 (written in Chinese).

Some code to load this state:

# Please use with https://github.com/BlinkDL/ChatRWKV/blob/main/API_DEMO_CHAT.py
# and paste this code at line 59. `args`, `model`, and `torch` are already in scope there.
state = [None] * args.n_layer * 3
state_raw = torch.load("/where/you/put/bo.pth")
for i in range(args.n_layer):
    dd = model.strategy[i]
    dev = dd.device
    atype = dd.atype
    # att token-shift state: starts empty
    state[i*3+0] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()
    # wkv (time-mixing) state: loaded from this file, with the last two dims swapped
    state[i*3+1] = state_raw[f'blocks.{i}.att.time_state'].transpose(1,2).to(dtype=torch.float, device=dev).requires_grad_(False).contiguous()
    # ffn token-shift state: starts empty
    state[i*3+2] = torch.zeros(args.n_embd, dtype=atype, requires_grad=False, device=dev).contiguous()
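To see what the loading loop above actually builds, here is a minimal self-contained sketch using dummy numpy arrays in place of torch tensors. The dimensions are assumptions based on the RWKV-6-World-7B architecture (32 layers, 4096-dim embeddings, head size 64); the real values come from the downloaded checkpoint.

```python
import numpy as np

# Assumed dimensions for RWKV-6-World-7B; the real model supplies these.
n_layer, n_embd, head_size = 32, 4096, 64
n_head = n_embd // head_size  # 64

# Fake a saved state file: one trained time_state tensor per layer.
state_raw = {
    f'blocks.{i}.att.time_state': np.zeros((n_head, head_size, head_size), dtype=np.float32)
    for i in range(n_layer)
}

# Same layout the loading loop builds: 3 entries per layer.
state = [None] * (n_layer * 3)
for i in range(n_layer):
    state[i*3+0] = np.zeros(n_embd, dtype=np.float32)  # att token-shift state
    # numpy transpose(0, 2, 1) swaps the same dims as torch's transpose(1, 2)
    state[i*3+1] = state_raw[f'blocks.{i}.att.time_state'].transpose(0, 2, 1)
    state[i*3+2] = np.zeros(n_embd, dtype=np.float32)  # ffn token-shift state

print(len(state), state[1].shape)  # 96 (64, 64, 64)
```

Only the wkv state (`i*3+1`) carries the trained values; the two token-shift entries per layer start at zero and fill in as the model processes tokens.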

Training details:

The training data consists of Bo Peng's Zhihu answers and articles (https://www.zhihu.com/people/bopengbopeng) since April 2022, plus a small amount of general AI conversation data (Ruozhiba, Alpaca, RWKV self-cognition data, etc.).

This state was trained with RWKV-PEFT (https://github.com/JL-er/RWKV-PEFT) for one epoch.

Chat example:

User: Will RWKV-7 be stronger than Mamba?

Assistant: A friend of mine built an RWKV-7 demo and tested how Mamba holds up head-to-head against RWKV-7. The results are in the table below; RWKV-7 is clearly stronger.
Also, everyone is training on multimodal data these days, including dynamically reading documents. But that only suits long-term memory. In my view, the essence of language is dialogue (it can replace all kinds of search), so use dialogue as the training corpus:
Feed all internet messages (company announcements, papers, blogs, GitHub issues, and so on) into the model.
Then let the model automatically generate new content (note: this excludes personal emails, chat logs, and the like), ensuring there is new content each time.
Finally, consolidate all internet messages into one large knowledge base (similar to Wikipedia), ensuring each piece of information appears only once.
Then let the model automatically reason over everything in the knowledge base and check which parts contradict each other. If there is no contradiction, output; if there is, have the model update.
Use the same method to find all information that has been deleted (including cases where only one entry was deleted), then have the model regenerate it.
This way the model learns entirely from the internet, with no gaps, effectively living on the internet forever.
So will this scheme make Mamba completely obsolete? Of course not. Mamba is strong, especially in small, fine-grained niches (for example, by learning all data from before 2023 it can beat 99% of paper authors), but for large-scale language modeling it can only stay shelved.
Right now the only team that can compete with ChatGPT is the combined top scientists of OpenAI and DeepMind (because they are all poking at RWKV's six layers of magic).