
Model Card for xTimeCrystal/RWKV-7-25M-Base

Model Details

Model Description

  • Developed by: xTimeCrystal
  • Model type: RWKV 7 (note: the decay is computed with -F.softplus instead of -0.606*torch.sigmoid, all LoRAs use Tanh, and LoRA weights are stored like nn.Linear; see the sketch after this list)
  • Language(s) (NLP): English
  • License: MIT
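
For concreteness, a minimal sketch of the two modifications named above. Only the softplus decay and the tanh LoRA follow from the note; the rank and where each adapter feeds in are assumptions.

```python
import torch
import torch.nn.functional as F

# Standard RWKV-7 decay (illustrative): w = exp(-0.606 * sigmoid(logits)),
# which keeps the per-channel decay roughly in (0.55, 1).
# This model uses softplus instead, so the decay can approach 0:
def decay(logits: torch.Tensor) -> torch.Tensor:
    return torch.exp(-F.softplus(logits))  # values in (0, 1)

# "All LoRAs use Tanh, LoRA weights are stored like nn.Linear" suggests
# low-rank adapters of this shape (rank and placement are assumptions):
class TanhLoRA(torch.nn.Module):
    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = torch.nn.Linear(dim, rank, bias=False)
        self.up = torch.nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.tanh(self.down(x)))
```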

Uses

Direct Use

Fast autocomplete model.

Out-of-Scope Use

Don't use it for anything serious; it lacks any form of intelligence.

Bias, Risks, and Limitations

Limited to roughly a couple of exaFLOPs of compute; don't expect anything coherent beyond a couple of sentences.

Recommendations

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
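
A minimal sketch of fetching and inspecting the checkpoint with huggingface_hub; the filename "model.pth" and the plain state-dict format are assumptions, and the author's modded RWKV-7 model definition is still needed to run inference.

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="xTimeCrystal/RWKV-7-25M-Base",
    filename="model.pth",  # assumption: check the repo's file listing
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# Inspect parameter names/shapes to match them to the model definition.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))
```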

Training Details

Training Data

50B bytes of a custom FineWeb-Edu & OpenWebMath mixture.

Training Hyperparameters

  • Training regime: bf16 non-mixed precision; trained with a custom version of Muon, with the learning rate decayed from 5e-3 to 1e-3 (see the sketch below).
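
The sketch below shows the core of the publicly described Muon update (SGD-momentum followed by Newton-Schulz orthogonalization of 2D weight updates); it illustrates the general recipe, not the author's custom variant.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes the singular values of a
    # 2D matrix toward 1 (coefficients from the public Muon recipe).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.to(torch.bfloat16)
    X = X / (X.norm() + 1e-7)            # bound the spectral norm by 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=5e-3, momentum=0.95):
    # One Muon update for a 2D weight: momentum, then orthogonalize the step.
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    update *= max(1.0, param.size(0) / param.size(1)) ** 0.5
    param.add_(update, alpha=-lr)
```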

Speeds, Sizes, Times

Throughput: ~350 characters/second using unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by bandwidth and overhead.

Evaluation

Results

  • Bits-per-byte: ~1
  • HellaSwag accuracy: 33.4% (WikiHow entries removed)
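
For reference, bits-per-byte for a byte-level model (which this appears to be, given the byte-counted training data) is just the mean cross-entropy per byte converted from nats to bits:

```python
import math

def bits_per_byte(mean_nll_nats: float) -> float:
    # Convert mean negative log-likelihood per byte from nats to bits.
    return mean_nll_nats / math.log(2)

# A loss of ln 2 ≈ 0.693 nats per byte corresponds to 1 bit per byte.
print(bits_per_byte(math.log(2)))  # 1.0
```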

Summary

Technical Specifications

Model Architecture and Objective

Modified RWKV 7 (see the Model Description above).

Compute Infrastructure

1 x RTX 4080 for 1 week

