Model Card for Model ID
Model Details
Model Description
- Developed by: xTimeCrystal
- Model type: RWKV 7 (modified: the decay is computed with -F.softplus instead of -0.606 * torch.sigmoid, all LoRAs use a Tanh activation, and LoRA weights are stored like nn.Linear; see the sketch after this list)
- Language(s) (NLP): English
- License: MIT
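To make the modification concrete, here is a minimal sketch of the modified decay branch, assuming the reference RWKV-7 low-rank layout; the parameter names, rank, and bias term are illustrative and not taken from the released weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecayLoRA(nn.Module):
    """Sketch of the modified decay LoRA described above (illustrative only)."""

    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        # LoRA weights stored like nn.Linear: shape (out_features, in_features),
        # applied with F.linear rather than a plain x @ W matmul.
        self.down = nn.Parameter(torch.zeros(rank, dim))  # dim -> rank
        self.up = nn.Parameter(torch.zeros(dim, rank))    # rank -> dim
        self.w0 = nn.Parameter(torch.zeros(dim))          # per-channel bias (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # All LoRA branches use a Tanh activation between the two projections.
        hidden = torch.tanh(F.linear(x, self.down))
        pre = self.w0 + F.linear(hidden, self.up)
        # Modified decay: log-decay = -softplus(...), which is unbounded below,
        # replacing the reference -0.606 * sigmoid(...), which is bounded in (-0.606, 0).
        return torch.exp(-F.softplus(pre))  # per-channel decay in (0, 1)
```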
Uses
Direct Use
Fast autocomplete model.
Out-of-Scope Use
Don't use it for anything serious; it lacks any form of intelligence.
Bias, Risks, and Limitations
Trained with only ~a couple of exaFLOPs of compute; don't expect coherent output beyond a couple of sentences.
Recommendations
How to Get Started with the Model
Use the code below to get started with the model.
[More Information Needed]
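Official usage code is not included yet; the following is a minimal generation sketch. The module name, class name, checkpoint filename, and byte-level vocabulary are assumptions for illustration; check the repository files for the actual entry points.

```python
import torch
from model import RWKV7  # hypothetical module/class name

model = RWKV7()
model.load_state_dict(torch.load("model.pt", map_location="cpu"))  # hypothetical checkpoint name
model.eval()

prompt = "The quick brown fox"
tokens = list(prompt.encode("utf-8"))  # assuming a byte-level vocabulary

with torch.no_grad():
    for _ in range(200):
        logits = model(torch.tensor(tokens)[None, :])  # (1, T, vocab)
        tokens.append(int(logits[0, -1].argmax()))     # greedy decoding

print(bytes(tokens).decode("utf-8", errors="replace"))
```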
Training Details
Training Data
50B bytes of a custom FineWeb Edu & Open Web Math mixture.
Training Hyperparameters
- Training regime: bf16 (non-mixed) precision, using a custom variant of Muon with the learning rate decayed from 5e-3 to 1e-3 (see the sketch below).
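For reference, a sketch of the publicly known Muon update (momentum SGD followed by Newton-Schulz orthogonalization of each 2-D update matrix) with a decay from 5e-3 to 1e-3. The author's own version may differ; the momentum value and the linear decay shape are assumptions.

```python
import torch

@torch.no_grad()
def newton_schulz(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately orthogonalize a 2-D update matrix (quintic iteration
    from the reference Muon implementation)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_step(params, momenta, lr: float, beta: float = 0.95):
    """One Muon-style step over 2-D weight matrices (beta is an assumption)."""
    for p, m in zip(params, momenta):
        m.mul_(beta).add_(p.grad)             # update momentum buffer
        p.add_(newton_schulz(m), alpha=-lr)   # orthogonalized update direction

def lr_at(step: int, total_steps: int, lr_max: float = 5e-3, lr_min: float = 1e-3) -> float:
    """Learning rate decayed from 5e-3 to 1e-3; the linear shape is an assumption."""
    t = min(step / max(total_steps, 1), 1.0)
    return lr_max + (lr_min - lr_max) * t
```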
Speeds, Sizes, Times
Throughput: 350 characters/second with unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by bandwidth and overhead.
Evaluation
Results
- Bits-per-byte: ~1
- HellaSwag accuracy: 33.4% (Wikihow entries removed)
Summary
Technical Specifications
Model Architecture and Objective
Modified RWKV 7 (see Model Description above)
Compute Infrastructure
1 x RTX 4080 for 1 week