
Model Card for xTimeCrystal/RWKV-7-25M-Base

Model Details

Model Description

  • Developed by: xTimeCrystal
  • Model type: RWKV 7 (note: the decay is computed with -F.softplus instead of -0.606*torch.sigmoid, all LoRAs use Tanh, and LoRA weights are stored like nn.Linear; see the sketch after this list)
  • Language(s) (NLP): English
  • License: MIT
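
For concreteness, a minimal sketch of the two modifications named above. Only the softplus decay and the tanh LoRA follow from the note; the rank and where each adapter feeds in are assumptions.

```python
import torch
import torch.nn.functional as F

# Standard RWKV-7 decay (illustrative): w = exp(-0.606 * sigmoid(logits)),
# which keeps the per-channel decay roughly in (0.55, 1).
# This model uses softplus instead, so the decay can approach 0:
def decay(logits: torch.Tensor) -> torch.Tensor:
    return torch.exp(-F.softplus(logits))  # values in (0, 1)

# "All LoRAs use Tanh, LoRA weights are stored like nn.Linear" suggests
# low-rank adapters of this shape (rank and placement are assumptions):
class TanhLoRA(torch.nn.Module):
    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.down = torch.nn.Linear(dim, rank, bias=False)
        self.up = torch.nn.Linear(rank, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.tanh(self.down(x)))
```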

Uses

Direct Use

Fast autocomplete model.

Out-of-Scope Use

Don't use it for anything serious; it lacks any form of intelligence.

Bias, Risks, and Limitations

Limited to roughly a couple of exaFLOPs of compute; don't expect anything coherent beyond a couple of sentences.

Recommendations

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
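
A minimal sketch of fetching and inspecting the checkpoint with huggingface_hub; the filename "model.pth" and the plain state-dict format are assumptions, and the author's modded RWKV-7 model definition is still needed to run inference.

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="xTimeCrystal/RWKV-7-25M-Base",
    filename="model.pth",  # assumption: check the repo's file listing
)
state_dict = torch.load(ckpt_path, map_location="cpu")

# Inspect parameter names/shapes to match them to the model definition.
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))
```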

Training Details

Training Data

50B bytes of a custom FineWeb-Edu & OpenWebMath mixture.

Training Hyperparameters

  • Training regime: bf16 non-mixed precision; trained with a custom version of Muon, with the learning rate decayed from 5e-3 to 1e-3 (see the sketch below).
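
The sketch below shows the core of the publicly described Muon update (SGD-momentum followed by Newton-Schulz orthogonalization of 2D weight updates); it illustrates the general recipe, not the author's custom variant.

```python
import torch

@torch.no_grad()
def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes the singular values of a
    # 2D matrix toward 1 (coefficients from the public Muon recipe).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.to(torch.bfloat16)
    X = X / (X.norm() + 1e-7)            # bound the spectral norm by 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

@torch.no_grad()
def muon_step(param, grad, momentum_buf, lr=5e-3, momentum=0.95):
    # One Muon update for a 2D weight: momentum, then orthogonalize the step.
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    update *= max(1.0, param.size(0) / param.size(1)) ** 0.5
    param.add_(update, alpha=-lr)
```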

Speeds, Sizes, Times

Throughput: ~350 characters/second using unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by bandwidth and overhead.

Evaluation

Results

  • Bits-per-byte: ~1
  • HellaSwag accuracy: 33.4% (WikiHow entries removed)
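
For reference, bits-per-byte for a byte-level model (which this appears to be, given the byte-counted training data) is just the mean cross-entropy per byte converted from nats to bits:

```python
import math

def bits_per_byte(mean_nll_nats: float) -> float:
    # Convert mean negative log-likelihood per byte from nats to bits.
    return mean_nll_nats / math.log(2)

# A loss of ln 2 ≈ 0.693 nats per byte corresponds to 1 bit per byte.
print(bits_per_byte(math.log(2)))  # 1.0
```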

Summary

Technical Specifications

Model Architecture and Objective

Modified RWKV 7 (see the Model Description above).

Compute Infrastructure

1 x RTX 4080 for 1 week

