|
--- |
|
language: en |
|
license: mit |
|
tags: |
|
- spiking-neural-networks |
|
- language-modeling |
|
- neuromorphic |
|
- energy-efficient |
|
- biological-ai |
|
datasets: |
|
- PatrickHaller/fineweb-5B
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# 🧠 Spiking Neural Network Language Model - Training Checkpoint
|
|
|
**Live training checkpoint from the world's first large-scale spiking language model!** |
|
|
|
## Current Training Status |
|
|
|
- **Training Step**: 554,000 |
|
- **Tokens Processed**: 5.67B tokens |
|
- **Current Loss**: 4.5783 |
|
- **Spike Rate**: 0.0508 |
|
- **Learning Rate**: 8.15e-05 |
|
|
|
## Model Architecture |
|
|
|
- **Parameters**: ~54M |
|
- **Architecture**: 12-layer Spiking LTC Network |
|
- **Hidden Size**: 768 |
|
- **Sequence Length**: 1024 |
|
- **Multi-timescale Processing**: Fast → Medium → Slow layers (see the sketch after this list)
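
As a rough illustration of the multi-timescale processing described above, the sketch below organizes a 12-layer, 768-wide spiking stack into fast, medium, and slow groups. The class name `SpikingLTCLayer`, the time constants, and the reset rule are assumptions made for clarity, not the actual implementation behind this checkpoint (which also uses adaptive thresholds and refractory periods).

```python
# Minimal sketch of a fast -> medium -> slow spiking stack (illustrative only).
import torch
import torch.nn as nn

class SpikingLTCLayer(nn.Module):
    """Leaky integrate-and-fire style layer with a layer-specific time constant."""

    def __init__(self, hidden_size: int, tau: float, threshold: float = 1.0):
        super().__init__()
        self.fc = nn.Linear(hidden_size, hidden_size)
        self.tau = tau              # larger tau -> slower membrane dynamics
        self.threshold = threshold  # fixed here; adaptive in the full model

    def forward(self, x: torch.Tensor, v: torch.Tensor):
        # Leaky integration of the input current, then a hard spike at threshold.
        v = v + (self.fc(x) - v) / self.tau
        spikes = (v >= self.threshold).float()
        v = v * (1.0 - spikes)      # reset the membrane wherever a spike fired
        return spikes, v

# 12 layers split across three time constants: fast, medium, slow.
taus = [2.0] * 4 + [8.0] * 4 + [32.0] * 4
layers = nn.ModuleList([SpikingLTCLayer(hidden_size=768, tau=tau) for tau in taus])
```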
|
|
|
## Training Details |
|
|
|
- **Dataset**: PatrickHaller/fineweb-5B |
|
- **Target**: 3 epochs (~15B tokens total) |
|
- **Biological Dynamics**: Adaptive thresholds and refractory periods (see the sketch after this list)

- **Energy Efficiency**: ~5% neuron activation vs. dense (100%) activation in Transformers
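
The bullet on biological dynamics is easier to follow with a concrete update rule. The function below is a minimal sketch assuming a simple leaky neuron; every parameter name and value (`tau`, `thr_jump`, `refrac_steps`, ...) is an illustrative assumption rather than the checkpoint's actual code.

```python
# One illustrative time step combining a leaky membrane, an adaptive threshold,
# and a refractory period (assumed formulation, not the model's exact rule).
import torch

def spiking_step(v, thr, refrac, inp, tau=4.0, thr_decay=0.95,
                 thr_jump=0.2, base_thr=1.0, refrac_steps=2):
    active = (refrac <= 0).float()        # neurons still in their refractory period stay silent
    v = v + active * (inp - v) / tau      # leaky integration for active neurons
    spikes = active * (v >= thr).float()  # spike where the adaptive threshold is crossed
    v = v * (1.0 - spikes)                # reset the membrane after a spike
    thr = base_thr + thr_decay * (thr - base_thr) + thr_jump * spikes  # threshold rises after spiking
    refrac = torch.clamp(refrac - 1, min=0) + spikes * refrac_steps    # restart the refractory clock
    return v, thr, refrac, spikes

# The reported spike rate (~0.05) is presumably an average of `spikes` over
# neurons and time steps, i.e. roughly 5% of neurons firing at any moment.
```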
|
|
|
## Scientific Significance |
|
|
|
This represents ongoing training of the first large-scale spiking neural network for language modeling, demonstrating: |
|
|
|
1. **Biological neural dynamics** can learn language at scale |
|
2. **Energy efficiency** through sparse neural firing (see the rough estimate after this list)
|
3. **Multi-timescale processing** for hierarchical understanding |
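
To make the energy-efficiency point concrete, the back-of-the-envelope count below compares a dense layer against an event-driven one at the reported ~5% spike rate. The numbers are illustrative assumptions, not measured energy figures.

```python
# Rough synaptic-operation count for one 768-wide layer (illustrative only).
hidden = 768
spike_rate = 0.05                              # roughly the spike rate reported above

dense_ops = hidden * hidden                    # dense layer: every unit drives every output
event_ops = int(spike_rate * hidden) * hidden  # event-driven: only spiking units propagate

print(f"dense: {dense_ops:,} ops vs event-driven: {event_ops:,} ops "
      f"(~{event_ops / dense_ops:.0%} of dense)")
```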
|
|
|
## Usage |
|
|
|
```python
# Download this checkpoint from the Hub
from huggingface_hub import hf_hub_download
import torch

checkpoint_path = hf_hub_download(
    repo_id="rootxhacker/piking-llm-5b-3epochs-exp",
    filename="checkpoint_554000.pt",
)

# Loading the weights into a model requires the custom spiking model code
# (see the full implementation for complete usage); assuming a standard
# torch.save checkpoint, the raw contents can be inspected with:
state = torch.load(checkpoint_path, map_location="cpu")
```
|
|
|
--- |
|
|
|
**🔬 This is live research in progress! Check back for updates as training continues.**
|
|
|
**Training Progress**: 37.8% complete toward the 15B-token target
|
|