# Gemma-3 270M Fine-tuned on TinyStories
This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.
## Model Details
- Architecture: Custom Gemma-3 with sliding window attention
- Parameters: ~270M
- Training Dataset: TinyStories
- Context Length: 32,768 tokens
- Sliding Window: 512 tokens
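With sliding window attention, each token attends only to itself and the previous `window - 1` tokens rather than the full causal prefix, which is what lets a 512-token window cover a 32,768-token context cheaply. A minimal NumPy sketch of the attention mask (the function name and the tiny sizes are illustrative, not from the model code):

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """True where a query position may attend to a key position:
    causal (no future tokens) and at most `window - 1` tokens back."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & ((i - j) < window)

# Tiny illustration; the actual model uses window=512 over up to 32,768 positions.
mask = sliding_window_causal_mask(seq_len=8, window=3)
```

Each row of `mask` has at most `window` True entries, so attention cost grows linearly with sequence length instead of quadratically.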
## Usage
```python
# Note: This model requires the custom Gemma3Model class from the training notebook.
# You'll need to copy the model definition into your code to use this model.
```
## Training Details
- Trained for 150,000 steps
- Final training loss: ~2.55
- Final validation loss: ~2.56
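Assuming the reported losses are standard token-level cross-entropy measured in nats, they correspond to perplexities of roughly 12.8 (train) and 12.9 (validation), and the small train/validation gap suggests little overfitting:

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the loss is in nats per token.
train_ppl = math.exp(2.55)
val_ppl = math.exp(2.56)
print(f"train ppl: {train_ppl:.1f}, val ppl: {val_ppl:.1f}")
```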