# Gemma-3 270M Fine-tuned on TinyStories
This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.
## Model Details
- Architecture: Custom Gemma-3 with sliding window attention
- Parameters: ~270M
- Training Dataset: TinyStories
- Context Length: 32,768 tokens
- Sliding Window: 512 tokens
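With sliding window attention, each token attends only to itself and the previous `window - 1` tokens rather than the full causal prefix, which is what lets a 512-token window cover a 32,768-token context cheaply. A minimal NumPy sketch of the attention mask (the function name and the tiny sizes are illustrative, not from the model code):

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """True where a query position may attend to a key position:
    causal (no future tokens) and at most `window - 1` tokens back."""
    i = np.arange(seq_len)[:, None]  # query positions
    j = np.arange(seq_len)[None, :]  # key positions
    return (j <= i) & ((i - j) < window)

# Tiny illustration; the actual model uses window=512 over up to 32,768 positions.
mask = sliding_window_causal_mask(seq_len=8, window=3)
```

Each row of `mask` has at most `window` True entries, so attention cost grows linearly with sequence length instead of quadratically.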
## Usage
```python
# Note: This model requires the custom Gemma3Model class from the training notebook.
# You'll need to copy the model definition into your code to use this model.
```
## Training Details
- Trained for 150,000 steps
- Final training loss: ~2.55
- Final validation loss: ~2.56
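Assuming the reported losses are standard token-level cross-entropy measured in nats, they correspond to perplexities of roughly 12.8 (train) and 12.9 (validation), and the small train/validation gap suggests little overfitting:

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the loss is in nats per token.
train_ppl = math.exp(2.55)
val_ppl = math.exp(2.56)
print(f"train ppl: {train_ppl:.1f}, val ppl: {val_ppl:.1f}")
```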