Gemma-3 270M Fine-tuned on TinyStories

This is a custom implementation of the Gemma-3 270M-parameter model, fine-tuned on the TinyStories dataset.

Model Details

  • Architecture: Custom Gemma-3 with sliding window attention (a mask sketch follows this list)
  • Parameters: ~270M
  • Training Dataset: TinyStories
  • Context Length: 32,768 tokens
  • Sliding Window: 512 tokens

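With a 512-token sliding window, each query position attends only to the most recent 512 tokens rather than the full 32,768-token context. Below is a minimal sketch of how such a causal sliding-window mask can be built in PyTorch; it is illustrative only, not the exact code from the training notebook.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query position i attends to key positions j with i - window < j <= i,
    i.e. causal attention restricted to the last `window` tokens.
    """
    pos = torch.arange(seq_len)
    causal = pos[None, :] <= pos[:, None]          # no attending to the future
    local = pos[:, None] - pos[None, :] < window   # stay inside the window
    return causal & local

# Toy example: 8 positions, window of 4
print(sliding_window_causal_mask(seq_len=8, window=4).int())
```
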
Usage

```python
# Note: This model requires the custom Gemma3Model class from the training notebook
# You'll need to copy the model definition to use this model
```
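
A hedged sketch of how the checkpoint could then be loaded. The module name gemma3_model, the config object GEMMA3_CONFIG, and the weights filename pytorch_model.bin are assumptions; adjust them to match the notebook and the files actually present in the chinmaydk99/gemma3-270m-tinystories repo.

```python
import torch
from huggingface_hub import hf_hub_download

# Assumed: the Gemma3Model class and its config were copied from the
# training notebook into a local gemma3_model.py.
from gemma3_model import Gemma3Model, GEMMA3_CONFIG

# Assumed filename; check the repo's file list on the Hub.
weights_path = hf_hub_download(
    repo_id="chinmaydk99/gemma3-270m-tinystories",
    filename="pytorch_model.bin",
)

model = Gemma3Model(GEMMA3_CONFIG)
model.load_state_dict(torch.load(weights_path, map_location="cpu"))
model.eval()
```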

Training Details

  • Trained for 150,000 steps
  • Final training loss: ~2.55
  • Final validation loss: ~2.56
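
These losses imply a perplexity of roughly e^2.55 ≈ 12.8, assuming the standard natural-log cross-entropy objective:

```python
import math

# Perplexity implied by the reported losses (assumes natural-log cross-entropy).
print(math.exp(2.55))  # ≈ 12.81 (training)
print(math.exp(2.56))  # ≈ 12.94 (validation)
```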