olmostories-8m

A small language model trained on the TinyStories dataset, using the OLMo 2 architecture.

Training took around 4 hours on an A100 (80 GB).

```python
from transformers import Olmo2Config

config = Olmo2Config(
    vocab_size=5000,
    hidden_size=512,
    intermediate_size=1280,
    num_hidden_layers=8,
    num_attention_heads=8,
    num_key_value_heads=8,
    max_position_embeddings=1024,
    initializer_range=0.02,
    attention_dropout=0.1,
)
```
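As a rough check, instantiating a randomly initialized model from this config reproduces the reported parameter count of roughly 29M. This is a minimal sketch, assuming the OLMo 2 classes in `transformers`:

```python
from transformers import Olmo2ForCausalLM

# Randomly initialized model built from the config above; the parameter
# count should come out to roughly 29M.
model = Olmo2ForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```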
Model size: 29.3M parameters, stored as F32 safetensors.
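For inference, a hedged usage sketch along the usual `transformers` lines is below; the repo id is an assumption and should be replaced with the actual Hub id for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "doabell/olmostories-29m"  # assumption: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short TinyStories-style continuation.
inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```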
