olmostories-8m
A small language model trained on the TinyStories dataset using the OLMo 2 architecture. Training took around 4 hours on a single A100 (80 GB).
The model was configured as follows:

```python
from transformers import Olmo2Config

config = Olmo2Config(
    vocab_size=5000,                 # small tokenizer vocabulary for TinyStories
    hidden_size=512,
    intermediate_size=1280,          # MLP hidden dimension
    num_hidden_layers=8,
    num_attention_heads=8,
    num_key_value_heads=8,           # full multi-head attention (no GQA)
    max_position_embeddings=1024,    # maximum sequence length
    initializer_range=0.02,
    attention_dropout=0.1,
)
```
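A minimal usage sketch (not part of the original card): it assumes the trained checkpoint is published on the Hugging Face Hub, and the repo id `olmostories-8m` below is a placeholder rather than a confirmed path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with the actual Hub path of this model.
repo_id = "olmostories-8m"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short story continuation from a TinyStories-style prompt.
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```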