Text Generation
Transformers
Safetensors
English
gla

This is a checkpoint of the 1.3B-parameter GLA model used in the paper Gated Linear Attention. The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama 2 tokenizer.

See the model and loading script in this repo.

Safetensors · Model size: 1.37B params · Tensor type: F32
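
As a rough guide to what the size and tensor type listed above mean for disk and memory use, the following is a minimal back-of-the-envelope sketch. It counts weights only, so activations, recurrent state, and framework overhead are not included. The half-precision rows are an assumption added for comparison; this card only lists F32.

```python
# Approximate weight storage for this checkpoint (weights only).
PARAMS = 1.37e9  # "1.37B params" as listed on the model card

# Bytes per parameter. Only float32 is listed on the card;
# the 16-bit entries are hypothetical, shown for comparison.
BYTES_PER_DTYPE = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in GB (10^9 bytes)."""
    return num_params * BYTES_PER_DTYPE[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_DTYPE:
        print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.2f} GB")
```

In F32 the weights alone come to roughly 5.5 GB, which is a useful lower bound when choosing a device for inference.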

Dataset used to train bailin28/gla-1B-100B