This model was created by ilnikolaev.
It was trained from scratch using TensorFlow Keras.
The training data is a 200 MB dataset of Russian comments from 2ch.
- Type: decoder-only
- Tokenizer: BPE
- Vocabulary size: 32000
- Max sequence length: 120
- Hidden size: 768
- FFN size: 3072
- Attention heads: 24
- Decoder layers: 4
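
The hyperparameters above can be collected into a small config object to sanity-check the architecture. The sketch below is illustrative only: `DecoderConfig` and `estimate_params` are hypothetical names, not part of the released model code. The parameter estimate assumes a standard GPT-style layout (learned positional embeddings, biases on all dense layers, two LayerNorms per decoder layer plus a final one, and an output projection tied to the token embeddings). The source does not confirm any of these choices, so treat the result as a rough figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hyperparameters listed in this model card (class name is hypothetical)."""
    vocab_size: int = 32000
    max_seq_len: int = 120
    hidden_size: int = 768
    ffn_size: int = 3072
    num_heads: int = 24
    num_layers: int = 4

    @property
    def head_dim(self) -> int:
        # The hidden size must split evenly across the attention heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


def estimate_params(cfg: DecoderConfig) -> int:
    """Rough parameter count under the ASSUMED GPT-style layout described above."""
    d, f = cfg.hidden_size, cfg.ffn_size

    # Token embeddings plus (assumed) learned positional embeddings.
    embeddings = cfg.vocab_size * d + cfg.max_seq_len * d

    # Q, K, V and output projections, each d x d with a bias vector.
    attention = 4 * (d * d + d)

    # Two dense layers: d -> f and f -> d, each with a bias vector.
    ffn = (d * f + f) + (f * d + d)

    # Two LayerNorms per layer, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d

    per_layer = attention + ffn + layer_norms

    # Final LayerNorm; output projection assumed tied to the token embeddings.
    final_norm = 2 * d

    return embeddings + cfg.num_layers * per_layer + final_norm


if __name__ == "__main__":
    cfg = DecoderConfig()
    print("head dimension:", cfg.head_dim)
    print("estimated parameters:", estimate_params(cfg))
```

With 24 heads and a hidden size of 768, each attention head works on 32 dimensions. If the actual model uses untied output weights, the estimate grows by roughly another `vocab_size * hidden_size` parameters.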