Superb multilingual performance in Hindi, English, and Hinglish.
Intended as a strong base for thinking models.
Supports native thinking in Hinglish and English.
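
A minimal usage sketch, assuming the checkpoint loads through the standard transformers causal-LM API; the prompt and generation settings are illustrative, not taken from the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tinycompany/BiBo-Mini-1.7B-SFT-Stage-1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Illustrative Hinglish prompt.
inputs = tokenizer("Mujhe ek chhoti si kahani sunao.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```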
Trained on a TPU v4-8.
- Active Params: 1.7B (including embedding layer)
- Specialized tokenizer (fhai50032/QTK-81K) for better tokenization of Hindi, English, math & code (see the tokenizer sketch below)
- Tied embeddings
- Torch-XLA (SPMD)
- Flash-Attention (block size = 512)
- Trained on 6B tokens
- Training time = 32h
- AdamW optimizer
- Cosine scheduler (a config sketch follows this list)
- Batch_Size = 72
- Max_Seq_Len = 2048
- Packed = True
- Min_lr = 0
- Max_lr = 3e-4
- Epochs = 2
- Final Val_loss = 1.04x
- Final Running Loss = 0.9x
- Weight Decay = 0.05
- Llama architecture
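
A minimal sketch of the optimizer and schedule listed above, assuming a plain PyTorch setup; the stand-in model, the step count derived from tokens / (batch × seq len), and the absence of warmup are assumptions, not stated on the card:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Values from the list above.
MAX_LR = 3e-4        # Max_lr
MIN_LR = 0.0         # Min_lr
WEIGHT_DECAY = 0.05  # Weight Decay
BATCH_SIZE = 72
MAX_SEQ_LEN = 2048
TOTAL_TOKENS = 6_000_000_000  # 6B tokens
# Assumed: steps = tokens / (batch * seq len); the card gives no warmup details.
TOTAL_STEPS = TOTAL_TOKENS // (BATCH_SIZE * MAX_SEQ_LEN)

model = nn.Linear(8, 8)  # stand-in for the actual Llama-architecture model

optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR,
                              weight_decay=WEIGHT_DECAY)
# Cosine decay from Max_lr down to Min_lr over the full run;
# call scheduler.step() once per optimizer step.
scheduler = CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS, eta_min=MIN_LR)
```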
Average training throughput
- 42,000 tokens/second
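
For the specialized tokenizer noted in the list above, a minimal sketch assuming it loads with the standard AutoTokenizer API; the mixed Hindi/English sample is illustrative:

```python
from transformers import AutoTokenizer

# Load the 81K-vocab tokenizer the model was trained with.
tokenizer = AutoTokenizer.from_pretrained("fhai50032/QTK-81K")

# Illustrative Hinglish input mixing Devanagari and English.
text = "Namaste! आज हम gradient descent के बारे में बात करेंगे."
ids = tokenizer(text)["input_ids"]
print(len(ids), tokenizer.convert_ids_to_tokens(ids))
```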
Evals will be added to the repo directory.
Compute provided by Google ;)
❤️ TRC (TPU Research Cloud) ❤️ Google
Base model: tinycompany/BiBo-Mini-v0.9x