Superb multilingual performance in Hindi, English, and Hinglish.
Intended as a strong base for thinking models.
Supports native thinking in Hinglish and English.
Trained on a TPU v4-8.
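A minimal usage sketch with 🤗 Transformers (assuming the standard Qwen2 `AutoModelForCausalLM` path works for this checkpoint; the prompt and sampling settings are illustrative, not tuned):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tinycompany/Ornaments-Jalwa-SFT-1"

# Load the SFT checkpoint and its tokenizer (Qwen2 architecture).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hinglish prompt; generation parameters here are illustrative.
prompt = "Bharat ki rajdhani kya hai?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```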
- Active Params: 1.4B (including the embedding layer)
- Specialized Tokenizer (fhai50032/QTK-81K) for better tokenization of Hindi, English, math & code (see the tokenizer sketch after this list)
- Tied Embeddings
- Torch-XLA (SPMD)
- Splash-Attention (block-size = 512)
- 6B Tokens Trained
- Training Time = 24h
- Lion Optimizer
- Cosine Scheduler (a minimal schedule sketch follows this list)
- Batch_Size = 96
- Max_Seq_Len = 2048
- Packed = True
- Min_lr = 0
- Max_lr = 3e-4 / 3.5 (≈ 8.6e-5)
- Epoch = 2.25
- Final Val_loss = 1.15x
- Final Running Loss = 1.01x
- Weight Decay = 0.05
- Qwen2 Arch
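A quick sketch of loading the QTK-81K tokenizer on its own (the repo id comes from the list above; the sample strings are illustrative):

```python
from transformers import AutoTokenizer

# Load the specialized 81K-vocab tokenizer used for training.
tok = AutoTokenizer.from_pretrained("fhai50032/QTK-81K")

# Inspect tokenization of mixed Hindi / code text (illustrative strings).
for text in ["नमस्ते, आप कैसे हैं?", "def add(a, b): return a + b"]:
    ids = tok.encode(text)
    print(len(ids), tok.convert_ids_to_tokens(ids))
```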
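And the learning-rate schedule as a minimal sketch. Only Max_lr = 3e-4/3.5 and Min_lr = 0 come from the config above; the warmup length and total step count are assumptions for illustration:

```python
import math

MAX_LR = 3e-4 / 3.5  # ≈ 8.6e-5, from the config above
MIN_LR = 0.0         # Min_lr = 0, from the config above

def cosine_lr(step: int, total_steps: int, warmup_steps: int = 100) -> float:
    """Linear warmup (assumed) followed by cosine decay from MAX_LR to MIN_LR."""
    if step < warmup_steps:
        return MAX_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```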
Average Training Throughput
- 60,000 tokens / second
- 3.2 sec / step
- 18.75 steps / minute
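These numbers are mutually consistent; a quick check from the config above (batch of 96 packed sequences × 2048 tokens each):

```python
tokens_per_step = 96 * 2048            # Batch_Size * Max_Seq_Len = 196,608
sec_per_step = 3.2

print(tokens_per_step / sec_per_step)  # 61,440 tokens/s ≈ the reported 60k
print(60 / sec_per_step)               # 18.75 steps / minute
```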
Evals will be added in a directory of this repo.
Compute Provided by Google ;)
❤️ TRC ❤️ Google
Model tree for tinycompany/Ornaments-Jalwa-SFT-1
- Base model: Ornaments/Jalwa-latest-run-pretrain-10k