Superb multilingual performance in Hindi, English, and Hinglish

Should help set a good base for thinking models.

Features native thinking in Hinglish and English.

Trained on a TPU v4-8.

  • Active params: 1.4B (including the embedding layer)
  • Specialized tokenizer (fhai50032/QTK-81K) for better tokenization of Hindi, English, math, and code
  • Tied embeddings
  • Torch-XLA (SPMD)
  • Splash Attention (block size = 512)
  • Trained on 6B tokens
  • Training time = 24h
  • Lion optimizer (see the sketch after this list)
  • Cosine scheduler
  • Batch_Size = 96
  • Max_Seq_Len = 2048
  • Packed = True
  • Min_lr = 0
  • Max_lr = 3e-4/3.5
  • Epochs = 2.25
  • Final Val_loss = 1.15x
  • Final Running Loss = 1.01x
  • Weight Decay = 0.05
  • Qwen2 architecture
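
Below is a minimal sketch of how the optimizer and schedule listed above could be wired up in PyTorch. It is an illustration, not the actual training script: the `lion-pytorch` package, the placeholder model, and the step-count derivation are assumptions, while the peak LR, weight decay, and Min_lr = 0 come straight from the list above.

```python
# Hedged sketch of the Lion + cosine setup listed above (not the actual
# Torch-XLA SPMD training loop). Assumes the third-party lion-pytorch package.
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(2048, 2048)  # placeholder for the real 1.4B model

max_lr = 3e-4 / 3.5  # ≈ 8.6e-5 peak learning rate, as listed on this card
optimizer = Lion(model.parameters(), lr=max_lr, weight_decay=0.05)

# Rough step count derived from the card: 6B tokens / (96 * 2048 tokens/step)
total_steps = 6_000_000_000 // (96 * 2048)  # ≈ 30.5k steps

# Cosine decay from max_lr down to Min_lr = 0
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.0
)

for step in range(total_steps):
    # ... forward/backward on a packed batch of 96 x 2048 tokens ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```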

Average Training Throughput

  • ~60,000 tokens/second (see the quick check after this list)
  • 3.2 sec/step
  • 18.75 steps/minute
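
These figures line up with the batch configuration above; here is a back-of-the-envelope check in Python, using only numbers from this card:

```python
# Sanity check: throughput implied by the batch config and step time.
batch_size = 96
max_seq_len = 2048
sec_per_step = 3.2

tokens_per_step = batch_size * max_seq_len       # 196,608 tokens per step
tokens_per_sec = tokens_per_step / sec_per_step  # 61,440 ≈ the ~60k reported
steps_per_min = 60 / sec_per_step                # 18.75 steps/minute

print(tokens_per_step, tokens_per_sec, steps_per_min)
```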

Eval results will be added to a directory in this repo.
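
For reference, a minimal usage sketch with the Hugging Face transformers API. The tokenizer and model repo IDs come from this card; everything else (loading the tokenizer via AutoTokenizer, the BF16 dtype, the prompt, and the generation settings) is an illustrative assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer and model repos named on this card.
tokenizer = AutoTokenizer.from_pretrained("fhai50032/QTK-81K")
model = AutoModelForCausalLM.from_pretrained(
    "tinycompany/Ornaments-Jalwa-SFT-1",
    torch_dtype=torch.bfloat16,  # weights are published in BF16
)

# Illustrative Hinglish prompt; generation settings are placeholders.
inputs = tokenizer("Namaste! Aaj ka din kaisa hai?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```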

Compute Provided by Google ;)

❤️ TRC ❤️ Google
