Superb Multilingual Performance in Hindi, English, and Hinglish

Should help set a good base for thinking models.

Supports native thinking in Hinglish and English.

Trained on a TPU v4-8.

  • Active params: 1.7B (including the embedding layer)
  • Specialized tokenizer (fhai50032/QTK-81K) for better tokenization of Hindi, English, math & code
  • Tied embeddings
  • Torch-XLA (SPMD)
  • Flash Attention (block size = 512)
  • 6B tokens trained
  • Training time = 32h
  • AdamW optimizer (see the sketch after this list)
  • Cosine scheduler
  • Batch size = 72
  • Max seq len = 2048
  • Packed = True
  • Min LR = 0
  • Max LR = 3e-4
  • Epochs = 2
  • Final val loss = 1.04x
  • Final running loss = 0.9x
  • Weight decay = 0.05
  • Llama architecture
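As a rough illustration of how the hyperparameters above fit together, here is a minimal PyTorch sketch. Only the tokenizer repo, tied embeddings, learning rates, weight decay, and schedule come from this card; the tiny model dimensions and the total step count are placeholder assumptions.

```python
import torch
from transformers import AutoTokenizer, LlamaConfig, LlamaForCausalLM

# Specialized tokenizer named in this card (81K vocab for Hindi/English/math/code).
tokenizer = AutoTokenizer.from_pretrained("fhai50032/QTK-81K")

# Placeholder Llama-style config: tied embeddings are from the card, but the
# dimensions below are tiny stand-ins, NOT the real 1.7B shape (unpublished here).
config = LlamaConfig(
    vocab_size=len(tokenizer),
    hidden_size=512,
    intermediate_size=2048,
    num_hidden_layers=4,
    num_attention_heads=8,
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)

# Optimizer and schedule as listed above: AdamW, max LR 3e-4, weight decay 0.05,
# cosine decay down to a min LR of 0.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.05)
total_steps = 40_690  # assumed: ~6B tokens / (72 * 2048 tokens per packed step)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps, eta_min=0.0
)
```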

Average Training Throughput

  • ~42,000 tokens/second
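A quick back-of-the-envelope check of these numbers, assuming batch size × max sequence length tokens per packed step:

```python
# Tokens consumed per optimizer step (packed sequences, per the card).
tokens_per_step = 72 * 2048                 # = 147,456

# Approximate step time at the reported average throughput.
step_time_s = tokens_per_step / 42_000      # ~3.5 s per step

# Approximate total optimizer steps for 6B trained tokens.
total_steps = 6e9 / tokens_per_step         # ~40,700 steps
print(tokens_per_step, round(step_time_s, 2), round(total_steps))
```

Note that 42,000 tokens/s sustained over 32h would come to roughly 4.8B tokens, a bit under the 6B figure above, so the throughput and wall-clock numbers are presumably rounded averages.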

Evals will be added to the repository directory.

Compute Provided by Google ;)

❤️ TRC ❤️ Google

Safetensors · 1.78B params · BF16
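For reference, a minimal loading sketch; the repo id and BF16 tensor type are taken from this page, and the snippet assumes a standard transformers checkpoint layout:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tinycompany/BiBo-Mini-1.7B-SFT-Stage-1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)
```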
