Model Details

Training Details

  • Hardware: 2 NVIDIA DGX H100 nodes on Taipei-1
  • Software: NVIDIA NeMo and NVIDIA NeMo Framework Launcher
  • Training time: 7 days
  • Context length: 8192 tokens
  • Steps: 81783
  • Global batch size: 112
  • Epochs: 3
  • Tokens trained on: 75B+
  • Learning rate:
    • Schedule: Cosine
    • Warmup steps: 107
    • Constant steps: 11873
    • Max LR: 1e-4
    • Min LR: 1e-5
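The hyperparameters above fully determine the learning-rate curve and the token budget, so they can be sanity-checked with a short sketch. The phase ordering below (linear warmup, cosine decay, then a constant tail at the minimum LR) reflects my understanding of NeMo's CosineAnnealing scheduler and is an assumption, not taken from this card; the constants are the card's values.

```python
import math

# Values from the Training Details list above.
WARMUP_STEPS = 107
CONSTANT_STEPS = 11873
TOTAL_STEPS = 81783
MAX_LR = 1e-4
MIN_LR = 1e-5

def lr_at_step(step: int) -> float:
    """Assumed schedule shape: warmup -> cosine decay -> constant at MIN_LR."""
    decay_end = TOTAL_STEPS - CONSTANT_STEPS
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return MAX_LR * step / WARMUP_STEPS
    if step >= decay_end:
        # Constant tail at the minimum learning rate.
        return MIN_LR
    # Cosine decay from MAX_LR down to MIN_LR.
    progress = (step - WARMUP_STEPS) / (decay_end - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Token-budget check: global batch * context length * steps.
tokens = 112 * 8192 * 81783
print(f"{tokens / 1e9:.1f}B tokens")  # ~75.0B, matching "75B+"
```

The token arithmetic shows how "75B+" follows directly from the other numbers: 112 sequences per global batch × 8192 tokens each × 81783 steps ≈ 75B tokens, consistent with 3 epochs over a ~25B-token dataset.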

Dataset Details

  • Trained primarily on Taiwanese legal documents
  • Tokens: 25B

Model Size

  • Parameters: 7.49B
  • Tensor type: BF16
  • Weights format: Safetensors