Model Details
- Continually pretrained from MediaTek-Research/Breeze-7B-32k-Base-v1_0
- Base model with no instruction fine-tuning
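Since this is a raw base model with no instruction tuning, it is used for plain text completion. Below is a minimal loading sketch using the standard Hugging Face transformers API; the repo id of this continually pretrained checkpoint is not stated here, so the example loads the named base model as a stand-in.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint named in this card; substitute this repo's own id
# to load the continually pretrained weights instead (assumption).
model_id = "MediaTek-Research/Breeze-7B-32k-Base-v1_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# No instruction fine-tuning, so prompt with plain text, not a chat template.
inputs = tokenizer("台灣高等法院", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```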
Training Details
- Hardware: 2 NVIDIA DGX H100 nodes on Taipei-1
- Software: NVIDIA NeMo and NVIDIA NeMo Framework Launcher
- Training time: 7 days
- Context length: 8192 tokens
- Steps: 81783
- Global batch size: 112
- Epochs: 3
- Tokens trained on: 75B+ (global batch 112 × 8192-token context × 81783 steps ≈ 75B, i.e. roughly 3 epochs over the ~25B-token dataset)
- Learning rate (see the schedule sketch after this list):
  - Schedule: Cosine
  - Warmup steps: 107
  - Constant steps: 11873
  - Max LR: 1e-4
  - Min LR: 1e-5
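The values above parameterize NeMo's cosine-annealing schedule. Here is a minimal sketch of the resulting per-step learning rate, assuming the common NeMo interpretation (linear warmup, cosine decay to the minimum, then a constant tail held at the minimum LR for the final constant steps); the exact shape used in this run is not confirmed by the card.

```python
import math

# Hyperparameters from this card; the warmup/decay/constant-tail shape
# below is an assumption about how the schedule is applied.
MAX_STEPS = 81783
WARMUP_STEPS = 107
CONSTANT_STEPS = 11873
MAX_LR = 1e-4
MIN_LR = 1e-5

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    decay_steps = MAX_STEPS - WARMUP_STEPS - CONSTANT_STEPS
    if step < WARMUP_STEPS + decay_steps:
        # Cosine decay from MAX_LR down to MIN_LR.
        progress = (step - WARMUP_STEPS) / decay_steps
        return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
    # Constant tail: hold MIN_LR for the remaining steps.
    return MIN_LR

for s in (0, 107, 40000, 70000, MAX_STEPS - 1):
    print(s, f"{lr_at(s):.2e}")
```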
Dataset Details
- Trained mostly on legal documents from Taiwan
- Tokens: 25B