Model Details
- Continually pretrained from MediaTek-Research/Breeze-7B-32k-Base-v1_0
- Base model with no instruction fine-tuning
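Since this is a raw base model with no instruction tuning, it is used for plain text completion. Below is a minimal loading sketch using the standard Hugging Face transformers API; the repo id of this continually pretrained checkpoint is not stated here, so the example loads the named base model as a stand-in.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint named in this card; substitute this repo's own id
# to load the continually pretrained weights instead (assumption).
model_id = "MediaTek-Research/Breeze-7B-32k-Base-v1_0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# No instruction fine-tuning, so prompt with plain text, not a chat template.
inputs = tokenizer("台灣高等法院", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```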
Training Details
- Hardware: 2 NVIDIA DGX H100 nodes on Taipei-1
- Software: NVIDIA NeMo and NVIDIA NeMo Framework Launcher
- Training time: 7 days
- Context length: 8192 tokens
- Steps: 81783
- Global batch size: 112
- Epochs: 3
- Tokens trained on: 75B+ (global batch 112 × 8192-token context × 81783 steps ≈ 75B, i.e. roughly 3 epochs over the ~25B-token dataset)
- Learning rate (see the schedule sketch after this list):
  - Schedule: Cosine
  - Warmup steps: 107
  - Constant steps: 11873
  - Max LR: 1e-4
  - Min LR: 1e-5
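The values above parameterize NeMo's cosine-annealing schedule. Here is a minimal sketch of the resulting per-step learning rate, assuming the common NeMo interpretation (linear warmup, cosine decay to the minimum, then a constant tail held at the minimum LR for the final constant steps); the exact shape used in this run is not confirmed by the card.

```python
import math

# Hyperparameters from this card; the warmup/decay/constant-tail shape
# below is an assumption about how the schedule is applied.
MAX_STEPS = 81783
WARMUP_STEPS = 107
CONSTANT_STEPS = 11873
MAX_LR = 1e-4
MIN_LR = 1e-5

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to MAX_LR.
        return MAX_LR * step / WARMUP_STEPS
    decay_steps = MAX_STEPS - WARMUP_STEPS - CONSTANT_STEPS
    if step < WARMUP_STEPS + decay_steps:
        # Cosine decay from MAX_LR down to MIN_LR.
        progress = (step - WARMUP_STEPS) / decay_steps
        return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
    # Constant tail: hold MIN_LR for the remaining steps.
    return MIN_LR

for s in (0, 107, 40000, 70000, MAX_STEPS - 1):
    print(s, f"{lr_at(s):.2e}")
```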
Dataset Details
- Trained mostly on legal documents from Taiwan
- Tokens: 25B