We continually pre-train meta-llama/Llama-2-7b-hf on the monolingual WURA corpus covering 20 languages. All languages are sampled uniformly.
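A minimal sketch of what uniform sampling over the monolingual splits could look like with the `datasets` library; the dataset ID and language codes shown are illustrative assumptions, not the exact training script.

```python
# Sketch: uniform sampling across monolingual WURA splits.
# Dataset ID and language codes are illustrative assumptions.
from datasets import load_dataset, interleave_datasets

languages = ["afr", "amh", "hau", "ibo", "swa", "yor"]  # subset shown; the run covers 20 languages

streams = [
    load_dataset("castorini/wura", lang, split="train", streaming=True)
    for lang in languages
]

# Equal probability for every language => uniform sampling.
mixed = interleave_datasets(
    streams,
    probabilities=[1.0 / len(streams)] * len(streams),
    seed=42,
)
```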
Important Parameters
- num_gpus: 8
- max_steps: 8000 # see here
- gradient_accumulation_steps: 16
- per_device_batch_size: 2
- learning_rate: 2e-5
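A rough sketch of how these hyperparameters might map onto Hugging Face `TrainingArguments`; the output path and any settings not listed above (precision, logging, checkpointing) are assumptions.

```python
# Sketch: mapping the listed hyperparameters onto TrainingArguments.
# Unlisted values are assumptions, not the actual training configuration.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-2-7b-wura",   # illustrative output path
    max_steps=8000,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    bf16=True,                      # assumed mixed-precision setting
    logging_steps=100,              # assumed
    save_steps=1000,                # assumed
)

# Launched across 8 GPUs (e.g. `torchrun --nproc_per_node=8 train.py`),
# this yields an effective batch size of 8 * 2 * 16 = 256 sequences per
# optimizer step.
```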