We continually pre-train meta-llama/Llama-2-7b-hf on the monolingual WURA corpus, covering 20 languages. All languages are sampled uniformly.
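Uniform sampling over languages can be expressed with `datasets.interleave_datasets` by giving every language the same probability. The sketch below is a minimal, hedged example: the `castorini/wura` dataset ID and per-language configs are assumptions not confirmed by this card, and `langs` is a placeholder to be filled with the 20 WURA language configs.

```python
# Minimal sketch: uniform sampling across WURA languages.
# Assumptions: the corpus is hosted as "castorini/wura" on the Hub and each
# language is a dataset config; extend `langs` to all 20 languages used here.
from datasets import load_dataset, interleave_datasets

langs = ["hau", "ibo", "swa", "yor"]  # placeholder subset, not the full list

streams = [
    load_dataset("castorini/wura", lang, split="train", streaming=True)
    for lang in langs
]

# Equal probability per language => uniform sampling over languages.
mixed = interleave_datasets(
    streams,
    probabilities=[1.0 / len(langs)] * len(langs),
    seed=42,
    stopping_strategy="all_exhausted",  # draw until every language is exhausted
)

for example in mixed.take(3):
    print(example)
```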

Important Parameters

  • num_gpus: 8
  • max_steps: 8000 # see here
  • gradient_accumulation_steps: 16
  • per_device_batch_size: 2
  • learning_rate: 2e-5
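Taken together, these values give an effective batch size of 8 GPUs × 2 sequences per device × 16 accumulation steps = 256 sequences per optimizer step. A hedged sketch of the corresponding `transformers.TrainingArguments` setup follows; only the five listed values come from this card, while the output path, precision, and logging/saving options are illustrative assumptions.

```python
# Sketch of a Trainer setup mirroring the parameters listed above.
# Only max_steps, gradient_accumulation_steps, per_device_train_batch_size,
# and learning_rate come from the card; everything else is assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

args = TrainingArguments(
    output_dir="pretrain-wura",       # assumed output path
    max_steps=8000,
    gradient_accumulation_steps=16,
    per_device_train_batch_size=2,
    learning_rate=2e-5,
    bf16=True,                        # assumption: bf16 mixed precision
    logging_steps=50,                 # assumption
    save_steps=1000,                  # assumption
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=mixed,  # the uniformly interleaved dataset from the sketch above
)
trainer.train()
```

The `num_gpus: 8` setting corresponds to launching this script across eight devices, e.g. `torchrun --nproc_per_node 8 train.py`; `per_device_batch_size` then maps to `per_device_train_batch_size` on each rank.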