Update README.md
README.md CHANGED
@@ -29,11 +29,10 @@ More information needed
 
 ## Training procedure
 
-The
-
+The [`run_clm.py` script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py) from the transformers library was used. Training was distributed on two NVIDIA Quadro RTX 6000 GPUs:
 ```bash
-CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch
---output_dir="./training_full" \
+TORCH_CPP_LOG_LEVEL=INFO NCCL_DEBUG=INFO CUDA_VISIBLE_DEVICES=0,1 nohup python -m torch.distributed.launch \
+--nproc_per_node=2 run_clm.py --output_dir="./training_full" \
 --model_type="gpt2" \
 --config_name="./training" \
 --tokenizer_name="./training" \
@@ -47,9 +46,8 @@ CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 r
 --num_train_epochs="1" \
 --logging_steps="500" \
 --save_steps="5000" --preprocessing_num_workers="16" \
---gradient_accumulation_steps="4" \
---
---logging_dir="./log_full"
+--gradient_accumulation_steps="4" --report_to="tensorboard" \
+--logging_dir="./log_full" > command_full_log.log 2>&1 &
 ```
 
 ### Training hyperparameters
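The updated command runs in the background through `nohup`, writes its console output to `command_full_log.log`, and reports metrics to TensorBoard via `--report_to="tensorboard"` and `--logging_dir="./log_full"`. A minimal sketch for monitoring such a run, assuming the `tensorboard` package is installed in the same environment:

```bash
# Follow the console output that nohup redirects to command_full_log.log
tail -f command_full_log.log

# Browse the training curves written to the --logging_dir
# (assumption: tensorboard is installed; it serves on http://localhost:6006 by default)
tensorboard --logdir ./log_full
```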