Update README.md
README.md
CHANGED
|
@@ -88,6 +88,27 @@ To use tensorboard to visualize the training loss curve:
|
|
| 88 |
pip install future tensorboard
|
| 89 |
```
|
| 90 |
|
| 91 |
## Data preparation
|
| 92 |
1. File structure
|
| 93 |
|
|
|
|
| 88 |
pip install future tensorboard
|
| 89 |
```
|
| 90 |
|
| 91 |
+
5. If your training process is killed during data preprocessing, lower the default value of `map_num_proc` (for example, to 1) in
|
| 92 |
+
`xtuner/xtuner/dataset/huggingface.py`:
|
| 93 |
+
```
|
| 94 |
+
def process(dataset,
|
| 95 |
+
            do_dataset_tokenization=True,
|
| 96 |
+
            tokenizer=None,
|
| 97 |
+
            max_length=None,
|
| 98 |
+
            dataset_map_fn=None,
|
| 99 |
+
            template_map_fn=None,
|
| 100 |
+
            max_dataset_length=None,
|
| 101 |
+
            split='train',
|
| 102 |
+
            remove_unused_columns=False,
|
| 103 |
+
            rename_maps=[],
|
| 104 |
+
            shuffle_before_pack=True,
|
| 105 |
+
            pack_to_max_length=True,
|
| 106 |
+
            use_varlen_attn=False,
|
| 107 |
+
            input_ids_with_output=True,
|
| 108 |
+
            with_image_token=False,
|
| 109 |
+
            map_num_proc=32):  # change this default from 32 to 1
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
## Data preparation
|
| 113 |
1. File structure
|
| 114 |
|
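
A note on the added step 5: instead of editing the installed file, it may be possible to override `map_num_proc` at the call site. The sketch below is only an illustration of that idea. Its `process` function is a hypothetical stand-in that mirrors the keyword name from the diff. It is not xtuner's implementation, and whether your config calls `process` directly (or through a wrapper that forwards keyword arguments) is an assumption to verify against your xtuner version.

```python
from functools import partial


def process(dataset, map_num_proc=32, **kwargs):
    """Hypothetical stand-in for the `process` function shown in the diff.

    It does no tokenization or packing; it only reports the worker count
    that would be used, so the override can be checked in isolation.
    """
    return {"num_examples": len(dataset), "map_num_proc": map_num_proc}


# Bind the lower worker count once instead of changing the library default.
process_single_worker = partial(process, map_num_proc=1)

result = process_single_worker(["sample a", "sample b"])
print(result["map_num_proc"])
```

Binding the argument with `functools.partial` leaves the library source untouched, so the change is not lost when the package is reinstalled or upgraded.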