---

license: mit
pipeline_tag: text-generation
library_name: transformers
language: [
    'en', 'am', 'ar', 'as', 'az', 'be', 'bg', 'bn', 'br', 'bs', 'ca', 'cs', 'cy', 'da', 'de', 'el',
    'eo', 'es', 'et', 'eu', 'fa', 'ff', 'fi', 'fr', 'fy', 'ga', 'gd', 'gl', 'gn', 'gu', 'ha', 'he',
    'hi', 'hr', 'ht', 'hu', 'hy', 'id', 'ig', 'is', 'it', 'ja', 'jv', 'ka', 'kk', 'km', 'kn', 'ko',
    'ku', 'ky', 'la', 'lg', 'li', 'ln', 'lo', 'lt', 'lv', 'mg', 'mk', 'ml', 'mn', 'mr', 'ms', 'my',
    'ne', 'nl', 'no', 'ns', 'om', 'or', 'pa', 'pl', 'ps', 'pt', 'qu', 'rm', 'ro', 'ru', 'sa', 'si',
    'sc', 'sd', 'sk', 'sl', 'so', 'sq', 'sr', 'ss', 'su', 'sv', 'sw', 'ta', 'te', 'th', 'tl', 'tn',
    'tr', 'ug', 'uk', 'ur', 'uz', 'vi', 'wo', 'xh', 'yi', 'yo', 'zu',
]
datasets:
# core - base
- ontocord/fineweb-permissive-multilingual-2m
- distily/c4_multilingual_1M
- data-silence/sumnews
- xu-song/cc100-samples
- badrex/llm-emoji-dataset
- fblgit/simple-math
- Gusarich/math-expressions-1m
- neuralwork/arxiver
- christopher/rosetta-code
- nampdn-ai/tiny-codes
- JeanKaddour/minipile
# core - instruct
- NousResearch/hermes-function-calling-v1
- simplescaling/s1K-1.1
# base - instruct
- mlabonne/open-perfectblend
- allenai/tulu-3-sft-mixture
- rombodawg/Everything_Instruct_Multilingual
# base - reason
- open-r1/OpenR1-Math-220k
- open-thoughts/OpenThoughts-114k
- cognitivecomputations/dolphin-r1
- simplescaling/s1K-1.1
tags:
- chat
- core
- base
- instruct
- reason
---


# tangled-alpha-0.10-core

![logo](./misc/logo.jpg)

Prepare datasets:

```bash

time python -B prepare_core_datasets.py

```

```

i=0, min_len=0, max_len=1073741824, block_size=1025, chunk_size=16400000, len(dataset)=5146620, len(dataset) * block_size=5275285500
Total number of tokens in the optimized dataset '../core-data-0-0-1073741824-1025-16000' is 5275285500

i=1, min_len=1025, max_len=2049, block_size=2049, chunk_size=16392000, len(dataset)=309838, len(dataset) * block_size=634858062
Total number of tokens in the optimized dataset '../core-data-1-1025-2049-2049-8000' is 634858062

i=2, min_len=2049, max_len=4097, block_size=4097, chunk_size=16388000, len(dataset)=113843, len(dataset) * block_size=466414771
Total number of tokens in the optimized dataset '../core-data-2-2049-4097-4097-4000' is 466414771

i=3, min_len=4097, max_len=8193, block_size=8193, chunk_size=16386000, len(dataset)=56713, len(dataset) * block_size=464649609
Total number of tokens in the optimized dataset '../core-data-3-4097-8193-8193-2000' is 464649609

i=4, min_len=8193, max_len=16385, block_size=16385, chunk_size=16385000, len(dataset)=37406, len(dataset) * block_size=612897310
Total number of tokens in the optimized dataset '../core-data-4-8193-16385-16385-1000' is 612897310

i=5, min_len=16385, max_len=32769, block_size=32769, chunk_size=16384500, len(dataset)=12737, len(dataset) * block_size=417378753
Total number of tokens in the optimized dataset '../core-data-5-16385-32769-32769-500' is 417378753

i=6, min_len=32769, max_len=65537, block_size=65537, chunk_size=16384250, len(dataset)=2824, len(dataset) * block_size=185076488
Total number of tokens in the optimized dataset '../core-data-6-32769-65537-65537-250' is 185076488

i=7, min_len=65537, max_len=131073, block_size=131073, chunk_size=16384125, len(dataset)=634, len(dataset) * block_size=83100282
Total number of tokens in the optimized dataset '../core-data-7-65537-131073-131073-125' is 83100282

real    292m54.341s
user    2118m1.154s
sys     12m2.746s

20G     tangled-alpha-0.9-core/core-data-0-0-1073741824-1025-16000
2.4G    tangled-alpha-0.9-core/core-data-1-1025-2049-2049-8000
1.8G    tangled-alpha-0.9-core/core-data-2-2049-4097-4097-4000
1.8G    tangled-alpha-0.9-core/core-data-3-4097-8193-8193-2000
2.3G    tangled-alpha-0.9-core/core-data-4-8193-16385-16385-1000
1.6G    tangled-alpha-0.9-core/core-data-5-16385-32769-32769-500
709M    tangled-alpha-0.9-core/core-data-6-32769-65537-65537-250
321M    tangled-alpha-0.9-core/core-data-7-65537-131073-131073-125

```
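The bucket parameters printed above follow a regular pattern: after bucket 0, each bucket doubles `block_size` (always a power of two plus one) and halves the number of blocks per chunk, so `chunk_size` stays close to 16.4M tokens. The sketch below reconstructs that pattern and re-derives the token totals. It is inferred from the printed values only, not taken from `prepare_core_datasets.py`; `bucket_params`, `BASE_BLOCKS` and `NUM_BLOCKS` are hypothetical names.

```python
# Hypothetical reconstruction of the length buckets printed by
# prepare_core_datasets.py. The rule below is inferred from the log values,
# not copied from the script itself.

BASE_BLOCKS = 16_000  # blocks per chunk in bucket 0 (last field of its name)

# len(dataset) per bucket, copied from the log above.
NUM_BLOCKS = [5_146_620, 309_838, 113_843, 56_713, 37_406, 12_737, 2_824, 634]


def bucket_params(i: int) -> dict:
    """Return the parameters the log reports for bucket i (inferred pattern)."""
    if i == 0:
        # Bucket 0: min_len=0, max_len=2**30, fixed 1025-token blocks.
        min_len, max_len, block_size = 0, 2**30, 2**10 + 1
    else:
        # Bucket i >= 1: min_len=2**(9+i)+1, max_len=2**(10+i)+1,
        # with block_size equal to max_len.
        min_len = 2 ** (9 + i) + 1
        max_len = 2 ** (10 + i) + 1
        block_size = max_len
    # Halving the block count as block_size doubles keeps chunk_size ~16.4M tokens.
    blocks_per_chunk = BASE_BLOCKS // 2**i
    return {
        "min_len": min_len,
        "max_len": max_len,
        "block_size": block_size,
        "chunk_size": block_size * blocks_per_chunk,
        "name": f"../core-data-{i}-{min_len}-{max_len}-{block_size}-{blocks_per_chunk}",
    }


# Tokens per bucket = number of blocks * block_size, as in the log.
per_bucket = [n * bucket_params(i)["block_size"] for i, n in enumerate(NUM_BLOCKS)]
total_tokens = sum(per_bucket)

for i, tokens in enumerate(per_bucket):
    print(f"{bucket_params(i)['name']}: {tokens:,} tokens")
print(f"total: {total_tokens:,} tokens")
```

Summing the eight per-bucket products gives 8,139,660,775 tokens in total, of which bucket 0 accounts for 5,275,285,500.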

Pretrain core model:

```bash

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt pretrain --config pretrain_core_model_0.yaml

```

```

Seed set to 23
Time to instantiate model: 0.21 seconds.
Total parameters: 302,023,168
Verifying settings ...
Measured TFLOPs: 55520.94
Epoch 1 | iter 64 step 1 | loss train: 11.982, val: n/a | iter time: 409.55 ms (step) remaining time: 4 days, 17:45:21
Epoch 1 | iter 128 step 2 | loss train: 11.980, val: n/a | iter time: 354.46 ms (step) remaining time: 3 days, 15:01:16
Epoch 1 | iter 192 step 3 | loss train: 11.980, val: n/a | iter time: 353.67 ms (step) remaining time: 3 days, 5:46:03
Epoch 1 | iter 256 step 4 | loss train: 11.980, val: n/a | iter time: 354.11 ms (step) remaining time: 3 days, 1:05:26
Epoch 1 | iter 320 step 5 | loss train: 11.978, val: n/a | iter time: 358.28 ms (step) remaining time: 2 days, 22:21:45
Epoch 1 | iter 384 step 6 | loss train: 11.974, val: n/a | iter time: 356.21 ms (step) remaining time: 2 days, 20:33:55
Epoch 1 | iter 448 step 7 | loss train: 11.964, val: n/a | iter time: 357.42 ms (step) remaining time: 2 days, 19:15:59
Epoch 1 | iter 512 step 8 | loss train: 11.956, val: n/a | iter time: 355.74 ms (step) remaining time: 2 days, 18:16:43
Epoch 1 | iter 576 step 9 | loss train: 11.937, val: n/a | iter time: 356.05 ms (step) remaining time: 2 days, 17:28:34
Epoch 1 | iter 640 step 10 | loss train: 11.929, val: n/a | iter time: 356.68 ms (step) remaining time: 2 days, 16:49:58
# ...

```
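In these progress lines the `iter` counter advances by 64 for every optimizer `step`, which suggests 64 micro-batch iterations are accumulated per step in this run. A minimal sketch for pulling these numbers out of a saved log, assuming the line format shown above (the regex and the helpers `parse_log` and `summarize` are illustrative, not part of litgpt):

```python
import re
from statistics import mean

# Regex for the progress lines shown above (format assumed from this log only).
LINE_RE = re.compile(
    r"iter (?P<iter>\d+) step (?P<step>\d+) \| "
    r"loss train: (?P<loss>[\d.]+), val: (?P<val>\S+) \| "
    r"iter time: (?P<ms>[\d.]+) ms"
)


def parse_log(lines):
    """Return (iter, step, train_loss, iter_time_ms) for every matching line."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append((int(m["iter"]), int(m["step"]), float(m["loss"]), float(m["ms"])))
    return records


def summarize(records):
    """Return (iters_per_step, mean_iter_time_ms), skipping the first (slower) record."""
    last_iter, last_step = records[-1][0], records[-1][1]
    iters_per_step = last_iter // last_step
    times = [r[3] for r in records[1:]] or [records[0][3]]
    return iters_per_step, mean(times)


sample = """\
Epoch 1 | iter 64 step 1 | loss train: 11.982, val: n/a | iter time: 409.55 ms (step) remaining time: 4 days, 17:45:21
Epoch 1 | iter 128 step 2 | loss train: 11.980, val: n/a | iter time: 354.46 ms (step) remaining time: 3 days, 15:01:16
Epoch 1 | iter 192 step 3 | loss train: 11.980, val: n/a | iter time: 353.67 ms (step) remaining time: 3 days, 5:46:03
""".splitlines()

records = parse_log(sample)
iters_per_step, mean_ms = summarize(records)
print(f"{iters_per_step} iters per step, {mean_ms:.2f} ms mean iter time")
```

Feeding it all ten progress lines above gives the same ratio: iter 640 at step 10 is again 64 iterations per step.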

Back up `wandb`:

```bash

mv wandb wandb-pretrain-core-0

```

Copy config:

```bash

cp ../config-0.json ../out/pretrain-core-0/final/config.json

```

Chat with model:

```bash

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True litgpt chat ../out/pretrain-core-0/final

```

Evaluate model:

```bash

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=0 PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True time litgpt evaluate --tasks 'leaderboard' --out_dir '../evaluate/pretrain-core-0/leaderboard/' --batch_size '4' --dtype 'bfloat16' '../out/pretrain-core-0/final'

```
