alex-shvets committed
Commit 447bf56 · verified · 1 Parent(s): d230e0b

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,3 +1,127 @@
- ---
- license: apache-2.0
- ---
+ ---
+ library_name: transformers
+ base_model: roberta-large
+ metrics:
+ - f1
+ model-index:
+ - name: roberta-large-emopillars-contextual
+   results: []
+ ---
+
+ # roberta-large-emopillars-contextual
+
+ This model is a fine-tuned version of [roberta-large](https://huggingface.co/roberta-large) on the "[context-full](https://huggingface.co/datasets/alex-shvets/EmoPillars/tree/main/context-full)" subset of [EmoPillars](https://huggingface.co/datasets/alex-shvets/EmoPillars).
+
+ ## Model description
+
+ The model is a multi-label classifier over 28 emotion classes for a context-aware scenario. It takes as input a context concatenated with a character description and an utterance, and predicts the emotions expressed in the utterance only.
+
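+ As a rough illustration of that input layout, the snippet below assembles the three parts into a single string in the same style as the usage example that follows; the helper name and the exact separators are assumptions for illustration, not part of the released code.
+
+ ```python
+ # Hypothetical helper: concatenate context, character description, and utterance
+ # into one input string (the separator choices are an assumption here).
+ def build_input(context: str, character: str, utterance: str) -> str:
+     return f'{context} {character} User: "{utterance}"'
+
+ text = build_input(
+     "A user watched a video of a musical performance on YouTube.",
+     "This user expresses an opinion and thoughts.",
+     "Ok is it just me or is anyone else getting goosebumps too???",
+ )
+ ```
+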
+ ## How to use
+
+ Here is how to use this model:
+
+ ```python
+ >>> import torch
+ >>> from transformers import pipeline
+ >>> model_name = "roberta-large-emopillars-contextual"
+ >>> threshold = 0.5
+ >>> emotions = ["admiration","amusement","anger","annoyance","approval","caring","confusion","curiosity","desire","disappointment","disapproval","disgust","embarrassment","excitement","fear","gratitude","grief","joy","love","nervousness","optimism","pride","realization","relief","remorse","sadness","surprise","neutral"]
+ >>> label_to_emotion = dict(zip(range(len(emotions)), emotions))
+ >>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ >>> pipe = pipeline("text-classification", model=model_name, truncation=True, return_all_scores=True, device=-1 if device.type == "cpu" else 0)
+ >>> utterances_in_contexts = ["A user watched a video of a musical performance on YouTube. This user expresses an opinion and thoughts. User: \"Ok is it just me or is anyone else getting goosebumps too???\"", "User: \"Sorry\", Conversational agent: \"Sorry for what??\", User: \"Don’t know what to do\""]
+ >>> outcome = pipe(utterances_in_contexts)
+ >>> dominant_classes = [[prediction for prediction in example if prediction['score'] >= threshold] for example in outcome]
+ >>> for example in dominant_classes:
+ ...     print(", ".join(["%s:%.2lf" % (label_to_emotion[int(prediction['label'])], prediction['score']) for prediction in sorted(example, key=lambda x: x['score'], reverse=True)]))
+ surprise:0.99, amusement:0.87, curiosity:0.60, nervousness:0.58
+ confusion:0.97, nervousness:0.76, embarrassment:0.65
+ ```
+
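+ If you prefer to call the model directly instead of through `pipeline`, the sketch below is one way to do it, assuming the checkpoint is loaded by the same name as above and reusing `utterances_in_contexts`, `label_to_emotion`, and `threshold` from the previous snippet; since the head is configured for `multi_label_classification`, class probabilities come from an independent sigmoid per label rather than a softmax.
+
+ ```python
+ import torch
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+ model.eval()
+
+ # Tokenize the context-augmented utterances and run a forward pass.
+ inputs = tokenizer(utterances_in_contexts, truncation=True, padding=True, return_tensors="pt")
+ with torch.no_grad():
+     probs = torch.sigmoid(model(**inputs).logits)  # shape: (batch, 28)
+
+ for row in probs:
+     # id2label maps the output position to the emotion index used in labels.txt.
+     active = [(label_to_emotion[int(model.config.id2label[i])], p.item())
+               for i, p in enumerate(row) if p >= threshold]
+     print(sorted(active, key=lambda x: x[1], reverse=True))
+ ```
+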
+ ## Training data
+
+ The training data consists of 93,979 samples from the "[context-full](https://huggingface.co/datasets/alex-shvets/EmoPillars/tree/main/context-full)" subset of [EmoPillars](https://huggingface.co/datasets/alex-shvets/EmoPillars), created with [Mistral](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [our EmoPillars data synthesis pipeline on GitHub](https://github.com/alex-shvets/emopillars). [WikiPlots](https://github.com/markriedl/WikiPlots) was used as the seed corpus.
+
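+ For reference, the subset can presumably be pulled from the Hub with `datasets`; the configuration name below mirrors the subset directory and is an assumption, so adjust it if the dataset exposes a different layout.
+
+ ```python
+ from datasets import load_dataset
+
+ # Assumes the "context-full" subset is exposed as a dataset configuration;
+ # if it is only a subdirectory, data_dir="context-full" may be needed instead.
+ dataset = load_dataset("alex-shvets/EmoPillars", "context-full")
+ print(dataset)
+ ```
+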
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
+ - learning_rate: 2e-05
+ - train_batch_size: 32
+ - eval_batch_size: 8
+ - seed: 752
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - num_epochs: 10.0
+
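+ As a rough sketch of how these settings map onto the `transformers` `Trainer` API (the output directory and any options not listed above are placeholders, not the exact configuration used):
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="roberta-large-emopillars-contextual",  # placeholder path
+     learning_rate=2e-5,
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=8,
+     seed=752,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+     lr_scheduler_type="linear",
+     num_train_epochs=10.0,
+ )
+ ```
+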
+ ### Framework versions
+
+ - Transformers 4.45.0.dev0
+ - Pytorch 2.4.0a0+gite3b9b71
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1
+
+ ## Evaluation
+
+ Scores for the evaluation on the EmoPillars' "context-full" test split:
+
+ | **class** | **precision** | **recall** | **f1-score** | **support** |
+ | :--- | :---: | :---: | :---: | ---: |
+ | admiration | 0.72 | 0.68 | 0.70 | 635 |
+ | amusement | 0.79 | 0.63 | 0.70 | 211 |
+ | anger | 0.86 | 0.82 | 0.84 | 1155 |
+ | annoyance | 0.80 | 0.76 | 0.78 | 865 |
+ | approval | 0.58 | 0.42 | 0.49 | 250 |
+ | caring | 0.66 | 0.60 | 0.63 | 485 |
+ | confusion | 0.76 | 0.78 | 0.77 | 1283 |
+ | curiosity | 0.83 | 0.79 | 0.81 | 780 |
+ | desire | 0.80 | 0.75 | 0.77 | 864 |
+ | disappointment | 0.79 | 0.80 | 0.80 | 1264 |
+ | disapproval | 0.55 | 0.47 | 0.51 | 445 |
+ | disgust | 0.73 | 0.60 | 0.66 | 320 |
+ | embarrassment | 0.65 | 0.50 | 0.57 | 116 |
+ | excitement | 0.74 | 0.71 | 0.73 | 685 |
+ | fear | 0.87 | 0.85 | 0.86 | 990 |
+ | gratitude | 0.79 | 0.74 | 0.76 | 155 |
+ | grief | 0.79 | 0.71 | 0.75 | 133 |
+ | joy | 0.80 | 0.78 | 0.79 | 668 |
+ | love | 0.70 | 0.61 | 0.65 | 254 |
+ | nervousness | 0.81 | 0.80 | 0.80 | 1368 |
+ | optimism | 0.82 | 0.76 | 0.79 | 506 |
+ | pride | 0.85 | 0.82 | 0.83 | 497 |
+ | realization | 0.74 | 0.57 | 0.64 | 120 |
+ | relief | 0.76 | 0.67 | 0.71 | 211 |
+ | remorse | 0.59 | 0.53 | 0.56 | 206 |
+ | sadness | 0.80 | 0.79 | 0.79 | 922 |
+ | surprise | 0.80 | 0.78 | 0.79 | 852 |
+ | neutral | 0.67 | 0.57 | 0.61 | 392 |
+ | **micro avg** | 0.78 | 0.74 | 0.76 | 16632 |
+ | **macro avg** | 0.75 | 0.69 | 0.72 | 16632 |
+ | **weighted avg** | 0.78 | 0.74 | 0.76 | 16632 |
+ | **samples avg** | 0.79 | 0.76 | 0.75 | 16632 |
+
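+ As a sketch of how such figures can be reproduced (assuming gold labels as a multi-hot matrix, probabilities binarized at the 0.5 threshold, and the `emotions` list from the usage example above; the placeholder arrays below stand in for real data), scikit-learn's `classification_report` yields the same per-class rows and micro/macro/weighted/samples averages for multi-label input:
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import classification_report
+
+ # Placeholders: replace with gold multi-hot labels and predicted probabilities
+ # of shape (num_examples, 28).
+ y_true = np.zeros((4, 28), dtype=int)
+ y_prob = np.zeros((4, 28))
+
+ y_pred = (y_prob >= 0.5).astype(int)
+ print(classification_report(y_true, y_pred, target_names=emotions, digits=2, zero_division=0))
+ ```
+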
+ When fine-tuned on downstream tasks, this model achieves the following results:
+
+ | **task** | **precision** | **recall** | **f1-score** |
+ | :--- | :---: | :---: | :---: |
+ | EmoContext (dev) | 0.81 | 0.83 | 0.82 |
+ | EmoContext (test) | 0.76 | 0.78 | 0.77 |
+
+ For more details on the evaluation, please visit our [GitHub repository](https://github.com/alex-shvets/emopillars).
+
+ ## Disclaimer
+
+ <details>
+
+ <summary>Click to expand</summary>
+
+ The model published in this repository is intended for a generalist purpose and is made available to third parties. It may exhibit bias and/or other undesirable distortions.
+
+ When third parties deploy or provide systems and/or services to other parties using this model (or systems based on it), or become users of the model, they should note that it is their responsibility to mitigate the risks arising from its use and, in any event, to comply with applicable regulations, including those governing the use of Artificial Intelligence.
+
+ In no event shall the creator of the model be liable for any results arising from third-party use of this model.
+
+ </details>
config.json ADDED
@@ -0,0 +1,89 @@
+ {
+   "_name_or_path": "roberta-large",
+   "architectures": [
+     "RobertaForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "finetuning_task": "text-classification",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "0",
+     "1": "1",
+     "2": "10",
+     "3": "11",
+     "4": "12",
+     "5": "13",
+     "6": "14",
+     "7": "15",
+     "8": "16",
+     "9": "17",
+     "10": "18",
+     "11": "19",
+     "12": "2",
+     "13": "20",
+     "14": "21",
+     "15": "22",
+     "16": "23",
+     "17": "24",
+     "18": "25",
+     "19": "26",
+     "20": "27",
+     "21": "3",
+     "22": "4",
+     "23": "5",
+     "24": "6",
+     "25": "7",
+     "26": "8",
+     "27": "9"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "0": 0,
+     "1": 1,
+     "10": 2,
+     "11": 3,
+     "12": 4,
+     "13": 5,
+     "14": 6,
+     "15": 7,
+     "16": 8,
+     "17": 9,
+     "18": 10,
+     "19": 11,
+     "2": 12,
+     "20": 13,
+     "21": 14,
+     "22": 15,
+     "23": 16,
+     "24": 17,
+     "25": 18,
+     "26": 19,
+     "27": 20,
+     "3": 21,
+     "4": 22,
+     "5": 23,
+     "6": 24,
+     "7": 25,
+     "8": 26,
+     "9": 27
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "problem_type": "multi_label_classification",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.0.dev0",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 50265
+ }
labels.txt ADDED
@@ -0,0 +1,28 @@
+ admiration
+ amusement
+ anger
+ annoyance
+ approval
+ caring
+ confusion
+ curiosity
+ desire
+ disappointment
+ disapproval
+ disgust
+ embarrassment
+ excitement
+ fear
+ gratitude
+ grief
+ joy
+ love
+ nervousness
+ optimism
+ pride
+ realization
+ relief
+ remorse
+ sadness
+ surprise
+ neutral
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ebf5e4a812fff6fc55f9cab0a871462b4d7df9a42d08d3cfe0f7370eb92e593f
+ size 1421602016
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50264": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "RobertaTokenizer",
+   "trim_offsets": true,
+   "truncation_side": "left",
+   "unk_token": "<unk>"
+ }
trainer_state.json ADDED
@@ -0,0 +1,448 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 10.0,
+   "eval_steps": 500,
+   "global_step": 29370,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     { "epoch": 0.17024174327545114, "grad_norm": 0.6650531888008118, "learning_rate": 1.96595165134491e-05, "loss": 0.2305, "step": 500 },
+     { "epoch": 0.34048348655090227, "grad_norm": 2.3951523303985596, "learning_rate": 1.9319033026898198e-05, "loss": 0.1633, "step": 1000 },
+     { "epoch": 0.5107252298263534, "grad_norm": 0.9463224411010742, "learning_rate": 1.8978549540347296e-05, "loss": 0.1548, "step": 1500 },
+     { "epoch": 0.6809669731018045, "grad_norm": 1.2913334369659424, "learning_rate": 1.8638066053796395e-05, "loss": 0.1487, "step": 2000 },
+     { "epoch": 0.8512087163772557, "grad_norm": 1.0490585565567017, "learning_rate": 1.829758256724549e-05, "loss": 0.1435, "step": 2500 },
+     { "epoch": 1.0214504596527068, "grad_norm": 1.5652408599853516, "learning_rate": 1.7957099080694588e-05, "loss": 0.1397, "step": 3000 },
+     { "epoch": 1.191692202928158, "grad_norm": 1.1205641031265259, "learning_rate": 1.7616615594143686e-05, "loss": 0.1283, "step": 3500 },
+     { "epoch": 1.361933946203609, "grad_norm": 1.0744216442108154, "learning_rate": 1.727613210759278e-05, "loss": 0.1292, "step": 4000 },
+     { "epoch": 1.5321756894790601, "grad_norm": 1.089113712310791, "learning_rate": 1.693564862104188e-05, "loss": 0.1273, "step": 4500 },
+     { "epoch": 1.7024174327545114, "grad_norm": 2.334705114364624, "learning_rate": 1.6595165134490977e-05, "loss": 0.1275, "step": 5000 },
+     { "epoch": 1.8726591760299627, "grad_norm": 1.1323754787445068, "learning_rate": 1.6254681647940076e-05, "loss": 0.1251, "step": 5500 },
+     { "epoch": 2.0429009193054135, "grad_norm": 0.8757261633872986, "learning_rate": 1.5914198161389174e-05, "loss": 0.1213, "step": 6000 },
+     { "epoch": 2.213142662580865, "grad_norm": 1.1232839822769165, "learning_rate": 1.5573714674838272e-05, "loss": 0.1104, "step": 6500 },
+     { "epoch": 2.383384405856316, "grad_norm": 0.8715490698814392, "learning_rate": 1.5233231188287369e-05, "loss": 0.1099, "step": 7000 },
+     { "epoch": 2.553626149131767, "grad_norm": 1.2656769752502441, "learning_rate": 1.4892747701736467e-05, "loss": 0.1102, "step": 7500 },
+     { "epoch": 2.723867892407218, "grad_norm": 1.1669204235076904, "learning_rate": 1.4552264215185565e-05, "loss": 0.1101, "step": 8000 },
+     { "epoch": 2.8941096356826694, "grad_norm": 1.0073705911636353, "learning_rate": 1.4211780728634664e-05, "loss": 0.1085, "step": 8500 },
+     { "epoch": 3.0643513789581207, "grad_norm": 1.1393821239471436, "learning_rate": 1.3871297242083762e-05, "loss": 0.1027, "step": 9000 },
+     { "epoch": 3.2345931222335715, "grad_norm": 1.4679887294769287, "learning_rate": 1.3530813755532857e-05, "loss": 0.0926, "step": 9500 },
+     { "epoch": 3.404834865509023, "grad_norm": 0.8374710083007812, "learning_rate": 1.3190330268981955e-05, "loss": 0.0925, "step": 10000 },
+     { "epoch": 3.575076608784474, "grad_norm": 1.2514032125473022, "learning_rate": 1.2849846782431053e-05, "loss": 0.0927, "step": 10500 },
+     { "epoch": 3.7453183520599254, "grad_norm": 1.5251351594924927, "learning_rate": 1.250936329588015e-05, "loss": 0.0929, "step": 11000 },
+     { "epoch": 3.915560095335376, "grad_norm": 1.0668872594833374, "learning_rate": 1.2168879809329248e-05, "loss": 0.0923, "step": 11500 },
+     { "epoch": 4.085801838610827, "grad_norm": 1.0528796911239624, "learning_rate": 1.1828396322778346e-05, "loss": 0.0848, "step": 12000 },
+     { "epoch": 4.256043581886279, "grad_norm": 1.316041111946106, "learning_rate": 1.1487912836227445e-05, "loss": 0.0767, "step": 12500 },
+     { "epoch": 4.42628532516173, "grad_norm": 1.6180927753448486, "learning_rate": 1.1147429349676541e-05, "loss": 0.077, "step": 13000 },
+     { "epoch": 4.596527068437181, "grad_norm": 1.2156362533569336, "learning_rate": 1.080694586312564e-05, "loss": 0.0773, "step": 13500 },
+     { "epoch": 4.766768811712632, "grad_norm": 1.621887445449829, "learning_rate": 1.0466462376574738e-05, "loss": 0.0773, "step": 14000 },
+     { "epoch": 4.937010554988083, "grad_norm": 1.5306437015533447, "learning_rate": 1.0125978890023836e-05, "loss": 0.0774, "step": 14500 },
+     { "epoch": 5.107252298263534, "grad_norm": 22.37914276123047, "learning_rate": 9.785495403472932e-06, "loss": 0.0678, "step": 15000 },
+     { "epoch": 5.2774940415389855, "grad_norm": 1.3330860137939453, "learning_rate": 9.44501191692203e-06, "loss": 0.0634, "step": 15500 },
+     { "epoch": 5.447735784814436, "grad_norm": 1.9692567586898804, "learning_rate": 9.104528430371127e-06, "loss": 0.0634, "step": 16000 },
+     { "epoch": 5.617977528089888, "grad_norm": 1.3089221715927124, "learning_rate": 8.764044943820226e-06, "loss": 0.0635, "step": 16500 },
+     { "epoch": 5.788219271365339, "grad_norm": 1.5806821584701538, "learning_rate": 8.423561457269324e-06, "loss": 0.0637, "step": 17000 },
+     { "epoch": 5.95846101464079, "grad_norm": 1.579941987991333, "learning_rate": 8.08307797071842e-06, "loss": 0.0633, "step": 17500 },
+     { "epoch": 6.128702757916241, "grad_norm": 1.5726784467697144, "learning_rate": 7.742594484167519e-06, "loss": 0.054, "step": 18000 },
+     { "epoch": 6.298944501191692, "grad_norm": 1.140791654586792, "learning_rate": 7.402110997616616e-06, "loss": 0.052, "step": 18500 },
+     { "epoch": 6.469186244467143, "grad_norm": 1.6548409461975098, "learning_rate": 7.061627511065714e-06, "loss": 0.0516, "step": 19000 },
+     { "epoch": 6.639427987742595, "grad_norm": 1.3514069318771362, "learning_rate": 6.721144024514812e-06, "loss": 0.0522, "step": 19500 },
+     { "epoch": 6.809669731018046, "grad_norm": 1.5590009689331055, "learning_rate": 6.38066053796391e-06, "loss": 0.0518, "step": 20000 },
+     { "epoch": 6.9799114742934965, "grad_norm": 1.2986799478530884, "learning_rate": 6.0401770514130066e-06, "loss": 0.0524, "step": 20500 },
+     { "epoch": 7.150153217568948, "grad_norm": 1.5317639112472534, "learning_rate": 5.699693564862104e-06, "loss": 0.044, "step": 21000 },
+     { "epoch": 7.320394960844399, "grad_norm": 2.344708204269409, "learning_rate": 5.359210078311202e-06, "loss": 0.0415, "step": 21500 },
+     { "epoch": 7.49063670411985, "grad_norm": 3.3057548999786377, "learning_rate": 5.0187265917603005e-06, "loss": 0.0418, "step": 22000 },
+     { "epoch": 7.6608784473953015, "grad_norm": 1.3382242918014526, "learning_rate": 4.678243105209398e-06, "loss": 0.0419, "step": 22500 },
+     { "epoch": 7.831120190670752, "grad_norm": 1.7018738985061646, "learning_rate": 4.337759618658495e-06, "loss": 0.0421, "step": 23000 },
+     { "epoch": 8.001361933946203, "grad_norm": 0.9316732883453369, "learning_rate": 3.997276132107593e-06, "loss": 0.0414, "step": 23500 },
+     { "epoch": 8.171603677221654, "grad_norm": 1.4249956607818604, "learning_rate": 3.656792645556691e-06, "loss": 0.0346, "step": 24000 },
+     { "epoch": 8.341845420497107, "grad_norm": 1.263279914855957, "learning_rate": 3.3163091590057884e-06, "loss": 0.0345, "step": 24500 },
+     { "epoch": 8.512087163772557, "grad_norm": 2.6162939071655273, "learning_rate": 2.9758256724548862e-06, "loss": 0.0342, "step": 25000 },
+     { "epoch": 8.682328907048008, "grad_norm": 1.2574002742767334, "learning_rate": 2.635342185903984e-06, "loss": 0.0345, "step": 25500 },
+     { "epoch": 8.85257065032346, "grad_norm": 5.4230732917785645, "learning_rate": 2.2948586993530815e-06, "loss": 0.0344, "step": 26000 },
+     { "epoch": 9.02281239359891, "grad_norm": 0.885810136795044, "learning_rate": 1.9543752128021793e-06, "loss": 0.0333, "step": 26500 },
+     { "epoch": 9.19305413687436, "grad_norm": 1.7516717910766602, "learning_rate": 1.6138917262512767e-06, "loss": 0.0291, "step": 27000 },
+     { "epoch": 9.363295880149813, "grad_norm": 1.1372159719467163, "learning_rate": 1.2734082397003748e-06, "loss": 0.0293, "step": 27500 },
+     { "epoch": 9.533537623425264, "grad_norm": 0.9269993305206299, "learning_rate": 9.329247531494723e-07, "loss": 0.0294, "step": 28000 },
+     { "epoch": 9.703779366700715, "grad_norm": 1.229074239730835, "learning_rate": 5.9244126659857e-07, "loss": 0.0291, "step": 28500 },
+     { "epoch": 9.874021109976166, "grad_norm": 2.4099299907684326, "learning_rate": 2.519577800476677e-07, "loss": 0.0289, "step": 29000 },
+     { "epoch": 10.0, "step": 29370, "total_flos": 8.758967154215731e+17, "train_loss": 0.07915824020562384, "train_runtime": 29556.1772, "train_samples_per_second": 31.797, "train_steps_per_second": 0.994 }
+   ],
+   "logging_steps": 500,
+   "max_steps": 29370,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 10,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 8.758967154215731e+17,
+   "train_batch_size": 32,
+   "trial_name": null,
+   "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b58a10adc1cb2dee4ef84c9eec0b4e9afaa6669bfd699fe5f1421de10fcacef
+ size 5240
vocab.json ADDED
The diff for this file is too large to render. See raw diff