huynguyen251 committed
Commit b35b71a · verified · 1 Parent(s): 00f0258

Fine-tuned PhoBERT for Vietnamese Legal QA - Updated Dataset

Files changed (47)
  1. README.md +114 -0
  2. added_tokens.json +3 -0
  3. bpe.codes +0 -0
  4. checkpoint-1800/added_tokens.json +3 -0
  5. checkpoint-1800/bpe.codes +0 -0
  6. checkpoint-1800/config.json +29 -0
  7. checkpoint-1800/model.safetensors +3 -0
  8. checkpoint-1800/optimizer.pt +3 -0
  9. checkpoint-1800/rng_state.pth +3 -0
  10. checkpoint-1800/scheduler.pt +3 -0
  11. checkpoint-1800/special_tokens_map.json +9 -0
  12. checkpoint-1800/tokenizer_config.json +54 -0
  13. checkpoint-1800/trainer_state.json +420 -0
  14. checkpoint-1800/training_args.bin +3 -0
  15. checkpoint-1800/vocab.txt +0 -0
  16. checkpoint-2200/added_tokens.json +3 -0
  17. checkpoint-2200/bpe.codes +0 -0
  18. checkpoint-2200/config.json +29 -0
  19. checkpoint-2200/model.safetensors +3 -0
  20. checkpoint-2200/optimizer.pt +3 -0
  21. checkpoint-2200/rng_state.pth +3 -0
  22. checkpoint-2200/scheduler.pt +3 -0
  23. checkpoint-2200/special_tokens_map.json +9 -0
  24. checkpoint-2200/tokenizer_config.json +54 -0
  25. checkpoint-2200/trainer_state.json +504 -0
  26. checkpoint-2200/training_args.bin +3 -0
  27. checkpoint-2200/vocab.txt +0 -0
  28. checkpoint-2400/added_tokens.json +3 -0
  29. checkpoint-2400/bpe.codes +0 -0
  30. checkpoint-2400/config.json +29 -0
  31. checkpoint-2400/model.safetensors +3 -0
  32. checkpoint-2400/optimizer.pt +3 -0
  33. checkpoint-2400/rng_state.pth +3 -0
  34. checkpoint-2400/scheduler.pt +3 -0
  35. checkpoint-2400/special_tokens_map.json +9 -0
  36. checkpoint-2400/tokenizer_config.json +54 -0
  37. checkpoint-2400/trainer_state.json +546 -0
  38. checkpoint-2400/training_args.bin +3 -0
  39. checkpoint-2400/vocab.txt +0 -0
  40. config.json +29 -0
  41. eval_metrics.json +13 -0
  42. model.safetensors +3 -0
  43. special_tokens_map.json +9 -0
  44. tokenizer_config.json +54 -0
  45. training_args.bin +3 -0
  46. training_info.json +49 -0
  47. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ language: vi
+ tags:
+ - phobert
+ - question-answering
+ - vietnamese
+ - legal-qa
+ - pytorch
+ - transformers
+ license: apache-2.0
+ datasets:
+ - custom-legal-qa
+ metrics:
+ - f1
+ - accuracy
+ model-index:
+ - name: phobert-legal-qa-v2
+   results:
+   - task:
+       type: question-answering
+       name: Question Answering
+     metrics:
+     - type: f1
+       value: 0.602910749664121
+       name: F1 Score
+     - type: accuracy
+       value: 0.9795007342143907
+       name: Accuracy
+ ---
30
+
31
+ # PhoBERT Fine-tuned for Vietnamese Legal QA
32
+
33
+ ## Model Description
34
+
35
+ This model is a fine-tuned version of [vinai/phobert-base](https://huggingface.co/vinai/phobert-base) for Vietnamese legal question answering.
36
+
37
+ ## Training Details
38
+
39
+ ### Training Data
40
+ - **Dataset**: Custom Vietnamese Legal QA dataset
41
+ - **Total QA pairs**: 156349
42
+ - **Training samples**: 96472
43
+ - **Validation samples**: 17025
44
+ - **Categories**: Công nghiệp, Thuế, phí, lệ phí, các khoản thu khác, Đất đai, Dân số, gia đình, trẻ em, bình đẳng giới, Quốc phòng, Hành chính tư pháp, Tài nguyên, Văn hóa, thể thao, du lịch, Giao thông, vận tải, Thông tin, báo chí, xuất bản, Tổ chức chính trị - xã hội, hội, Y tế, dược, Dân tộc, Thống kê, Khoa học, công nghệ, An ninh quốc gia, Tổ chức bộ máy nhà nước, Ngoại giao, điều ước quốc tế, Bổ trợ tư pháp, Tài sản công, nợ công, dự trữ nhà nước, Tố tụng và các phương thức giải quyết tranh chấp, Doanh nghiệp, hợp tác xã, Trật tự, an toàn xã hội
45
+
46
+ ### Training Configuration
47
+ - **Base model**: vinai/phobert-base
48
+ - **Learning rate**: 2e-05
49
+ - **Training epochs**: 3
50
+ - **Batch size**: 4
51
+ - **Max sequence length**: 256
52
+
53
+ ### Training Results
54
+ - **Training Loss**: 0.6344684727986654
55
+ - **Validation F1**: 0.602910749664121
56
+ - **Validation Accuracy**: 0.9795007342143907
57
+
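The F1 reported above measures overlap between predicted and gold answer spans. A minimal token-level F1 sketch, in the common QA formulation (this is illustrative, not necessarily the exact evaluation script used for this model):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a gold answer span."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    # Multiset intersection counts each shared token at most min(count) times
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("từ đủ 16 tuổi đến 30 tuổi",
               "công dân Việt Nam từ đủ 16 tuổi đến 30 tuổi"))
```

A high accuracy with a much lower F1 (as above) usually means many exact position hits but partial credit on the remaining spans.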
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForQuestionAnswering
+ import torch
+
+ tokenizer = AutoTokenizer.from_pretrained("huynguyen251/phobert-legal-qa-v2")
+ model = AutoModelForQuestionAnswering.from_pretrained("huynguyen251/phobert-legal-qa-v2")
+
+ question = "Quy định này áp dụng cho ai?"  # "Who does this regulation apply to?"
+ context = "Thanh niên là công dân Việt Nam từ đủ 16 tuổi đến 30 tuổi."  # "Youth are Vietnamese citizens aged 16 to 30."
+
+ # PhoBERT supports at most 256 tokens, so truncate to the training length
+ inputs = tokenizer(question, context, return_tensors="pt", max_length=256, truncation=True)
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Decode the span between the most likely start and end positions
+ start_idx = torch.argmax(outputs.start_logits)
+ end_idx = torch.argmax(outputs.end_logits)
+ answer = tokenizer.decode(inputs["input_ids"][0][start_idx:end_idx + 1])
+ print(f"Answer: {answer}")
+ ```
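Note that taking the two argmaxes independently, as in the usage example above, can yield `end < start` (an empty span). A safer decoding searches for the best valid `(start, end)` pair; a self-contained sketch with dummy logits (the logit values below are made up for illustration):

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start + end score, with start <= end."""
    best = (0, 0)
    best_score = -np.inf
    for s, s_logit in enumerate(start_logits):
        # Only consider ends at or after the start, within a length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best

start = np.array([0.1, 5.0, 0.2, 0.3])
end = np.array([6.0, 0.1, 4.0, 0.2])
print(best_span(start, end))  # (1, 2); independent argmaxes would pick start=1, end=0, an invalid empty span
```

Production pipelines also mask out positions belonging to the question before searching.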
+
+ ## Categories
+
+ - Công nghiệp
+ - Thuế, phí, lệ phí, các khoản thu khác
+ - Đất đai
+ - Dân số, gia đình, trẻ em, bình đẳng giới
+ - Quốc phòng
+ - Hành chính tư pháp
+ - Tài nguyên
+ - Văn hóa, thể thao, du lịch
+ - Giao thông, vận tải
+ - Thông tin, báo chí, xuất bản
+ - Tổ chức chính trị - xã hội, hội
+ - Y tế, dược
+ - Dân tộc
+ - Thống kê
+ - Khoa học, công nghệ
+ - An ninh quốc gia
+ - Tổ chức bộ máy nhà nước
+ - Ngoại giao, điều ước quốc tế
+ - Bổ trợ tư pháp
+ - Tài sản công, nợ công, dự trữ nhà nước
+ - Tố tụng và các phương thức giải quyết tranh chấp
+ - Doanh nghiệp, hợp tác xã
+ - Trật tự, an toàn xã hội
+
+ ## Limitations
+
+ This model is trained on Vietnamese legal documents and may not generalize to other domains or languages.
+
+ ## Training Framework
+
+ - Framework: Transformers 4.44.2
+ - Language: Vietnamese
+ - License: Apache 2.0
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<mask>": 64000
+ }
bpe.codes ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1800/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<mask>": 64000
+ }
checkpoint-1800/bpe.codes ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-1800/config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_name_or_path": "vinai/phobert-base",
+   "architectures": [
+     "RobertaForQuestionAnswering"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 258,
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "tokenizer_class": "PhobertTokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 64001
+ }
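One subtlety in the config above: `max_position_embeddings` is 258 rather than 256 because RoBERTa-style models start position indices at `pad_token_id + 1`, reserving the first slots for padding. The arithmetic below assumes that standard RoBERTa convention (an offset of 2 with `pad_token_id` 1):

```python
# Values taken from the config.json above
max_position_embeddings = 258
pad_token_id = 1

# RoBERTa-style position ids begin at pad_token_id + 1
position_offset = pad_token_id + 1
usable_length = max_position_embeddings - position_offset
print(usable_length)  # 256, matching the training max sequence length
```

This is why the usable sequence length is 256 even though the embedding table has 258 rows.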
checkpoint-1800/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:389e7b032149cdbe860f7307cb2dcd4781aef2ac7567a748a409a876678315f4
+ size 537660792
checkpoint-1800/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:abe7fb5f8c884934c8b8ac9631142c7e8d103542f4a3f52801e3061828296a6f
+ size 1075440186
checkpoint-1800/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c12cf28c5b9a556304cc9f06fbdbe004de036b71c6baf556677bc0e5c28d0efb
+ size 14244
checkpoint-1800/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad59ffa41114700072f13d95d885dec7d70c959b1cba63fac9667f20ef5f079b
+ size 1064
checkpoint-1800/special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
checkpoint-1800/tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "64000": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "PhobertTokenizer",
+   "unk_token": "<unk>"
+ }
checkpoint-1800/trainer_state.json ADDED
@@ -0,0 +1,420 @@
+ {
+   "best_metric": 0.602910749664121,
+   "best_model_checkpoint": "phobert-legal-qa-finetuned\\checkpoint-1800",
+   "epoch": 0.29853221660170826,
+   "eval_steps": 200,
+   "global_step": 1800,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.008292561572269675,
+       "grad_norm": 8.954318046569824,
+       "learning_rate": 5.417357656163627e-07,
+       "loss": 5.4244,
+       "step": 50
+     },
+     {
+       "epoch": 0.01658512314453935,
+       "grad_norm": 6.557638168334961,
+       "learning_rate": 1.0834715312327253e-06,
+       "loss": 5.1885,
+       "step": 100
+     },
+     {
+       "epoch": 0.024877684716809022,
+       "grad_norm": 6.624057769775391,
+       "learning_rate": 1.6141514648977336e-06,
+       "loss": 4.6824,
+       "step": 150
+     },
+     {
+       "epoch": 0.0331702462890787,
+       "grad_norm": 6.447659015655518,
+       "learning_rate": 2.1669430624654506e-06,
+       "loss": 3.9743,
+       "step": 200
+     },
+     {
+       "epoch": 0.0331702462890787,
+       "eval_accuracy": 0.5348311306901615,
+       "eval_end_accuracy": 0.29080763582966224,
+       "eval_end_f1": 0.20537819782547337,
+       "eval_f1": 0.10898462727065836,
+       "eval_loss": 3.2860655784606934,
+       "eval_runtime": 127.3249,
+       "eval_samples_per_second": 133.713,
+       "eval_start_accuracy": 0.7788546255506608,
+       "eval_start_f1": 0.012591056715843358,
+       "eval_steps_per_second": 16.721,
+       "step": 200
+     },
+     {
+       "epoch": 0.04146280786134837,
+       "grad_norm": 5.946743965148926,
+       "learning_rate": 2.7197346600331676e-06,
+       "loss": 3.0621,
+       "step": 250
+     },
+     {
+       "epoch": 0.049755369433618045,
+       "grad_norm": 7.603392601013184,
+       "learning_rate": 3.272526257600885e-06,
+       "loss": 1.9122,
+       "step": 300
+     },
+     {
+       "epoch": 0.05804793100588772,
+       "grad_norm": 4.718493461608887,
+       "learning_rate": 3.825317855168602e-06,
+       "loss": 0.8663,
+       "step": 350
+     },
+     {
+       "epoch": 0.0663404925781574,
+       "grad_norm": 1.8805421590805054,
+       "learning_rate": 4.367053620784965e-06,
+       "loss": 0.4477,
+       "step": 400
+     },
+     {
+       "epoch": 0.0663404925781574,
+       "eval_accuracy": 0.9048458149779737,
+       "eval_end_accuracy": 0.9175917767988253,
+       "eval_end_f1": 0.8303333971515077,
+       "eval_f1": 0.424233749861785,
+       "eval_loss": 0.23289425671100616,
+       "eval_runtime": 129.7527,
+       "eval_samples_per_second": 131.211,
+       "eval_start_accuracy": 0.8920998531571219,
+       "eval_start_f1": 0.018134102572062404,
+       "eval_steps_per_second": 16.408,
+       "step": 400
+     },
+     {
+       "epoch": 0.07463305415042706,
+       "grad_norm": 1.6680586338043213,
+       "learning_rate": 4.919845218352681e-06,
+       "loss": 0.2926,
+       "step": 450
+     },
+     {
+       "epoch": 0.08292561572269674,
+       "grad_norm": 1.3913531303405762,
+       "learning_rate": 5.472636815920398e-06,
+       "loss": 0.2317,
+       "step": 500
+     },
+     {
+       "epoch": 0.09121817729496641,
+       "grad_norm": 0.8732656836509705,
+       "learning_rate": 6.025428413488116e-06,
+       "loss": 0.2013,
+       "step": 550
+     },
+     {
+       "epoch": 0.09951073886723609,
+       "grad_norm": 0.4801824986934662,
+       "learning_rate": 6.578220011055833e-06,
+       "loss": 0.1501,
+       "step": 600
+     },
+     {
+       "epoch": 0.09951073886723609,
+       "eval_accuracy": 0.9687224669603525,
+       "eval_end_accuracy": 0.9620558002936858,
+       "eval_end_f1": 0.9029114827270522,
+       "eval_f1": 0.47707631709814124,
+       "eval_loss": 0.10375536233186722,
+       "eval_runtime": 115.3633,
+       "eval_samples_per_second": 147.577,
+       "eval_start_accuracy": 0.9753891336270191,
+       "eval_start_f1": 0.051241151469230285,
+       "eval_steps_per_second": 18.455,
+       "step": 600
+     },
+     {
+       "epoch": 0.10780330043950577,
+       "grad_norm": 3.085040807723999,
+       "learning_rate": 7.131011608623549e-06,
+       "loss": 0.147,
+       "step": 650
+     },
+     {
+       "epoch": 0.11609586201177544,
+       "grad_norm": 2.0271573066711426,
+       "learning_rate": 7.672747374239912e-06,
+       "loss": 0.1356,
+       "step": 700
+     },
+     {
+       "epoch": 0.12438842358404512,
+       "grad_norm": 0.833656370639801,
+       "learning_rate": 8.22553897180763e-06,
+       "loss": 0.1604,
+       "step": 750
+     },
+     {
+       "epoch": 0.1326809851563148,
+       "grad_norm": 4.890761375427246,
+       "learning_rate": 8.778330569375346e-06,
+       "loss": 0.2731,
+       "step": 800
+     },
+     {
+       "epoch": 0.1326809851563148,
+       "eval_accuracy": 0.9534801762114538,
+       "eval_end_accuracy": 0.9742143906020558,
+       "eval_end_f1": 0.9321589248479495,
+       "eval_f1": 0.49608082825261623,
+       "eval_loss": 0.1122935563325882,
+       "eval_runtime": 133.8336,
+       "eval_samples_per_second": 127.21,
+       "eval_start_accuracy": 0.9327459618208517,
+       "eval_start_f1": 0.06000273165728297,
+       "eval_steps_per_second": 15.908,
+       "step": 800
+     },
+     {
+       "epoch": 0.14097354672858445,
+       "grad_norm": 1.1853266954421997,
+       "learning_rate": 9.331122166943063e-06,
+       "loss": 0.1276,
+       "step": 850
+     },
+     {
+       "epoch": 0.14926610830085413,
+       "grad_norm": 7.255343437194824,
+       "learning_rate": 9.88391376451078e-06,
+       "loss": 0.0954,
+       "step": 900
+     },
+     {
+       "epoch": 0.1575586698731238,
+       "grad_norm": 0.5000291466712952,
+       "learning_rate": 1.0436705362078497e-05,
+       "loss": 0.1081,
+       "step": 950
+     },
+     {
+       "epoch": 0.1658512314453935,
+       "grad_norm": 0.3002016544342041,
+       "learning_rate": 1.0989496959646216e-05,
+       "loss": 0.1257,
+       "step": 1000
+     },
+     {
+       "epoch": 0.1658512314453935,
+       "eval_accuracy": 0.9772980910425844,
+       "eval_end_accuracy": 0.9769750367107195,
+       "eval_end_f1": 0.9421694663520186,
+       "eval_f1": 0.5088194880993329,
+       "eval_loss": 0.08298086374998093,
+       "eval_runtime": 127.1174,
+       "eval_samples_per_second": 133.931,
+       "eval_start_accuracy": 0.9776211453744493,
+       "eval_start_f1": 0.07546950984664719,
+       "eval_steps_per_second": 16.748,
+       "step": 1000
+     },
+     {
+       "epoch": 0.17414379301766317,
+       "grad_norm": 1.4803558588027954,
+       "learning_rate": 1.1542288557213931e-05,
+       "loss": 0.0988,
+       "step": 1050
+     },
+     {
+       "epoch": 0.18243635458993282,
+       "grad_norm": 5.256414413452148,
+       "learning_rate": 1.2095080154781648e-05,
+       "loss": 0.1159,
+       "step": 1100
+     },
+     {
+       "epoch": 0.1907289161622025,
+       "grad_norm": 2.2532193660736084,
+       "learning_rate": 1.2647871752349365e-05,
+       "loss": 0.1119,
+       "step": 1150
+     },
+     {
+       "epoch": 0.19902147773447218,
+       "grad_norm": 1.1581368446350098,
+       "learning_rate": 1.3200663349917082e-05,
+       "loss": 0.0801,
+       "step": 1200
+     },
+     {
+       "epoch": 0.19902147773447218,
+       "eval_accuracy": 0.9774743024963289,
+       "eval_end_accuracy": 0.9773274596182085,
+       "eval_end_f1": 0.9522660725045008,
+       "eval_f1": 0.5138705881207118,
+       "eval_loss": 0.06253915280103683,
+       "eval_runtime": 129.6143,
+       "eval_samples_per_second": 131.351,
+       "eval_start_accuracy": 0.9776211453744493,
+       "eval_start_f1": 0.0754751037369228,
+       "eval_steps_per_second": 16.426,
+       "step": 1200
+     },
+     {
+       "epoch": 0.20731403930674186,
+       "grad_norm": 3.3523240089416504,
+       "learning_rate": 1.3742399115533445e-05,
+       "loss": 0.1058,
+       "step": 1250
+     },
+     {
+       "epoch": 0.21560660087901154,
+       "grad_norm": 0.09368986636400223,
+       "learning_rate": 1.429519071310116e-05,
+       "loss": 0.1002,
+       "step": 1300
+     },
+     {
+       "epoch": 0.2238991624512812,
+       "grad_norm": 2.6880834102630615,
+       "learning_rate": 1.4847982310668878e-05,
+       "loss": 0.0563,
+       "step": 1350
+     },
+     {
+       "epoch": 0.23219172402355087,
+       "grad_norm": 10.309085845947266,
+       "learning_rate": 1.5400773908236596e-05,
+       "loss": 0.0973,
+       "step": 1400
+     },
+     {
+       "epoch": 0.23219172402355087,
+       "eval_accuracy": 0.9787371512481644,
+       "eval_end_accuracy": 0.978208516886931,
+       "eval_end_f1": 0.9484799222981063,
+       "eval_f1": 0.5438896216202269,
+       "eval_loss": 0.0639370009303093,
+       "eval_runtime": 128.1258,
+       "eval_samples_per_second": 132.877,
+       "eval_start_accuracy": 0.9792657856093979,
+       "eval_start_f1": 0.13929932094234762,
+       "eval_steps_per_second": 16.616,
+       "step": 1400
+     },
+     {
+       "epoch": 0.24048428559582055,
+       "grad_norm": 0.07747649401426315,
+       "learning_rate": 1.5953565505804315e-05,
+       "loss": 0.0724,
+       "step": 1450
+     },
+     {
+       "epoch": 0.24877684716809023,
+       "grad_norm": 0.076473668217659,
+       "learning_rate": 1.650635710337203e-05,
+       "loss": 0.0928,
+       "step": 1500
+     },
+     {
+       "epoch": 0.2570694087403599,
+       "grad_norm": 0.18258516490459442,
+       "learning_rate": 1.7059148700939746e-05,
+       "loss": 0.085,
+       "step": 1550
+     },
+     {
+       "epoch": 0.2653619703126296,
+       "grad_norm": 2.0234451293945312,
+       "learning_rate": 1.7611940298507464e-05,
+       "loss": 0.1107,
+       "step": 1600
+     },
+     {
+       "epoch": 0.2653619703126296,
+       "eval_accuracy": 0.9774449339207049,
+       "eval_end_accuracy": 0.9773274596182085,
+       "eval_end_f1": 0.9548822846052203,
+       "eval_f1": 0.5144663543275407,
+       "eval_loss": 0.06893135607242584,
+       "eval_runtime": 137.3173,
+       "eval_samples_per_second": 123.983,
+       "eval_start_accuracy": 0.9775624082232012,
+       "eval_start_f1": 0.07405042404986113,
+       "eval_steps_per_second": 15.504,
+       "step": 1600
+     },
+     {
+       "epoch": 0.27365453188489924,
+       "grad_norm": 2.9243485927581787,
+       "learning_rate": 1.816473189607518e-05,
+       "loss": 0.069,
+       "step": 1650
+     },
+     {
+       "epoch": 0.2819470934571689,
+       "grad_norm": 0.05162263661623001,
+       "learning_rate": 1.87175234936429e-05,
+       "loss": 0.0815,
+       "step": 1700
+     },
+     {
+       "epoch": 0.2902396550294386,
+       "grad_norm": 0.12183202058076859,
+       "learning_rate": 1.9270315091210617e-05,
+       "loss": 0.0741,
+       "step": 1750
+     },
+     {
+       "epoch": 0.29853221660170826,
+       "grad_norm": 3.247403621673584,
+       "learning_rate": 1.9823106688778332e-05,
+       "loss": 0.0819,
+       "step": 1800
+     },
+     {
+       "epoch": 0.29853221660170826,
+       "eval_accuracy": 0.9795007342143907,
+       "eval_end_accuracy": 0.9793832599118942,
+       "eval_end_f1": 0.9547976694500852,
+       "eval_f1": 0.602910749664121,
+       "eval_loss": 0.07215487957000732,
+       "eval_runtime": 135.7664,
+       "eval_samples_per_second": 125.399,
+       "eval_start_accuracy": 0.979618208516887,
+       "eval_start_f1": 0.25102382987815675,
+       "eval_steps_per_second": 15.681,
+       "step": 1800
+     }
+   ],
+   "logging_steps": 50,
+   "max_steps": 18087,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 3,
+   "save_steps": 200,
+   "stateful_callbacks": {
+     "EarlyStoppingCallback": {
+       "args": {
+         "early_stopping_patience": 3,
+         "early_stopping_threshold": 0.001
+       },
+       "attributes": {
+         "early_stopping_patience_counter": 0
+       }
+     },
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": false
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 3762673296998400.0,
+   "train_batch_size": 4,
+   "trial_name": null,
+   "trial_params": null
+ }
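The trainer state above records an `EarlyStoppingCallback` with patience 3 and threshold 0.001. The stopping rule it encodes can be sketched roughly as follows (an illustrative simplification for a higher-is-better metric, not the Transformers source):

```python
def should_stop(metric_history, patience=3, threshold=0.001):
    """Stop once the metric fails to improve by more than `threshold`
    for `patience` consecutive evaluations."""
    best = None
    counter = 0
    for value in metric_history:
        if best is None or value > best + threshold:
            best = value
            counter = 0  # real improvement resets the counter
        else:
            counter += 1
            if counter >= patience:
                return True
    return False

# Three near-flat evals in a row after a peak trigger a stop
print(should_stop([0.50, 0.51, 0.5105, 0.5104, 0.5102]))  # True
```

With `early_stopping_patience_counter` at 0 here, the best F1 had just improved at step 1800, so training continued.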
checkpoint-1800/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a43dfd650d40278c5424b1ba5b0067a9c7fba4f97e85ccc8fcc46c3360e49acd
+ size 5240
checkpoint-1800/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2200/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "<mask>": 64000
+ }
checkpoint-2200/bpe.codes ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2200/config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_name_or_path": "vinai/phobert-base",
+   "architectures": [
+     "RobertaForQuestionAnswering"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 258,
+   "model_type": "roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "tokenizer_class": "PhobertTokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 64001
+ }
checkpoint-2200/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27affbf41e4d22257bdeff00abfccf2888e85e109c98a4893cd8e095c4d5ac30
+ size 537660792
checkpoint-2200/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67593cec92610ed243120384db0c577e1dc84e0cfd62d7ff066ce8771f9b581c
+ size 1075440186
checkpoint-2200/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d5c11fc4dbfa4e1a34a025e1421a3a50c85f0305e8ef2bdacbf6d591cb6cc493
+ size 14244
checkpoint-2200/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d48585b464483c510957b80617f6db4dad41376ea6783669555094835842bf72
+ size 1064
checkpoint-2200/special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
checkpoint-2200/tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "64000": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "PhobertTokenizer",
+   "unk_token": "<unk>"
+ }
checkpoint-2200/trainer_state.json ADDED
@@ -0,0 +1,504 @@
+ {
+   "best_metric": 0.602910749664121,
+   "best_model_checkpoint": "phobert-legal-qa-finetuned\\checkpoint-1800",
+   "epoch": 0.36487270917986564,
+   "eval_steps": 200,
+   "global_step": 2200,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.008292561572269675,
+       "grad_norm": 8.954318046569824,
+       "learning_rate": 5.417357656163627e-07,
+       "loss": 5.4244,
+       "step": 50
+     },
+     {
+       "epoch": 0.01658512314453935,
+       "grad_norm": 6.557638168334961,
+       "learning_rate": 1.0834715312327253e-06,
+       "loss": 5.1885,
+       "step": 100
+     },
+     {
+       "epoch": 0.024877684716809022,
+       "grad_norm": 6.624057769775391,
+       "learning_rate": 1.6141514648977336e-06,
+       "loss": 4.6824,
+       "step": 150
+     },
+     {
+       "epoch": 0.0331702462890787,
+       "grad_norm": 6.447659015655518,
+       "learning_rate": 2.1669430624654506e-06,
+       "loss": 3.9743,
+       "step": 200
+     },
+     {
+       "epoch": 0.0331702462890787,
+       "eval_accuracy": 0.5348311306901615,
+       "eval_end_accuracy": 0.29080763582966224,
+       "eval_end_f1": 0.20537819782547337,
+       "eval_f1": 0.10898462727065836,
+       "eval_loss": 3.2860655784606934,
+       "eval_runtime": 127.3249,
+       "eval_samples_per_second": 133.713,
+       "eval_start_accuracy": 0.7788546255506608,
+       "eval_start_f1": 0.012591056715843358,
+       "eval_steps_per_second": 16.721,
+       "step": 200
+     },
+     {
+       "epoch": 0.04146280786134837,
+       "grad_norm": 5.946743965148926,
+       "learning_rate": 2.7197346600331676e-06,
+       "loss": 3.0621,
+       "step": 250
+     },
+     {
+       "epoch": 0.049755369433618045,
+       "grad_norm": 7.603392601013184,
+       "learning_rate": 3.272526257600885e-06,
+       "loss": 1.9122,
+       "step": 300
+     },
+     {
+       "epoch": 0.05804793100588772,
+       "grad_norm": 4.718493461608887,
+       "learning_rate": 3.825317855168602e-06,
+       "loss": 0.8663,
+       "step": 350
+     },
+     {
+       "epoch": 0.0663404925781574,
+       "grad_norm": 1.8805421590805054,
+       "learning_rate": 4.367053620784965e-06,
+       "loss": 0.4477,
+       "step": 400
+     },
+     {
+       "epoch": 0.0663404925781574,
+       "eval_accuracy": 0.9048458149779737,
+       "eval_end_accuracy": 0.9175917767988253,
+       "eval_end_f1": 0.8303333971515077,
+       "eval_f1": 0.424233749861785,
+       "eval_loss": 0.23289425671100616,
+       "eval_runtime": 129.7527,
+       "eval_samples_per_second": 131.211,
+       "eval_start_accuracy": 0.8920998531571219,
+       "eval_start_f1": 0.018134102572062404,
+       "eval_steps_per_second": 16.408,
+       "step": 400
+     },
+     {
+       "epoch": 0.07463305415042706,
+       "grad_norm": 1.6680586338043213,
+       "learning_rate": 4.919845218352681e-06,
+       "loss": 0.2926,
+       "step": 450
+     },
+     {
+       "epoch": 0.08292561572269674,
+       "grad_norm": 1.3913531303405762,
+       "learning_rate": 5.472636815920398e-06,
+       "loss": 0.2317,
+       "step": 500
+     },
+     {
+       "epoch": 0.09121817729496641,
+       "grad_norm": 0.8732656836509705,
+       "learning_rate": 6.025428413488116e-06,
+       "loss": 0.2013,
+       "step": 550
+     },
+     {
+       "epoch": 0.09951073886723609,
+       "grad_norm": 0.4801824986934662,
+       "learning_rate": 6.578220011055833e-06,
+       "loss": 0.1501,
+       "step": 600
+     },
+     {
+       "epoch": 0.09951073886723609,
+       "eval_accuracy": 0.9687224669603525,
+       "eval_end_accuracy": 0.9620558002936858,
+       "eval_end_f1": 0.9029114827270522,
+       "eval_f1": 0.47707631709814124,
+       "eval_loss": 0.10375536233186722,
+       "eval_runtime": 115.3633,
+       "eval_samples_per_second": 147.577,
+       "eval_start_accuracy": 0.9753891336270191,
+       "eval_start_f1": 0.051241151469230285,
+       "eval_steps_per_second": 18.455,
+       "step": 600
+     },
+     {
+       "epoch": 0.10780330043950577,
+       "grad_norm": 3.085040807723999,
+       "learning_rate": 7.131011608623549e-06,
+       "loss": 0.147,
+       "step": 650
+     },
+     {
+       "epoch": 0.11609586201177544,
+       "grad_norm": 2.0271573066711426,
+       "learning_rate": 7.672747374239912e-06,
+       "loss": 0.1356,
+       "step": 700
+     },
+     {
+       "epoch": 0.12438842358404512,
+       "grad_norm": 0.833656370639801,
+       "learning_rate": 8.22553897180763e-06,
+       "loss": 0.1604,
+       "step": 750
+     },
+     {
+       "epoch": 0.1326809851563148,
+       "grad_norm": 4.890761375427246,
+       "learning_rate": 8.778330569375346e-06,
+       "loss": 0.2731,
+       "step": 800
+     },
+     {
+       "epoch": 0.1326809851563148,
+       "eval_accuracy": 0.9534801762114538,
+       "eval_end_accuracy": 0.9742143906020558,
+       "eval_end_f1": 0.9321589248479495,
+       "eval_f1": 0.49608082825261623,
+       "eval_loss": 0.1122935563325882,
+       "eval_runtime": 133.8336,
+       "eval_samples_per_second": 127.21,
+       "eval_start_accuracy": 0.9327459618208517,
+       "eval_start_f1": 0.06000273165728297,
+       "eval_steps_per_second": 15.908,
+       "step": 800
+     },
+     {
+       "epoch": 0.14097354672858445,
+       "grad_norm": 1.1853266954421997,
+       "learning_rate": 9.331122166943063e-06,
+       "loss": 0.1276,
+       "step": 850
+     },
+     {
+       "epoch": 0.14926610830085413,
+       "grad_norm": 7.255343437194824,
+       "learning_rate": 9.88391376451078e-06,
+       "loss": 0.0954,
+       "step": 900
+     },
+     {
+       "epoch": 0.1575586698731238,
+       "grad_norm": 0.5000291466712952,
+       "learning_rate": 1.0436705362078497e-05,
+       "loss": 0.1081,
+       "step": 950
+     },
+     {
+       "epoch": 0.1658512314453935,
+       "grad_norm": 0.3002016544342041,
+       "learning_rate": 1.0989496959646216e-05,
+       "loss": 0.1257,
+       "step": 1000
+     },
+     {
+       "epoch": 0.1658512314453935,
+       "eval_accuracy": 0.9772980910425844,
+       "eval_end_accuracy": 0.9769750367107195,
+       "eval_end_f1": 0.9421694663520186,
+       "eval_f1": 0.5088194880993329,
+       "eval_loss": 0.08298086374998093,
+       "eval_runtime": 127.1174,
+       "eval_samples_per_second": 133.931,
+       "eval_start_accuracy": 0.9776211453744493,
+       "eval_start_f1": 0.07546950984664719,
+       "eval_steps_per_second": 16.748,
+       "step": 1000
+     },
+     {
+       "epoch": 0.17414379301766317,
+       "grad_norm": 1.4803558588027954,
+       "learning_rate": 1.1542288557213931e-05,
+       "loss": 0.0988,
+       "step": 1050
+     },
+     {
+       "epoch": 0.18243635458993282,
+       "grad_norm": 5.256414413452148,
+       "learning_rate": 1.2095080154781648e-05,
232
+ "loss": 0.1159,
233
+ "step": 1100
234
+ },
235
+ {
236
+ "epoch": 0.1907289161622025,
237
+ "grad_norm": 2.2532193660736084,
238
+ "learning_rate": 1.2647871752349365e-05,
239
+ "loss": 0.1119,
240
+ "step": 1150
241
+ },
242
+ {
243
+ "epoch": 0.19902147773447218,
244
+ "grad_norm": 1.1581368446350098,
245
+ "learning_rate": 1.3200663349917082e-05,
246
+ "loss": 0.0801,
247
+ "step": 1200
248
+ },
249
+ {
250
+ "epoch": 0.19902147773447218,
251
+ "eval_accuracy": 0.9774743024963289,
252
+ "eval_end_accuracy": 0.9773274596182085,
253
+ "eval_end_f1": 0.9522660725045008,
254
+ "eval_f1": 0.5138705881207118,
255
+ "eval_loss": 0.06253915280103683,
256
+ "eval_runtime": 129.6143,
257
+ "eval_samples_per_second": 131.351,
258
+ "eval_start_accuracy": 0.9776211453744493,
259
+ "eval_start_f1": 0.0754751037369228,
260
+ "eval_steps_per_second": 16.426,
261
+ "step": 1200
262
+ },
263
+ {
264
+ "epoch": 0.20731403930674186,
265
+ "grad_norm": 3.3523240089416504,
266
+ "learning_rate": 1.3742399115533445e-05,
267
+ "loss": 0.1058,
268
+ "step": 1250
269
+ },
270
+ {
271
+ "epoch": 0.21560660087901154,
272
+ "grad_norm": 0.09368986636400223,
273
+ "learning_rate": 1.429519071310116e-05,
274
+ "loss": 0.1002,
275
+ "step": 1300
276
+ },
277
+ {
278
+ "epoch": 0.2238991624512812,
279
+ "grad_norm": 2.6880834102630615,
280
+ "learning_rate": 1.4847982310668878e-05,
281
+ "loss": 0.0563,
282
+ "step": 1350
283
+ },
284
+ {
285
+ "epoch": 0.23219172402355087,
286
+ "grad_norm": 10.309085845947266,
287
+ "learning_rate": 1.5400773908236596e-05,
288
+ "loss": 0.0973,
289
+ "step": 1400
290
+ },
291
+ {
292
+ "epoch": 0.23219172402355087,
293
+ "eval_accuracy": 0.9787371512481644,
294
+ "eval_end_accuracy": 0.978208516886931,
295
+ "eval_end_f1": 0.9484799222981063,
296
+ "eval_f1": 0.5438896216202269,
297
+ "eval_loss": 0.0639370009303093,
298
+ "eval_runtime": 128.1258,
299
+ "eval_samples_per_second": 132.877,
300
+ "eval_start_accuracy": 0.9792657856093979,
301
+ "eval_start_f1": 0.13929932094234762,
302
+ "eval_steps_per_second": 16.616,
303
+ "step": 1400
304
+ },
305
+ {
306
+ "epoch": 0.24048428559582055,
307
+ "grad_norm": 0.07747649401426315,
308
+ "learning_rate": 1.5953565505804315e-05,
309
+ "loss": 0.0724,
310
+ "step": 1450
311
+ },
312
+ {
313
+ "epoch": 0.24877684716809023,
314
+ "grad_norm": 0.076473668217659,
315
+ "learning_rate": 1.650635710337203e-05,
316
+ "loss": 0.0928,
317
+ "step": 1500
318
+ },
319
+ {
320
+ "epoch": 0.2570694087403599,
321
+ "grad_norm": 0.18258516490459442,
322
+ "learning_rate": 1.7059148700939746e-05,
323
+ "loss": 0.085,
324
+ "step": 1550
325
+ },
326
+ {
327
+ "epoch": 0.2653619703126296,
328
+ "grad_norm": 2.0234451293945312,
329
+ "learning_rate": 1.7611940298507464e-05,
330
+ "loss": 0.1107,
331
+ "step": 1600
332
+ },
333
+ {
334
+ "epoch": 0.2653619703126296,
335
+ "eval_accuracy": 0.9774449339207049,
336
+ "eval_end_accuracy": 0.9773274596182085,
337
+ "eval_end_f1": 0.9548822846052203,
338
+ "eval_f1": 0.5144663543275407,
339
+ "eval_loss": 0.06893135607242584,
340
+ "eval_runtime": 137.3173,
341
+ "eval_samples_per_second": 123.983,
342
+ "eval_start_accuracy": 0.9775624082232012,
343
+ "eval_start_f1": 0.07405042404986113,
344
+ "eval_steps_per_second": 15.504,
345
+ "step": 1600
346
+ },
347
+ {
348
+ "epoch": 0.27365453188489924,
349
+ "grad_norm": 2.9243485927581787,
350
+ "learning_rate": 1.816473189607518e-05,
351
+ "loss": 0.069,
352
+ "step": 1650
353
+ },
354
+ {
355
+ "epoch": 0.2819470934571689,
356
+ "grad_norm": 0.05162263661623001,
357
+ "learning_rate": 1.87175234936429e-05,
358
+ "loss": 0.0815,
359
+ "step": 1700
360
+ },
361
+ {
362
+ "epoch": 0.2902396550294386,
363
+ "grad_norm": 0.12183202058076859,
364
+ "learning_rate": 1.9270315091210617e-05,
365
+ "loss": 0.0741,
366
+ "step": 1750
367
+ },
368
+ {
369
+ "epoch": 0.29853221660170826,
370
+ "grad_norm": 3.247403621673584,
371
+ "learning_rate": 1.9823106688778332e-05,
372
+ "loss": 0.0819,
373
+ "step": 1800
374
+ },
375
+ {
376
+ "epoch": 0.29853221660170826,
377
+ "eval_accuracy": 0.9795007342143907,
378
+ "eval_end_accuracy": 0.9793832599118942,
379
+ "eval_end_f1": 0.9547976694500852,
380
+ "eval_f1": 0.602910749664121,
381
+ "eval_loss": 0.07215487957000732,
382
+ "eval_runtime": 135.7664,
383
+ "eval_samples_per_second": 125.399,
384
+ "eval_start_accuracy": 0.979618208516887,
385
+ "eval_start_f1": 0.25102382987815675,
386
+ "eval_steps_per_second": 15.681,
387
+ "step": 1800
388
+ },
389
+ {
390
+ "epoch": 0.30682477817397796,
391
+ "grad_norm": 0.042116910219192505,
392
+ "learning_rate": 1.9958225826268585e-05,
393
+ "loss": 0.0641,
394
+ "step": 1850
395
+ },
396
+ {
397
+ "epoch": 0.3151173397462476,
398
+ "grad_norm": 0.36534908413887024,
399
+ "learning_rate": 1.989679321784003e-05,
400
+ "loss": 0.1036,
401
+ "step": 1900
402
+ },
403
+ {
404
+ "epoch": 0.32340990131851727,
405
+ "grad_norm": 0.2260214239358902,
406
+ "learning_rate": 1.9835360609411478e-05,
407
+ "loss": 0.0751,
408
+ "step": 1950
409
+ },
410
+ {
411
+ "epoch": 0.331702462890787,
412
+ "grad_norm": 0.03621504455804825,
413
+ "learning_rate": 1.977392800098292e-05,
414
+ "loss": 0.0632,
415
+ "step": 2000
416
+ },
417
+ {
418
+ "epoch": 0.331702462890787,
419
+ "eval_accuracy": 0.9779148311306902,
420
+ "eval_end_accuracy": 0.9779735682819384,
421
+ "eval_end_f1": 0.9323578270443484,
422
+ "eval_f1": 0.5095598212540562,
423
+ "eval_loss": 0.058555059134960175,
424
+ "eval_runtime": 136.5016,
425
+ "eval_samples_per_second": 124.724,
426
+ "eval_start_accuracy": 0.977856093979442,
427
+ "eval_start_f1": 0.08676181546376394,
428
+ "eval_steps_per_second": 15.597,
429
+ "step": 2000
430
+ },
431
+ {
432
+ "epoch": 0.3399950244630566,
433
+ "grad_norm": 0.4064314365386963,
434
+ "learning_rate": 1.9712495392554368e-05,
435
+ "loss": 0.0833,
436
+ "step": 2050
437
+ },
438
+ {
439
+ "epoch": 0.34828758603532634,
440
+ "grad_norm": 1.703637957572937,
441
+ "learning_rate": 1.9651062784125818e-05,
442
+ "loss": 0.1415,
443
+ "step": 2100
444
+ },
445
+ {
446
+ "epoch": 0.356580147607596,
447
+ "grad_norm": 0.7372889518737793,
448
+ "learning_rate": 1.958963017569726e-05,
449
+ "loss": 0.2726,
450
+ "step": 2150
451
+ },
452
+ {
453
+ "epoch": 0.36487270917986564,
454
+ "grad_norm": 0.28392699360847473,
455
+ "learning_rate": 1.9528197567268707e-05,
456
+ "loss": 0.137,
457
+ "step": 2200
458
+ },
459
+ {
460
+ "epoch": 0.36487270917986564,
461
+ "eval_accuracy": 0.9787958883994126,
462
+ "eval_end_accuracy": 0.9785609397944199,
463
+ "eval_end_f1": 0.9504899506887509,
464
+ "eval_f1": 0.5319231158110048,
465
+ "eval_loss": 0.05977020785212517,
466
+ "eval_runtime": 136.9823,
467
+ "eval_samples_per_second": 124.286,
468
+ "eval_start_accuracy": 0.9790308370044053,
469
+ "eval_start_f1": 0.11335628093325875,
470
+ "eval_steps_per_second": 15.542,
471
+ "step": 2200
472
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 18087,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 200,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 3,
+ "early_stopping_threshold": 0.001
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 4598822918553600.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
checkpoint-2200/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a43dfd650d40278c5424b1ba5b0067a9c7fba4f97e85ccc8fcc46c3360e49acd
+ size 5240
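The three-line stubs committed for the binary files above are Git LFS pointer files: the real payload lives in LFS storage, and the repo only tracks its `version`, `oid` (a `sha256:` digest), and `size` in bytes. A minimal parser sketch — `parse_lfs_pointer` is an illustrative helper, not part of any library:

```python
# Minimal sketch: parse a Git LFS pointer file (the three-line stubs shown
# in this commit) into its fields.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # oid has the form "sha256:<64 hex chars>"; size is the real file's byte count
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a43dfd650d40278c5424b1ba5b0067a9c7fba4f97e85ccc8fcc46c3360e49acd
size 5240"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 5240
```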
checkpoint-2200/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2400/added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "<mask>": 64000
+ }
checkpoint-2400/bpe.codes ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-2400/config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "_name_or_path": "vinai/phobert-base",
+ "architectures": [
+ "RobertaForQuestionAnswering"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 258,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "tokenizer_class": "PhobertTokenizer",
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.2",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 64001
+ }
checkpoint-2400/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bebdfbdf8f482d9df4152263e696a0415902a0f9b5e53cabfae528db8f65ab9c
+ size 537660792
checkpoint-2400/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:071d7bd19c95b93c7cba4b58c861a75913e1a15de8e29daca1628ea9aabf835c
+ size 1075440186
checkpoint-2400/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9379fc8d7818ceea32e0fde72dd7e513cd0638c7a951ed627e29f637bf682caf
+ size 14244
checkpoint-2400/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7a41f82caddd7b5a852a550750ee43ffbd480d65c06fd0e4a1ab9ebe609c746
+ size 1064
checkpoint-2400/special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "bos_token": "<s>",
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "unk_token": "<unk>"
+ }
checkpoint-2400/tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "64000": {
+ "content": "<mask>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "PhobertTokenizer",
+ "unk_token": "<unk>"
+ }
checkpoint-2400/trainer_state.json ADDED
@@ -0,0 +1,546 @@
+ {
+ "best_metric": 0.602910749664121,
+ "best_model_checkpoint": "phobert-legal-qa-finetuned\\checkpoint-1800",
+ "epoch": 0.39804295546894436,
+ "eval_steps": 200,
+ "global_step": 2400,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
11
+ {
12
+ "epoch": 0.008292561572269675,
13
+ "grad_norm": 8.954318046569824,
14
+ "learning_rate": 5.417357656163627e-07,
15
+ "loss": 5.4244,
16
+ "step": 50
17
+ },
18
+ {
19
+ "epoch": 0.01658512314453935,
20
+ "grad_norm": 6.557638168334961,
21
+ "learning_rate": 1.0834715312327253e-06,
22
+ "loss": 5.1885,
23
+ "step": 100
24
+ },
25
+ {
26
+ "epoch": 0.024877684716809022,
27
+ "grad_norm": 6.624057769775391,
28
+ "learning_rate": 1.6141514648977336e-06,
29
+ "loss": 4.6824,
30
+ "step": 150
31
+ },
32
+ {
33
+ "epoch": 0.0331702462890787,
34
+ "grad_norm": 6.447659015655518,
35
+ "learning_rate": 2.1669430624654506e-06,
36
+ "loss": 3.9743,
37
+ "step": 200
38
+ },
39
+ {
40
+ "epoch": 0.0331702462890787,
41
+ "eval_accuracy": 0.5348311306901615,
42
+ "eval_end_accuracy": 0.29080763582966224,
43
+ "eval_end_f1": 0.20537819782547337,
44
+ "eval_f1": 0.10898462727065836,
45
+ "eval_loss": 3.2860655784606934,
46
+ "eval_runtime": 127.3249,
47
+ "eval_samples_per_second": 133.713,
48
+ "eval_start_accuracy": 0.7788546255506608,
49
+ "eval_start_f1": 0.012591056715843358,
50
+ "eval_steps_per_second": 16.721,
51
+ "step": 200
52
+ },
53
+ {
54
+ "epoch": 0.04146280786134837,
55
+ "grad_norm": 5.946743965148926,
56
+ "learning_rate": 2.7197346600331676e-06,
57
+ "loss": 3.0621,
58
+ "step": 250
59
+ },
60
+ {
61
+ "epoch": 0.049755369433618045,
62
+ "grad_norm": 7.603392601013184,
63
+ "learning_rate": 3.272526257600885e-06,
64
+ "loss": 1.9122,
65
+ "step": 300
66
+ },
67
+ {
68
+ "epoch": 0.05804793100588772,
69
+ "grad_norm": 4.718493461608887,
70
+ "learning_rate": 3.825317855168602e-06,
71
+ "loss": 0.8663,
72
+ "step": 350
73
+ },
74
+ {
75
+ "epoch": 0.0663404925781574,
76
+ "grad_norm": 1.8805421590805054,
77
+ "learning_rate": 4.367053620784965e-06,
78
+ "loss": 0.4477,
79
+ "step": 400
80
+ },
81
+ {
82
+ "epoch": 0.0663404925781574,
83
+ "eval_accuracy": 0.9048458149779737,
84
+ "eval_end_accuracy": 0.9175917767988253,
85
+ "eval_end_f1": 0.8303333971515077,
86
+ "eval_f1": 0.424233749861785,
87
+ "eval_loss": 0.23289425671100616,
88
+ "eval_runtime": 129.7527,
89
+ "eval_samples_per_second": 131.211,
90
+ "eval_start_accuracy": 0.8920998531571219,
91
+ "eval_start_f1": 0.018134102572062404,
92
+ "eval_steps_per_second": 16.408,
93
+ "step": 400
94
+ },
95
+ {
96
+ "epoch": 0.07463305415042706,
97
+ "grad_norm": 1.6680586338043213,
98
+ "learning_rate": 4.919845218352681e-06,
99
+ "loss": 0.2926,
100
+ "step": 450
101
+ },
102
+ {
103
+ "epoch": 0.08292561572269674,
104
+ "grad_norm": 1.3913531303405762,
105
+ "learning_rate": 5.472636815920398e-06,
106
+ "loss": 0.2317,
107
+ "step": 500
108
+ },
109
+ {
110
+ "epoch": 0.09121817729496641,
111
+ "grad_norm": 0.8732656836509705,
112
+ "learning_rate": 6.025428413488116e-06,
113
+ "loss": 0.2013,
114
+ "step": 550
115
+ },
116
+ {
117
+ "epoch": 0.09951073886723609,
118
+ "grad_norm": 0.4801824986934662,
119
+ "learning_rate": 6.578220011055833e-06,
120
+ "loss": 0.1501,
121
+ "step": 600
122
+ },
123
+ {
124
+ "epoch": 0.09951073886723609,
125
+ "eval_accuracy": 0.9687224669603525,
126
+ "eval_end_accuracy": 0.9620558002936858,
127
+ "eval_end_f1": 0.9029114827270522,
128
+ "eval_f1": 0.47707631709814124,
129
+ "eval_loss": 0.10375536233186722,
130
+ "eval_runtime": 115.3633,
131
+ "eval_samples_per_second": 147.577,
132
+ "eval_start_accuracy": 0.9753891336270191,
133
+ "eval_start_f1": 0.051241151469230285,
134
+ "eval_steps_per_second": 18.455,
135
+ "step": 600
136
+ },
137
+ {
138
+ "epoch": 0.10780330043950577,
139
+ "grad_norm": 3.085040807723999,
140
+ "learning_rate": 7.131011608623549e-06,
141
+ "loss": 0.147,
142
+ "step": 650
143
+ },
144
+ {
145
+ "epoch": 0.11609586201177544,
146
+ "grad_norm": 2.0271573066711426,
147
+ "learning_rate": 7.672747374239912e-06,
148
+ "loss": 0.1356,
149
+ "step": 700
150
+ },
151
+ {
152
+ "epoch": 0.12438842358404512,
153
+ "grad_norm": 0.833656370639801,
154
+ "learning_rate": 8.22553897180763e-06,
155
+ "loss": 0.1604,
156
+ "step": 750
157
+ },
158
+ {
159
+ "epoch": 0.1326809851563148,
160
+ "grad_norm": 4.890761375427246,
161
+ "learning_rate": 8.778330569375346e-06,
162
+ "loss": 0.2731,
163
+ "step": 800
164
+ },
165
+ {
166
+ "epoch": 0.1326809851563148,
167
+ "eval_accuracy": 0.9534801762114538,
168
+ "eval_end_accuracy": 0.9742143906020558,
169
+ "eval_end_f1": 0.9321589248479495,
170
+ "eval_f1": 0.49608082825261623,
171
+ "eval_loss": 0.1122935563325882,
172
+ "eval_runtime": 133.8336,
173
+ "eval_samples_per_second": 127.21,
174
+ "eval_start_accuracy": 0.9327459618208517,
175
+ "eval_start_f1": 0.06000273165728297,
176
+ "eval_steps_per_second": 15.908,
177
+ "step": 800
178
+ },
179
+ {
180
+ "epoch": 0.14097354672858445,
181
+ "grad_norm": 1.1853266954421997,
182
+ "learning_rate": 9.331122166943063e-06,
183
+ "loss": 0.1276,
184
+ "step": 850
185
+ },
186
+ {
187
+ "epoch": 0.14926610830085413,
188
+ "grad_norm": 7.255343437194824,
189
+ "learning_rate": 9.88391376451078e-06,
190
+ "loss": 0.0954,
191
+ "step": 900
192
+ },
193
+ {
194
+ "epoch": 0.1575586698731238,
195
+ "grad_norm": 0.5000291466712952,
196
+ "learning_rate": 1.0436705362078497e-05,
197
+ "loss": 0.1081,
198
+ "step": 950
199
+ },
200
+ {
201
+ "epoch": 0.1658512314453935,
202
+ "grad_norm": 0.3002016544342041,
203
+ "learning_rate": 1.0989496959646216e-05,
204
+ "loss": 0.1257,
205
+ "step": 1000
206
+ },
207
+ {
208
+ "epoch": 0.1658512314453935,
209
+ "eval_accuracy": 0.9772980910425844,
210
+ "eval_end_accuracy": 0.9769750367107195,
211
+ "eval_end_f1": 0.9421694663520186,
212
+ "eval_f1": 0.5088194880993329,
213
+ "eval_loss": 0.08298086374998093,
214
+ "eval_runtime": 127.1174,
215
+ "eval_samples_per_second": 133.931,
216
+ "eval_start_accuracy": 0.9776211453744493,
217
+ "eval_start_f1": 0.07546950984664719,
218
+ "eval_steps_per_second": 16.748,
219
+ "step": 1000
220
+ },
221
+ {
222
+ "epoch": 0.17414379301766317,
223
+ "grad_norm": 1.4803558588027954,
224
+ "learning_rate": 1.1542288557213931e-05,
225
+ "loss": 0.0988,
226
+ "step": 1050
227
+ },
228
+ {
229
+ "epoch": 0.18243635458993282,
230
+ "grad_norm": 5.256414413452148,
231
+ "learning_rate": 1.2095080154781648e-05,
232
+ "loss": 0.1159,
233
+ "step": 1100
234
+ },
235
+ {
236
+ "epoch": 0.1907289161622025,
237
+ "grad_norm": 2.2532193660736084,
238
+ "learning_rate": 1.2647871752349365e-05,
239
+ "loss": 0.1119,
240
+ "step": 1150
241
+ },
242
+ {
243
+ "epoch": 0.19902147773447218,
244
+ "grad_norm": 1.1581368446350098,
245
+ "learning_rate": 1.3200663349917082e-05,
246
+ "loss": 0.0801,
247
+ "step": 1200
248
+ },
249
+ {
250
+ "epoch": 0.19902147773447218,
251
+ "eval_accuracy": 0.9774743024963289,
252
+ "eval_end_accuracy": 0.9773274596182085,
253
+ "eval_end_f1": 0.9522660725045008,
254
+ "eval_f1": 0.5138705881207118,
255
+ "eval_loss": 0.06253915280103683,
256
+ "eval_runtime": 129.6143,
257
+ "eval_samples_per_second": 131.351,
258
+ "eval_start_accuracy": 0.9776211453744493,
259
+ "eval_start_f1": 0.0754751037369228,
260
+ "eval_steps_per_second": 16.426,
261
+ "step": 1200
262
+ },
263
+ {
264
+ "epoch": 0.20731403930674186,
265
+ "grad_norm": 3.3523240089416504,
266
+ "learning_rate": 1.3742399115533445e-05,
267
+ "loss": 0.1058,
268
+ "step": 1250
269
+ },
270
+ {
271
+ "epoch": 0.21560660087901154,
272
+ "grad_norm": 0.09368986636400223,
273
+ "learning_rate": 1.429519071310116e-05,
274
+ "loss": 0.1002,
275
+ "step": 1300
276
+ },
277
+ {
278
+ "epoch": 0.2238991624512812,
279
+ "grad_norm": 2.6880834102630615,
280
+ "learning_rate": 1.4847982310668878e-05,
281
+ "loss": 0.0563,
282
+ "step": 1350
283
+ },
284
+ {
285
+ "epoch": 0.23219172402355087,
286
+ "grad_norm": 10.309085845947266,
287
+ "learning_rate": 1.5400773908236596e-05,
288
+ "loss": 0.0973,
289
+ "step": 1400
290
+ },
291
+ {
292
+ "epoch": 0.23219172402355087,
293
+ "eval_accuracy": 0.9787371512481644,
294
+ "eval_end_accuracy": 0.978208516886931,
295
+ "eval_end_f1": 0.9484799222981063,
296
+ "eval_f1": 0.5438896216202269,
297
+ "eval_loss": 0.0639370009303093,
298
+ "eval_runtime": 128.1258,
299
+ "eval_samples_per_second": 132.877,
300
+ "eval_start_accuracy": 0.9792657856093979,
301
+ "eval_start_f1": 0.13929932094234762,
302
+ "eval_steps_per_second": 16.616,
303
+ "step": 1400
304
+ },
305
+ {
306
+ "epoch": 0.24048428559582055,
307
+ "grad_norm": 0.07747649401426315,
308
+ "learning_rate": 1.5953565505804315e-05,
309
+ "loss": 0.0724,
310
+ "step": 1450
311
+ },
312
+ {
313
+ "epoch": 0.24877684716809023,
314
+ "grad_norm": 0.076473668217659,
315
+ "learning_rate": 1.650635710337203e-05,
316
+ "loss": 0.0928,
317
+ "step": 1500
318
+ },
319
+ {
320
+ "epoch": 0.2570694087403599,
321
+ "grad_norm": 0.18258516490459442,
322
+ "learning_rate": 1.7059148700939746e-05,
323
+ "loss": 0.085,
324
+ "step": 1550
325
+ },
326
+ {
327
+ "epoch": 0.2653619703126296,
328
+ "grad_norm": 2.0234451293945312,
329
+ "learning_rate": 1.7611940298507464e-05,
330
+ "loss": 0.1107,
331
+ "step": 1600
332
+ },
333
+ {
334
+ "epoch": 0.2653619703126296,
335
+ "eval_accuracy": 0.9774449339207049,
336
+ "eval_end_accuracy": 0.9773274596182085,
337
+ "eval_end_f1": 0.9548822846052203,
338
+ "eval_f1": 0.5144663543275407,
339
+ "eval_loss": 0.06893135607242584,
340
+ "eval_runtime": 137.3173,
341
+ "eval_samples_per_second": 123.983,
342
+ "eval_start_accuracy": 0.9775624082232012,
343
+ "eval_start_f1": 0.07405042404986113,
344
+ "eval_steps_per_second": 15.504,
345
+ "step": 1600
346
+ },
347
+ {
348
+ "epoch": 0.27365453188489924,
349
+ "grad_norm": 2.9243485927581787,
350
+ "learning_rate": 1.816473189607518e-05,
351
+ "loss": 0.069,
352
+ "step": 1650
353
+ },
354
+ {
355
+ "epoch": 0.2819470934571689,
356
+ "grad_norm": 0.05162263661623001,
357
+ "learning_rate": 1.87175234936429e-05,
358
+ "loss": 0.0815,
359
+ "step": 1700
360
+ },
361
+ {
362
+ "epoch": 0.2902396550294386,
363
+ "grad_norm": 0.12183202058076859,
364
+ "learning_rate": 1.9270315091210617e-05,
365
+ "loss": 0.0741,
366
+ "step": 1750
367
+ },
368
+ {
369
+ "epoch": 0.29853221660170826,
370
+ "grad_norm": 3.247403621673584,
371
+ "learning_rate": 1.9823106688778332e-05,
372
+ "loss": 0.0819,
373
+ "step": 1800
374
+ },
375
+ {
376
+ "epoch": 0.29853221660170826,
377
+ "eval_accuracy": 0.9795007342143907,
378
+ "eval_end_accuracy": 0.9793832599118942,
379
+ "eval_end_f1": 0.9547976694500852,
380
+ "eval_f1": 0.602910749664121,
381
+ "eval_loss": 0.07215487957000732,
382
+ "eval_runtime": 135.7664,
383
+ "eval_samples_per_second": 125.399,
384
+ "eval_start_accuracy": 0.979618208516887,
385
+ "eval_start_f1": 0.25102382987815675,
386
+ "eval_steps_per_second": 15.681,
387
+ "step": 1800
388
+ },
389
+ {
390
+ "epoch": 0.30682477817397796,
391
+ "grad_norm": 0.042116910219192505,
392
+ "learning_rate": 1.9958225826268585e-05,
393
+ "loss": 0.0641,
394
+ "step": 1850
395
+ },
396
+ {
397
+ "epoch": 0.3151173397462476,
398
+ "grad_norm": 0.36534908413887024,
399
+ "learning_rate": 1.989679321784003e-05,
400
+ "loss": 0.1036,
401
+ "step": 1900
402
+ },
403
+ {
404
+ "epoch": 0.32340990131851727,
405
+ "grad_norm": 0.2260214239358902,
406
+ "learning_rate": 1.9835360609411478e-05,
407
+ "loss": 0.0751,
408
+ "step": 1950
409
+ },
410
+ {
411
+ "epoch": 0.331702462890787,
412
+ "grad_norm": 0.03621504455804825,
413
+ "learning_rate": 1.977392800098292e-05,
414
+ "loss": 0.0632,
415
+ "step": 2000
416
+ },
417
+ {
418
+ "epoch": 0.331702462890787,
419
+ "eval_accuracy": 0.9779148311306902,
420
+ "eval_end_accuracy": 0.9779735682819384,
421
+ "eval_end_f1": 0.9323578270443484,
422
+ "eval_f1": 0.5095598212540562,
423
+ "eval_loss": 0.058555059134960175,
424
+ "eval_runtime": 136.5016,
425
+ "eval_samples_per_second": 124.724,
426
+ "eval_start_accuracy": 0.977856093979442,
427
+ "eval_start_f1": 0.08676181546376394,
428
+ "eval_steps_per_second": 15.597,
429
+ "step": 2000
430
+ },
431
+ {
432
+ "epoch": 0.3399950244630566,
433
+ "grad_norm": 0.4064314365386963,
434
+ "learning_rate": 1.9712495392554368e-05,
435
+ "loss": 0.0833,
436
+ "step": 2050
437
+ },
438
+ {
439
+ "epoch": 0.34828758603532634,
440
+ "grad_norm": 1.703637957572937,
441
+ "learning_rate": 1.9651062784125818e-05,
442
+ "loss": 0.1415,
443
+ "step": 2100
444
+ },
445
+ {
446
+ "epoch": 0.356580147607596,
447
+ "grad_norm": 0.7372889518737793,
448
+ "learning_rate": 1.958963017569726e-05,
449
+ "loss": 0.2726,
450
+ "step": 2150
451
+ },
452
+ {
453
+ "epoch": 0.36487270917986564,
454
+ "grad_norm": 0.28392699360847473,
455
+ "learning_rate": 1.9528197567268707e-05,
456
+ "loss": 0.137,
457
+ "step": 2200
458
+ },
459
+ {
460
+ "epoch": 0.36487270917986564,
461
+ "eval_accuracy": 0.9787958883994126,
462
+ "eval_end_accuracy": 0.9785609397944199,
463
+ "eval_end_f1": 0.9504899506887509,
464
+ "eval_f1": 0.5319231158110048,
465
+ "eval_loss": 0.05977020785212517,
466
+ "eval_runtime": 136.9823,
467
+ "eval_samples_per_second": 124.286,
468
+ "eval_start_accuracy": 0.9790308370044053,
469
+ "eval_start_f1": 0.11335628093325875,
470
+ "eval_steps_per_second": 15.542,
471
+ "step": 2200
472
+ },
473
+ {
474
+ "epoch": 0.37316527075213535,
475
+ "grad_norm": 1.4994114637374878,
476
+ "learning_rate": 1.9466764958840154e-05,
477
+ "loss": 0.1661,
478
+ "step": 2250
479
+ },
480
+ {
481
+ "epoch": 0.381457832324405,
482
+ "grad_norm": 0.035565998405218124,
483
+ "learning_rate": 1.94053323504116e-05,
484
+ "loss": 0.1175,
485
+ "step": 2300
486
+ },
487
+ {
488
+ "epoch": 0.3897503938966747,
489
+ "grad_norm": 0.5697016716003418,
490
+ "learning_rate": 1.9343899741983044e-05,
491
+ "loss": 0.0934,
492
+ "step": 2350
493
+ },
494
+ {
495
+ "epoch": 0.39804295546894436,
496
+ "grad_norm": 5.968558311462402,
497
+ "learning_rate": 1.9282467133554494e-05,
498
+ "loss": 0.0968,
499
+ "step": 2400
500
+ },
501
+ {
502
+ "epoch": 0.39804295546894436,
503
+ "eval_accuracy": 0.9804698972099853,
504
+ "eval_end_accuracy": 0.9802643171806168,
505
+ "eval_end_f1": 0.9588187076188823,
506
+ "eval_f1": 0.5888859332600402,
507
+ "eval_loss": 0.06477497518062592,
508
+ "eval_runtime": 140.1122,
509
+ "eval_samples_per_second": 121.51,
510
+ "eval_start_accuracy": 0.9806754772393539,
511
+ "eval_start_f1": 0.21895315890119804,
512
+ "eval_steps_per_second": 15.195,
513
+ "step": 2400
514
+ }
+ ],
+ "logging_steps": 50,
+ "max_steps": 18087,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 200,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 3,
+ "early_stopping_threshold": 0.001
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5016897729331200.0,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }
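The `trainer_state.json` above records `EarlyStoppingCallback` with `early_stopping_patience: 3` and `early_stopping_threshold: 0.001` and flips `should_training_stop` to `true` at step 2400. Replaying that rule over the `eval_f1` series from `log_history` shows why: `eval_f1` peaks at step 1800 (hence `best_model_checkpoint` = `checkpoint-1800`), and the next three evaluations fail to beat it by the threshold. This is a sketch of the greater-is-better early-stopping logic, not the Trainer's actual code:

```python
# eval_f1 values per evaluation step, copied from log_history above.
evals = [
    (200, 0.10898462727065836), (400, 0.424233749861785),
    (600, 0.47707631709814124), (800, 0.49608082825261623),
    (1000, 0.5088194880993329), (1200, 0.5138705881207118),
    (1400, 0.5438896216202269), (1600, 0.5144663543275407),
    (1800, 0.602910749664121), (2000, 0.5095598212540562),
    (2200, 0.5319231158110048), (2400, 0.5888859332600402),
]
patience, threshold = 3, 0.001
best, best_step, counter, stop_step = None, None, 0, None
for step, f1 in evals:
    if best is None or f1 > best + threshold:
        best, best_step, counter = f1, step, 0  # improvement resets patience
    else:
        counter += 1                            # no improvement beyond threshold
        if counter >= patience:
            stop_step = step                    # training stops here
            break
print(best_step, stop_step)  # 1800 2400
```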
checkpoint-2400/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a43dfd650d40278c5424b1ba5b0067a9c7fba4f97e85ccc8fcc46c3360e49acd
+ size 5240
checkpoint-2400/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+ "_name_or_path": "vinai/phobert-base",
+ "architectures": [
+ "RobertaForQuestionAnswering"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "layer_norm_eps": 1e-05,
+ "max_position_embeddings": 258,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "tokenizer_class": "PhobertTokenizer",
+ "torch_dtype": "float32",
+ "transformers_version": "4.44.2",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 64001
+ }
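A quick size sanity check on this config (a rough sketch, not an exact parameter accounting): `model.safetensors` in this commit is 537,660,792 bytes and the config says `torch_dtype: float32`, so at 4 bytes per parameter the checkpoint holds ~134.4M parameters — consistent with a `phobert-base` encoder (~135M) plus the small span-prediction head of `RobertaForQuestionAnswering`.

```python
# Estimate the parameter count implied by the float32 safetensors file size.
size_bytes = 537_660_792          # model.safetensors size from this commit
bytes_per_param = 4               # torch_dtype is float32
params = size_bytes / bytes_per_param
print(round(params / 1e6, 1))     # ≈ 134.4 (million parameters)
```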
eval_metrics.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "eval_loss": 0.07215487957000732,
+ "eval_accuracy": 0.9795007342143907,
+ "eval_f1": 0.602910749664121,
+ "eval_start_accuracy": 0.979618208516887,
+ "eval_end_accuracy": 0.9793832599118942,
+ "eval_start_f1": 0.25102382987815675,
+ "eval_end_f1": 0.9547976694500852,
+ "eval_runtime": 132.7064,
+ "eval_samples_per_second": 128.291,
+ "eval_steps_per_second": 16.043,
+ "epoch": 0.39804295546894436
+ }
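The aggregate numbers in `eval_metrics.json` appear to be simple means of the start- and end-position metrics — e.g. `eval_f1` = (`eval_start_f1` + `eval_end_f1`) / 2. This is an observation from the values themselves, not a documented guarantee of the training script:

```python
import math

# Metrics copied from eval_metrics.json above.
metrics = {
    "eval_accuracy": 0.9795007342143907,
    "eval_f1": 0.602910749664121,
    "eval_start_accuracy": 0.979618208516887,
    "eval_end_accuracy": 0.9793832599118942,
    "eval_start_f1": 0.25102382987815675,
    "eval_end_f1": 0.9547976694500852,
}
mean_acc = (metrics["eval_start_accuracy"] + metrics["eval_end_accuracy"]) / 2
mean_f1 = (metrics["eval_start_f1"] + metrics["eval_end_f1"]) / 2
# Both aggregates match the mean of their start/end components.
print(math.isclose(mean_acc, metrics["eval_accuracy"], rel_tol=1e-12),
      math.isclose(mean_f1, metrics["eval_f1"], rel_tol=1e-12))
```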
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:389e7b032149cdbe860f7307cb2dcd4781aef2ac7567a748a409a876678315f4
+ size 537660792
special_tokens_map.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "bos_token": "<s>",
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "unk_token": "<unk>"
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "64000": {
+ "content": "<mask>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "mask_token": "<mask>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "PhobertTokenizer",
+ "unk_token": "<unk>"
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a43dfd650d40278c5424b1ba5b0067a9c7fba4f97e85ccc8fcc46c3360e49acd
+ size 5240
training_info.json ADDED
@@ -0,0 +1,49 @@
+ {
+ "eval_result": {
+ "eval_loss": 0.07215487957000732,
+ "eval_accuracy": 0.9795007342143907,
+ "eval_f1": 0.602910749664121,
+ "eval_start_accuracy": 0.979618208516887,
+ "eval_end_accuracy": 0.9793832599118942,
+ "eval_start_f1": 0.25102382987815675,
+ "eval_end_f1": 0.9547976694500852,
+ "eval_runtime": 132.7064,
+ "eval_samples_per_second": 128.291,
+ "eval_steps_per_second": 16.043,
+ "epoch": 0.39804295546894436
+ },
+ "train_result": {
+ "training_loss": 0.6344684727986654
+ },
+ "dataset_info": {
+ "total_qa_pairs": 156349,
+ "train_size": 96472,
+ "validation_size": 17025,
+ "categories": [
+ "Công nghiệp",
+ "Thuế, phí, lệ phí, các khoản thu khác",
+ "Đất đai",
+ "Dân số, gia đình, trẻ em, bình đẳng giới",
+ "Quốc phòng",
+ "Hành chính tư pháp",
+ "Tài nguyên",
+ "Văn hóa, thể thao, du lịch",
+ "Giao thông, vận tải",
+ "Thông tin, báo chí, xuất bản",
+ "Tổ chức chính trị - xã hội, hội",
+ "Y tế, dược",
+ "Dân tộc",
+ "Thống kê",
+ "Khoa học, công nghệ",
+ "An ninh quốc gia",
+ "Tổ chức bộ máy nhà nước",
+ "Ngoại giao, điều ước quốc tế",
+ "Bổ trợ tư pháp",
+ "Tài sản công, nợ công, dự trữ nhà nước",
+ "Tố tụng và các phương thức giải quyết tranh chấp",
+ "Doanh nghiệp, hợp tác xã",
+ "Trật tự, an toàn xã hội"
+ ]
+ },
+ "transformers_version": "4.44.2"
+ }
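The throughput figures in training_info.json are internally consistent: `eval_runtime` times `eval_samples_per_second` recovers the validation split size. The listed train and validation splits cover 113,497 of the 156,349 total QA pairs; the diff does not say how the remainder is used (a held-out test split is one plausible assumption).

```python
# Figures copied from training_info.json.
eval_runtime = 132.7064          # seconds
samples_per_second = 128.291
train_size, validation_size = 96472, 17025
total_qa_pairs = 156349

# Runtime x throughput should recover the number of evaluated samples.
estimated_samples = eval_runtime * samples_per_second
assert abs(estimated_samples - validation_size) < 1.0

# QA pairs not covered by the listed train/validation splits.
remainder = total_qa_pairs - train_size - validation_size
print(round(estimated_samples), remainder)  # 17025 42852
```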
vocab.txt ADDED
The diff for this file is too large to render. See raw diff