kiarashmo committed
Commit 0fd0986 · verified · 1 Parent(s): 58be295

chembberta-77m-sBERT-finetuned-on-clintox-with-sBERTBatchAllTripletLoss

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 384,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
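The config above enables only mean pooling. A minimal sketch of that operation, assuming standard `(batch, seq, dim)` tensors; `mean_pool` is an illustrative helper, not part of this repository:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Mean pooling as configured above: average token embeddings, ignoring padding.
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # (batch, 384)
    counts = mask.sum(dim=1).clamp(min=1e-9)       # guard against empty sequences
    return summed / counts
```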
README.md ADDED
@@ -0,0 +1,444 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:2184
+ - loss:BatchAllTripletLoss
+ base_model: kiarashmo/chembberta-77m-mlm-safetensors
+ widget:
+ - source_sentence: CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1
+ sentences:
+ - C=C1CC2CCC34CC5OC6C(OC7CCC(CC(=O)CC8C(CC9OC(CCC1O2)CC(C)C9=C)OC(CC(O)CN)C8OC)OC7C6O3)C5O4
+ - C[NH+](C)CCC=C1c2ccccc2CCc2ccccc21
+ - CC(C)Cn1cnc2c(N)nc3ccccc3c21
+ - source_sentence: COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ sentences:
+ - C=C1CC2CCC34CC5OC6C(OC7CCC(CC(=O)CC8C(CC9OC(CCC1O2)CC(C)C9=C)OC(CC(O)CN)C8OC)OC7C6O3)C5O4
+ - C[NH+]1CCCC(CC2c3ccccc3Sc3ccccc32)C1
+ - C[NH2+]C1(c2ccccc2Cl)CCCCC1=O
+ - source_sentence: C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)CC2c3cccc4[nH]cc(c34)CC21
+ sentences:
+ - C[NH+](C)CCC=C1c2ccccc2COc2ccc(CC(=O)[O-])cc21
+ - C[NH+]1CCC(=C2c3ccccc3CCn3c(C=O)c[nH+]c32)CC1
+ - COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ - source_sentence: C[NH2+]CCCC12CCC(c3ccccc31)c1ccccc12
+ sentences:
+ - C[N+]1(C)CCC(=C(c2ccccc2)c2ccccc2)CC1
+ - CC(CN1CC(=O)NC(=O)C1)[NH+]1CC(=O)NC(=O)C1
+ - C[NH+](C)CCc1c[nH]c2ccc(CC3COC(=O)N3)cc12
+ - source_sentence: CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1
+ sentences:
+ - COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ - COc1ccc(C(=O)CC(=O)c2ccc(C(C)(C)C)cc2)cc1
+ - COC1CC(OC2C(C)C(=O)OC(C)C(C)C(OC(C)=O)C(C)C(=O)C3(CO3)CC(C)C(OC3OC(C)CC([NH+](C)C)C3OC(C)=O)C2C)OC(C)C1OC(C)=O
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - cosine_mcc
+ model-index:
+ - name: SentenceTransformer based on kiarashmo/chembberta-77m-mlm-safetensors
+ results:
+ - task:
+ type: binary-classification
+ name: Binary Classification
+ dataset:
+ name: val sim
+ type: val-sim
+ metrics:
+ - type: cosine_accuracy
+ value: 0.671
+ name: Cosine Accuracy
+ - type: cosine_accuracy_threshold
+ value: 0.8630315065383911
+ name: Cosine Accuracy Threshold
+ - type: cosine_f1
+ value: 0.7042889390519187
+ name: Cosine F1
+ - type: cosine_f1_threshold
+ value: -0.3091595470905304
+ name: Cosine F1 Threshold
+ - type: cosine_precision
+ value: 0.5763546798029556
+ name: Cosine Precision
+ - type: cosine_recall
+ value: 0.9052224371373307
+ name: Cosine Recall
+ - type: cosine_ap
+ value: 0.7370675686338501
+ name: Cosine Ap
+ - type: cosine_mcc
+ value: 0.24685118679448836
+ name: Cosine Mcc
+ ---
+
+ # SentenceTransformer based on kiarashmo/chembberta-77m-mlm-safetensors
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [kiarashmo/chembberta-77m-mlm-safetensors](https://huggingface.co/kiarashmo/chembberta-77m-mlm-safetensors). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [kiarashmo/chembberta-77m-mlm-safetensors](https://huggingface.co/kiarashmo/chembberta-77m-mlm-safetensors) <!-- at revision 9d0b79d268438177519adce1e36395ea0ae363e9 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
+ (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
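A minimal sketch of assembling the same two-module stack by hand with `sentence_transformers.models`, assuming the base checkpoint listed above is available on the Hub:

```python
from sentence_transformers import SentenceTransformer, models

# Module (0): RobertaModel backbone with max_seq_length 512.
word_embedding_model = models.Transformer(
    "kiarashmo/chembberta-77m-mlm-safetensors", max_seq_length=512
)
# Module (1): mean pooling over the 384-dim token embeddings.
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```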
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+ 'CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1',
+ 'COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C',
+ 'COc1ccc(C(=O)CC(=O)c2ccc(C(C)(C)C)cc2)cc1',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[ 1.0000, 0.8293, -0.3326],
+ # [ 0.8293, 1.0000, -0.0993],
+ # [-0.3326, -0.0993, 1.0000]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Binary Classification
+
+ * Dataset: `val-sim`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric | Value |
+ |:--------------------------|:-----------|
+ | cosine_accuracy | 0.671 |
+ | cosine_accuracy_threshold | 0.863 |
+ | cosine_f1 | 0.7043 |
+ | cosine_f1_threshold | -0.3092 |
+ | cosine_precision | 0.5764 |
+ | cosine_recall | 0.9052 |
+ | **cosine_ap** | **0.7371** |
+ | cosine_mcc | 0.2469 |
+
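These numbers come from the `BinaryClassificationEvaluator` linked above. A sketch of re-running it, with illustrative SMILES pairs (the actual `val-sim` pairs are not published in this card) and the placeholder model id from the usage snippet:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id, as above

# Illustrative pairs: label 1 = embeddings should be close, 0 = far apart.
sentences1 = ["CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1"]
sentences2 = ["CC(C)Cn1cnc2c(N)nc3ccccc3c21"]
labels = [0]

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="val-sim")
print(evaluator(model))  # cosine_accuracy, cosine_f1, cosine_ap, cosine_mcc, ...
```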
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 2,184 training samples
+ * Columns: <code>text</code> and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | text | label |
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------|
+ | type | string | int |
+ | details | <ul><li>min: 3 tokens</li><li>mean: 43.69 tokens</li><li>max: 221 tokens</li></ul> | <ul><li>0: ~92.10%</li><li>1: ~7.90%</li></ul> |
+ * Samples:
+ | text | label |
+ |:------------------------------------------------------------------------|:---------------|
+ | <code>CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O</code> | <code>1</code> |
+ | <code>O=C(NCC(O)CO)c1c(I)c(C(=O)NCC(O)CO)c(I)c(N(CCO)C(=O)CO)c1I</code> | <code>0</code> |
+ | <code>Clc1cc(Cl)c(OCC#CI)cc1Cl</code> | <code>0</code> |
+ * Loss: [<code>BatchAllTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss)
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 282 evaluation samples
+ * Columns: <code>text</code> and <code>label</code>
+ * Approximate statistics based on the first 282 samples:
+ | | text | label |
+ |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | int |
+ | details | <ul><li>min: 18 tokens</li><li>mean: 65.88 tokens</li><li>max: 244 tokens</li></ul> | <ul><li>0: ~50.00%</li><li>1: ~50.00%</li></ul> |
+ * Samples:
+ | text | label |
+ |:-------------------------------------------------------------------------------|:---------------|
+ | <code>CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1</code> | <code>1</code> |
+ | <code>CC(C)Cn1cnc2c(N)nc3ccccc3c21</code> | <code>0</code> |
+ | <code>CC(C)CN(CC(O)C(Cc1ccccc1)NC(=O)OC1COC2OCCC12)S(=O)(=O)c1ccc(N)cc1</code> | <code>0</code> |
+ * Loss: [<code>BatchAllTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss)
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `num_train_epochs`: 100
+ - `warmup_steps`: 100
+ - `load_best_model_at_end`: True
+
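A minimal sketch of a training run consistent with the loss, the `text`/`label` column layout, and the non-default hyperparameters above. The two-row datasets are illustrative stand-ins for the ClinTox-derived splits, which this card does not reproduce:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("kiarashmo/chembberta-77m-mlm-safetensors")

# Illustrative rows with the card's column layout: one SMILES string, one int label.
train_dataset = Dataset.from_dict({
    "text": ["CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O", "Clc1cc(Cl)c(OCC#CI)cc1Cl"],
    "label": [1, 0],
})
eval_dataset = Dataset.from_dict({
    "text": ["CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1", "CC(C)Cn1cnc2c(N)nc3ccccc3c21"],
    "label": [1, 0],
})

loss = losses.BatchAllTripletLoss(model=model)

args = SentenceTransformerTrainingArguments(
    output_dir="sBERT_finetuned_on_clintox_with_batch_all_triplet",
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=100,
    warmup_steps=100,
    load_best_model_at_end=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```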
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 100
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 100
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | val-sim_cosine_ap |
+ |:----------:|:-------:|:-------------:|:---------------:|:-----------------:|
+ | **7.2464** | **500** | **4.0383** | **5.2239** | **0.6972** |
+ | 14.4928 | 1000 | 3.5414 | 5.6988 | 0.6918 |
+ | 21.7391 | 1500 | 3.2672 | 5.3616 | 0.7147 |
+ | 28.9855 | 2000 | 2.885 | 5.7296 | 0.7240 |
+ | 36.2319 | 2500 | 2.7761 | 5.5717 | 0.7399 |
+ | 43.4783 | 3000 | 2.6489 | 5.8045 | 0.7371 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.9.23
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.53.3
+ - PyTorch: 2.5.0+cu118
+ - Accelerate: 1.9.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### BatchAllTripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+ title={In Defense of the Triplet Loss for Person Re-Identification},
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+ year={2017},
+ eprint={1703.07737},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "</s>": 592,
+ "<s>": 591
+ }
checkpoint-500/1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 384,
+ "pooling_mode_cls_token": false,
+ "pooling_mode_mean_tokens": true,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
checkpoint-500/README.md ADDED
@@ -0,0 +1,438 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:2184
+ - loss:BatchAllTripletLoss
+ base_model: kiarashmo/chembberta-77m-mlm-safetensors
+ widget:
+ - source_sentence: CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1
+ sentences:
+ - C=C1CC2CCC34CC5OC6C(OC7CCC(CC(=O)CC8C(CC9OC(CCC1O2)CC(C)C9=C)OC(CC(O)CN)C8OC)OC7C6O3)C5O4
+ - C[NH+](C)CCC=C1c2ccccc2CCc2ccccc21
+ - CC(C)Cn1cnc2c(N)nc3ccccc3c21
+ - source_sentence: COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ sentences:
+ - C=C1CC2CCC34CC5OC6C(OC7CCC(CC(=O)CC8C(CC9OC(CCC1O2)CC(C)C9=C)OC(CC(O)CN)C8OC)OC7C6O3)C5O4
+ - C[NH+]1CCCC(CC2c3ccccc3Sc3ccccc32)C1
+ - C[NH2+]C1(c2ccccc2Cl)CCCCC1=O
+ - source_sentence: C[NH+]1CC(C(=O)NC2(C)OC3(O)C4CCCN4C(=O)C(Cc4ccccc4)N3C2=O)CC2c3cccc4[nH]cc(c34)CC21
+ sentences:
+ - C[NH+](C)CCC=C1c2ccccc2COc2ccc(CC(=O)[O-])cc21
+ - C[NH+]1CCC(=C2c3ccccc3CCn3c(C=O)c[nH+]c32)CC1
+ - COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ - source_sentence: C[NH2+]CCCC12CCC(c3ccccc31)c1ccccc12
+ sentences:
+ - C[N+]1(C)CCC(=C(c2ccccc2)c2ccccc2)CC1
+ - CC(CN1CC(=O)NC(=O)C1)[NH+]1CC(=O)NC(=O)C1
+ - C[NH+](C)CCc1c[nH]c2ccc(CC3COC(=O)N3)cc12
+ - source_sentence: CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1
+ sentences:
+ - COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C
+ - COc1ccc(C(=O)CC(=O)c2ccc(C(C)(C)C)cc2)cc1
+ - COC1CC(OC2C(C)C(=O)OC(C)C(C)C(OC(C)=O)C(C)C(=O)C3(CO3)CC(C)C(OC3OC(C)CC([NH+](C)C)C3OC(C)=O)C2C)OC(C)C1OC(C)=O
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - cosine_mcc
+ model-index:
+ - name: SentenceTransformer based on kiarashmo/chembberta-77m-mlm-safetensors
+ results:
+ - task:
+ type: binary-classification
+ name: Binary Classification
+ dataset:
+ name: val sim
+ type: val-sim
+ metrics:
+ - type: cosine_accuracy
+ value: 0.611
+ name: Cosine Accuracy
+ - type: cosine_accuracy_threshold
+ value: 0.8879227638244629
+ name: Cosine Accuracy Threshold
+ - type: cosine_f1
+ value: 0.6980609418282548
+ name: Cosine F1
+ - type: cosine_f1_threshold
+ value: -0.5465683937072754
+ name: Cosine F1 Threshold
+ - type: cosine_precision
+ value: 0.5436893203883495
+ name: Cosine Precision
+ - type: cosine_recall
+ value: 0.9748549323017408
+ name: Cosine Recall
+ - type: cosine_ap
+ value: 0.6971622829878537
+ name: Cosine Ap
+ - type: cosine_mcc
+ value: 0.19032555952847827
+ name: Cosine Mcc
+ ---
+
+ # SentenceTransformer based on kiarashmo/chembberta-77m-mlm-safetensors
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [kiarashmo/chembberta-77m-mlm-safetensors](https://huggingface.co/kiarashmo/chembberta-77m-mlm-safetensors). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [kiarashmo/chembberta-77m-mlm-safetensors](https://huggingface.co/kiarashmo/chembberta-77m-mlm-safetensors) <!-- at revision 9d0b79d268438177519adce1e36395ea0ae363e9 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
+ (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+ 'CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1',
+ 'COC(=O)NC(C(=O)NC(Cc1ccccc1)C(O)CN(Cc1ccc(-c2ccccn2)cc1)NC(=O)C(NC(=O)OC)C(C)(C)C)C(C)(C)C',
+ 'COc1ccc(C(=O)CC(=O)c2ccc(C(C)(C)C)cc2)cc1',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[ 1.0000, 0.8293, -0.3326],
+ # [ 0.8293, 1.0000, -0.0993],
+ # [-0.3326, -0.0993, 1.0000]])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Binary Classification
+
+ * Dataset: `val-sim`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric | Value |
+ |:--------------------------|:-----------|
+ | cosine_accuracy | 0.611 |
+ | cosine_accuracy_threshold | 0.8879 |
+ | cosine_f1 | 0.6981 |
+ | cosine_f1_threshold | -0.5466 |
+ | cosine_precision | 0.5437 |
+ | cosine_recall | 0.9749 |
+ | **cosine_ap** | **0.6972** |
+ | cosine_mcc | 0.1903 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 2,184 training samples
+ * Columns: <code>text</code> and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | text | label |
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------|
+ | type | string | int |
+ | details | <ul><li>min: 3 tokens</li><li>mean: 43.69 tokens</li><li>max: 221 tokens</li></ul> | <ul><li>0: ~92.10%</li><li>1: ~7.90%</li></ul> |
+ * Samples:
+ | text | label |
+ |:------------------------------------------------------------------------|:---------------|
+ | <code>CC(C)CC(NC(=O)CNC(=O)c1cc(Cl)ccc1Cl)B(O)O</code> | <code>1</code> |
+ | <code>O=C(NCC(O)CO)c1c(I)c(C(=O)NCC(O)CO)c(I)c(N(CCO)C(=O)CO)c1I</code> | <code>0</code> |
+ | <code>Clc1cc(Cl)c(OCC#CI)cc1Cl</code> | <code>0</code> |
+ * Loss: [<code>BatchAllTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss)
+
+ ### Evaluation Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 282 evaluation samples
+ * Columns: <code>text</code> and <code>label</code>
+ * Approximate statistics based on the first 282 samples:
+ | | text | label |
+ |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------|
+ | type | string | int |
+ | details | <ul><li>min: 18 tokens</li><li>mean: 65.88 tokens</li><li>max: 244 tokens</li></ul> | <ul><li>0: ~50.00%</li><li>1: ~50.00%</li></ul> |
+ * Samples:
+ | text | label |
+ |:-------------------------------------------------------------------------------|:---------------|
+ | <code>CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1</code> | <code>1</code> |
+ | <code>CC(C)Cn1cnc2c(N)nc3ccccc3c21</code> | <code>0</code> |
+ | <code>CC(C)CN(CC(O)C(Cc1ccccc1)NC(=O)OC1COC2OCCC12)S(=O)(=O)c1ccc(N)cc1</code> | <code>0</code> |
+ * Loss: [<code>BatchAllTripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#batchalltripletloss)
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `num_train_epochs`: 100
+ - `warmup_steps`: 100
+ - `load_best_model_at_end`: True
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 32
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 100
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 100
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | val-sim_cosine_ap |
+ |:------:|:----:|:-------------:|:---------------:|:-----------------:|
+ | 7.2464 | 500 | 4.0383 | 5.2239 | 0.6972 |
+
+
+ ### Framework Versions
+ - Python: 3.9.23
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.53.3
+ - PyTorch: 2.5.0+cu118
+ - Accelerate: 1.9.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### BatchAllTripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+ title={In Defense of the Triplet Loss for Person Re-Identification},
+ author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+ year={2017},
+ eprint={1703.07737},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
checkpoint-500/added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "</s>": 592,
+ "<s>": 591
+ }
checkpoint-500/config.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "architectures": [
+ "RobertaModel"
+ ],
+ "attention_probs_dropout_prob": 0.109,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.144,
+ "hidden_size": 384,
+ "initializer_range": 0.02,
+ "intermediate_size": 464,
+ "is_gpu": true,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 515,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 3,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.53.3",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 600
+ }
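The compact backbone this config describes (3 layers, 384 hidden units, 600-token vocabulary) can be inspected with the standard `transformers` loaders; a sketch, assuming the base Hub checkpoint matches this file:

```python
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("kiarashmo/chembberta-77m-mlm-safetensors")
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)  # 3 384 600

# Bare RobertaModel encoder, as named under "architectures" above.
backbone = AutoModel.from_pretrained("kiarashmo/chembberta-77m-mlm-safetensors")
```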
checkpoint-500/config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "model_type": "SentenceTransformer",
+ "__version__": {
+ "sentence_transformers": "5.0.0",
+ "transformers": "4.53.3",
+ "pytorch": "2.5.0+cu118"
+ },
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
checkpoint-500/merges.txt ADDED
@@ -0,0 +1 @@
+ #version: 0.2
checkpoint-500/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe61144fb1d4268afd35215ccc03497692fa64748cdd08ddb67b0b2c96d7b9c9
+ size 13715688
checkpoint-500/modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:447143facddf4b6307053b45f27a701dcf8080e2725821b4723e34ec10199f56
+ size 26281530
checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9cde0c0ea720e80062f89613c0595e1f37196b7b6342a93bf2977d4f53b2ae7
+ size 14244
checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:096756c6c9ef24132ec200eca2134019e0d659cf988b0db8af0f42784c02a33e
+ size 1064
checkpoint-500/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
checkpoint-500/special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
checkpoint-500/tokenizer.json ADDED
@@ -0,0 +1,712 @@
+ {
+ "version": "1.0",
+ "truncation": {
+ "direction": "Right",
+ "max_length": 512,
+ "strategy": "LongestFirst",
+ "stride": 0
+ },
+ "padding": {
+ "strategy": "BatchLongest",
+ "direction": "Right",
+ "pad_to_multiple_of": null,
+ "pad_id": 0,
+ "pad_type_id": 0,
+ "pad_token": "[PAD]"
+ },
+ "added_tokens": [
+ {
+ "id": 0,
+ "content": "[PAD]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 11,
+ "content": "[UNK]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 12,
+ "content": "[CLS]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 13,
+ "content": "[SEP]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 14,
+ "content": "[MASK]",
+ "single_word": false,
+ "lstrip": true,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 591,
+ "content": "<s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": true,
+ "special": true
+ },
+ {
+ "id": 592,
+ "content": "</s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": true,
+ "special": true
+ }
+ ],
+ "normalizer": null,
+ "pre_tokenizer": {
+ "type": "ByteLevel",
+ "add_prefix_space": false,
+ "trim_offsets": true,
+ "use_regex": true
+ },
+ "post_processor": {
+ "type": "RobertaProcessing",
+ "sep": [
+ "[SEP]",
+ 13
+ ],
+ "cls": [
+ "[CLS]",
+ 12
+ ],
+ "trim_offsets": true,
+ "add_prefix_space": false
+ },
+ "decoder": {
+ "type": "ByteLevel",
+ "add_prefix_space": true,
+ "trim_offsets": true,
+ "use_regex": true
+ },
+ "model": {
+ "type": "BPE",
+ "dropout": null,
+ "unk_token": null,
+ "continuing_subword_prefix": "",
+ "end_of_word_suffix": "",
+ "fuse_unk": false,
+ "byte_fallback": false,
+ "ignore_merges": false,
+ "vocab": {
+ "[PAD]": 0,
+ "[unused1]": 1,
+ "[unused2]": 2,
+ "[unused3]": 3,
+ "[unused4]": 4,
+ "[unused5]": 5,
+ "[unused6]": 6,
+ "[unused7]": 7,
+ "[unused8]": 8,
+ "[unused9]": 9,
+ "[unused10]": 10,
+ "[UNK]": 11,
+ "[CLS]": 12,
+ "[SEP]": 13,
+ "[MASK]": 14,
+ "c": 15,
+ "C": 16,
+ "(": 17,
+ ")": 18,
+ "O": 19,
+ "1": 20,
+ "2": 21,
+ "=": 22,
+ "N": 23,
+ ".": 24,
+ "n": 25,
+ "3": 26,
+ "F": 27,
+ "Cl": 28,
+ ">>": 29,
+ "~": 30,
+ "-": 31,
+ "4": 32,
+ "[C@H]": 33,
+ "S": 34,
+ "[C@@H]": 35,
+ "[O-]": 36,
+ "Br": 37,
+ "#": 38,
+ "/": 39,
+ "[nH]": 40,
+ "[N+]": 41,
+ "s": 42,
+ "5": 43,
+ "o": 44,
+ "P": 45,
+ "[Na+]": 46,
+ "[Si]": 47,
+ "I": 48,
+ "[Na]": 49,
+ "[Pd]": 50,
+ "[K+]": 51,
+ "[K]": 52,
+ "[P]": 53,
+ "B": 54,
+ "[C@]": 55,
+ "[C@@]": 56,
+ "[Cl-]": 57,
+ "6": 58,
+ "[OH-]": 59,
+ "\\": 60,
+ "[N-]": 61,
+ "[Li]": 62,
+ "[H]": 63,
+ "[2H]": 64,
+ "[NH4+]": 65,
+ "[c-]": 66,
+ "[P-]": 67,
+ "[Cs+]": 68,
+ "[Li+]": 69,
+ "[Cs]": 70,
+ "[NaH]": 71,
+ "[H-]": 72,
+ "[O+]": 73,
+ "[BH4-]": 74,
+ "[Cu]": 75,
+ "7": 76,
+ "[Mg]": 77,
+ "[Fe+2]": 78,
+ "[n+]": 79,
+ "[Sn]": 80,
+ "[BH-]": 81,
+ "[Pd+2]": 82,
+ "[CH]": 83,
+ "[I-]": 84,
+ "[Br-]": 85,
+ "[C-]": 86,
+ "[Zn]": 87,
+ "[B-]": 88,
+ "[F-]": 89,
+ "[Al]": 90,
+ "[P+]": 91,
+ "[BH3-]": 92,
+ "[Fe]": 93,
+ "[C]": 94,
+ "[AlH4]": 95,
+ "[Ni]": 96,
+ "[SiH]": 97,
+ "8": 98,
+ "[Cu+2]": 99,
+ "[Mn]": 100,
+ "[AlH]": 101,
+ "[nH+]": 102,
+ "[AlH4-]": 103,
+ "[O-2]": 104,
+ "[Cr]": 105,
+ "[Mg+2]": 106,
+ "[NH3+]": 107,
+ "[S@]": 108,
+ "[Pt]": 109,
+ "[Al+3]": 110,
+ "[S@@]": 111,
+ "[S-]": 112,
+ "[Ti]": 113,
+ "[Zn+2]": 114,
+ "[PH]": 115,
+ "[NH2+]": 116,
+ "[Ru]": 117,
+ "[Ag+]": 118,
+ "[S+]": 119,
+ "[I+3]": 120,
+ "[NH+]": 121,
+ "[Ca+2]": 122,
+ "[Ag]": 123,
+ "9": 124,
+ "[Os]": 125,
+ "[Se]": 126,
+ "[SiH2]": 127,
+ "[Ca]": 128,
+ "[Ti+4]": 129,
+ "[Ac]": 130,
+ "[Cu+]": 131,
+ "[S]": 132,
+ "[Rh]": 133,
+ "[Cl+3]": 134,
+ "[cH-]": 135,
+ "[Zn+]": 136,
+ "[O]": 137,
+ "[Cl+]": 138,
+ "[SH]": 139,
+ "[H+]": 140,
+ "[Pd+]": 141,
+ "[se]": 142,
+ "[PH+]": 143,
+ "[I]": 144,
+ "[Pt+2]": 145,
+ "[C+]": 146,
+ "[Mg+]": 147,
+ "[Hg]": 148,
+ "[W]": 149,
+ "[SnH]": 150,
+ "[SiH3]": 151,
+ "[Fe+3]": 152,
+ "[NH]": 153,
+ "[Mo]": 154,
+ "[CH2+]": 155,
+ "%10": 156,
+ "[CH2-]": 157,
+ "[CH2]": 158,
+ "[n-]": 159,
+ "[Ce+4]": 160,
+ "[NH-]": 161,
+ "[Co]": 162,
+ "[I+]": 163,
+ "[PH2]": 164,
+ "[Pt+4]": 165,
+ "[Ce]": 166,
+ "[B]": 167,
+ "[Sn+2]": 168,
+ "[Ba+2]": 169,
+ "%11": 170,
+ "[Fe-3]": 171,
+ "[18F]": 172,
+ "[SH-]": 173,
+ "[Pb+2]": 174,
+ "[Os-2]": 175,
+ "[Zr+4]": 176,
+ "[N]": 177,
+ "[Ir]": 178,
+ "[Bi]": 179,
+ "[Ni+2]": 180,
+ "[P@]": 181,
+ "[Co+2]": 182,
+ "[s+]": 183,
+ "[As]": 184,
+ "[P+3]": 185,
+ "[Hg+2]": 186,
+ "[Yb+3]": 187,
+ "[CH-]": 188,
+ "[Zr+2]": 189,
+ "[Mn+2]": 190,
+ "[CH+]": 191,
+ "[In]": 192,
+ "[KH]": 193,
+ "[Ce+3]": 194,
+ "[Zr]": 195,
+ "[AlH2-]": 196,
+ "[OH2+]": 197,
+ "[Ti+3]": 198,
+ "[Rh+2]": 199,
+ "[Sb]": 200,
+ "[S-2]": 201,
+ "%12": 202,
+ "[P@@]": 203,
+ "[Si@H]": 204,
+ "[Mn+4]": 205,
+ "p": 206,
+ "[Ba]": 207,
+ "[NH2-]": 208,
+ "[Ge]": 209,
+ "[Pb+4]": 210,
+ "[Cr+3]": 211,
+ "[Au]": 212,
+ "[LiH]": 213,
+ "[Sc+3]": 214,
+ "[o+]": 215,
+ "[Rh-3]": 216,
+ "%13": 217,
+ "[Br]": 218,
+ "[Sb-]": 219,
+ "[S@+]": 220,
+ "[I+2]": 221,
+ "[Ar]": 222,
+ "[V]": 223,
+ "[Cu-]": 224,
+ "[Al-]": 225,
+ "[Te]": 226,
+ "[13c]": 227,
+ "[13C]": 228,
+ "[Cl]": 229,
+ "[PH4+]": 230,
+ "[SiH4]": 231,
+ "[te]": 232,
+ "[CH3-]": 233,
+ "[S@@+]": 234,
+ "[Rh+3]": 235,
+ "[SH+]": 236,
+ "[Bi+3]": 237,
+ "[Br+2]": 238,
+ "[La]": 239,
+ "[La+3]": 240,
+ "[Pt-2]": 241,
+ "[N@@]": 242,
+ "[PH3+]": 243,
+ "[N@]": 244,
+ "[Si+4]": 245,
+ "[Sr+2]": 246,
+ "[Al+]": 247,
+ "[Pb]": 248,
+ "[SeH]": 249,
+ "[Si-]": 250,
+ "[V+5]": 251,
+ "[Y+3]": 252,
+ "[Re]": 253,
+ "[Ru+]": 254,
+ "[Sm]": 255,
+ "*": 256,
+ "[3H]": 257,
+ "[NH2]": 258,
+ "[Ag-]": 259,
+ "[13CH3]": 260,
+ "[OH+]": 261,
+ "[Ru+3]": 262,
+ "[OH]": 263,
+ "[Gd+3]": 264,
+ "[13CH2]": 265,
+ "[In+3]": 266,
+ "[Si@@]": 267,
+ "[Si@]": 268,
+ "[Ti+2]": 269,
+ "[Sn+]": 270,
+ "[Cl+2]": 271,
+ "[AlH-]": 272,
+ "[Pd-2]": 273,
+ "[SnH3]": 274,
+ "[B+3]": 275,
+ "[Cu-2]": 276,
+ "[Nd+3]": 277,
+ "[Pb+3]": 278,
+ "[13cH]": 279,
+ "[Fe-4]": 280,
+ "[Ga]": 281,
+ "[Sn+4]": 282,
+ "[Hg+]": 283,
+ "[11CH3]": 284,
+ "[Hf]": 285,
+ "[Pr]": 286,
+ "[Y]": 287,
+ "[S+2]": 288,
+ "[Cd]": 289,
+ "[Cr+6]": 290,
+ "[Zr+3]": 291,
+ "[Rh+]": 292,
+ "[CH3]": 293,
+ "[N-3]": 294,
+ "[Hf+2]": 295,
+ "[Th]": 296,
+ "[Sb+3]": 297,
+ "%14": 298,
+ "[Cr+2]": 299,
+ "[Ru+2]": 300,
+ "[Hf+4]": 301,
+ "[14C]": 302,
+ "[Ta]": 303,
+ "[Tl+]": 304,
+ "[B+]": 305,
+ "[Os+4]": 306,
+ "[PdH2]": 307,
+ "[Pd-]": 308,
+ "[Cd+2]": 309,
+ "[Co+3]": 310,
+ "[S+4]": 311,
+ "[Nb+5]": 312,
+ "[123I]": 313,
+ "[c+]": 314,
+ "[Rb+]": 315,
+ "[V+2]": 316,
+ "[CH3+]": 317,
+ "[Ag+2]": 318,
+ "[cH+]": 319,
+ "[Mn+3]": 320,
+ "[Se-]": 321,
+ "[As-]": 322,
+ "[Eu+3]": 323,
+ "[SH2]": 324,
+ "[Sm+3]": 325,
+ "[IH+]": 326,
+ "%15": 327,
+ "[OH3+]": 328,
+ "[PH3]": 329,
+ "[IH2+]": 330,
+ "[SH2+]": 331,
+ "[Ir+3]": 332,
+ "[AlH3]": 333,
+ "[Sc]": 334,
+ "[Yb]": 335,
+ "[15NH2]": 336,
+ "[Lu]": 337,
+ "[sH+]": 338,
+ "[Gd]": 339,
+ "[18F-]": 340,
+ "[SH3+]": 341,
+ "[SnH4]": 342,
+ "[TeH]": 343,
+ "[Si@@H]": 344,
+ "[Ga+3]": 345,
+ "[CaH2]": 346,
+ "[Tl]": 347,
+ "[Ta+5]": 348,
+ "[GeH]": 349,
+ "[Br+]": 350,
+ "[Sr]": 351,
+ "[Tl+3]": 352,
+ "[Sm+2]": 353,
+ "[PH5]": 354,
+ "%16": 355,
+ "[N@@+]": 356,
+ "[Au+3]": 357,
+ "[C-4]": 358,
+ "[Nd]": 359,
+ "[Ti+]": 360,
+ "[IH]": 361,
+ "[N@+]": 362,
+ "[125I]": 363,
+ "[Eu]": 364,
+ "[Sn+3]": 365,
+ "[Nb]": 366,
+ "[Er+3]": 367,
+ "[123I-]": 368,
+ "[14c]": 369,
+ "%17": 370,
+ "[SnH2]": 371,
+ "[YH]": 372,
+ "[Sb+5]": 373,
+ "[Pr+3]": 374,
+ "[Ir+]": 375,
+ "[N+3]": 376,
+ "[AlH2]": 377,
+ "[19F]": 378,
+ "%18": 379,
+ "[Tb]": 380,
+ "[14CH]": 381,
+ "[Mo+4]": 382,
+ "[Si+]": 383,
+ "[BH]": 384,
+ "[Be]": 385,
+ "[Rb]": 386,
+ "[pH]": 387,
+ "%19": 388,
+ "%20": 389,
+ "[Xe]": 390,
+ "[Ir-]": 391,
+ "[Be+2]": 392,
+ "[C+4]": 393,
+ "[RuH2]": 394,
+ "[15NH]": 395,
+ "[U+2]": 396,
+ "[Au-]": 397,
+ "%21": 398,
+ "%22": 399,
+ "[Au+]": 400,
+ "[15n]": 401,
+ "[Al+2]": 402,
+ "[Tb+3]": 403,
+ "[15N]": 404,
+ "[V+3]": 405,
+ "[W+6]": 406,
+ "[14CH3]": 407,
+ "[Cr+4]": 408,
+ "[ClH+]": 409,
+ "b": 410,
+ "[Ti+6]": 411,
+ "[Nd+]": 412,
+ "[Zr+]": 413,
+ "[PH2+]": 414,
+ "[Fm]": 415,
+ "[N@H+]": 416,
+ "[RuH]": 417,
+ "[Dy+3]": 418,
+ "%23": 419,
+ "[Hf+3]": 420,
+ "[W+4]": 421,
+ "[11C]": 422,
+ "[13CH]": 423,
+ "[Er]": 424,
+ "[124I]": 425,
+ "[LaH]": 426,
+ "[F]": 427,
+ "[siH]": 428,
+ "[Ga+]": 429,
+ "[Cm]": 430,
+ "[GeH3]": 431,
+ "[IH-]": 432,
+ "[U+6]": 433,
+ "[SeH+]": 434,
+ "[32P]": 435,
+ "[SeH-]": 436,
+ "[Pt-]": 437,
+ "[Ir+2]": 438,
+ "[se+]": 439,
+ "[U]": 440,
+ "[F+]": 441,
+ "[BH2]": 442,
+ "[As+]": 443,
+ "[Cf]": 444,
+ "[ClH2+]": 445,
+ "[Ni+]": 446,
+ "[TeH3]": 447,
+ "[SbH2]": 448,
+ "[Ag+3]": 449,
+ "%24": 450,
+ "[18O]": 451,
+ "[PH4]": 452,
+ "[Os+2]": 453,
+ "[Na-]": 454,
+ "[Sb+2]": 455,
+ "[V+4]": 456,
+ "[Ho+3]": 457,
+ "[68Ga]": 458,
+ "[PH-]": 459,
+ "[Bi+2]": 460,
+ "[Ce+2]": 461,
+ "[Pd+3]": 462,
+ "[99Tc]": 463,
+ "[13C@@H]": 464,
+ "[Fe+6]": 465,
+ "[c]": 466,
+ "[GeH2]": 467,
+ "[10B]": 468,
+ "[Cu+3]": 469,
+ "[Mo+2]": 470,
+ "[Cr+]": 471,
+ "[Pd+4]": 472,
+ "[Dy]": 473,
+ "[AsH]": 474,
+ "[Ba+]": 475,
+ "[SeH2]": 476,
+ "[In+]": 477,
+ "[TeH2]": 478,
+ "[BrH+]": 479,
+ "[14cH]": 480,
+ "[W+]": 481,
+ "[13C@H]": 482,
+ "[AsH2]": 483,
+ "[In+2]": 484,
+ "[N+2]": 485,
+ "[N@@H+]": 486,
+ "[SbH]": 487,
+ "[60Co]": 488,
+ "[AsH4+]": 489,
+ "[AsH3]": 490,
+ "[18OH]": 491,
+ "[Ru-2]": 492,
+ "[Na-2]": 493,
+ "[CuH2]": 494,
+ "[31P]": 495,
+ "[Ti+5]": 496,
+ "[35S]": 497,
+ "[P@@H]": 498,
+ "[ArH]": 499,
+ "[Co+]": 500,
+ "[Zr-2]": 501,
+ "[BH2-]": 502,
+ "[131I]": 503,
+ "[SH5]": 504,
+ "[VH]": 505,
+ "[B+2]": 506,
+ "[Yb+2]": 507,
+ "[14C@H]": 508,
+ "[211At]": 509,
+ "[NH3+2]": 510,
+ "[IrH]": 511,
+ "[IrH2]": 512,
+ "[Rh-]": 513,
+ "[Cr-]": 514,
+ "[Sb+]": 515,
+ "[Ni+3]": 516,
+ "[TaH3]": 517,
+ "[Tl+2]": 518,
+ "[64Cu]": 519,
+ "[Tc]": 520,
+ "[Cd+]": 521,
+ "[1H]": 522,
+ "[15nH]": 523,
+ "[AlH2+]": 524,
+ "[FH+2]": 525,
+ "[BiH3]": 526,
+ "[Ru-]": 527,
+ "[Mo+6]": 528,
+ "[AsH+]": 529,
+ "[BaH2]": 530,
+ "[BaH]": 531,
+ "[Fe+4]": 532,
+ "[229Th]": 533,
+ "[Th+4]": 534,
+ "[As+3]": 535,
+ "[NH+3]": 536,
+ "[P@H]": 537,
+ "[Li-]": 538,
+ "[7NaH]": 539,
+ "[Bi+]": 540,
+ "[PtH+2]": 541,
+ "[p-]": 542,
+ "[Re+5]": 543,
+ "[NiH]": 544,
+ "[Ni-]": 545,
+ "[Xe+]": 546,
+ "[Ca+]": 547,
+ "[11c]": 548,
+ "[Rh+4]": 549,
+ "[AcH]": 550,
+ "[HeH]": 551,
+ "[Sc+2]": 552,
+ "[Mn+]": 553,
+ "[UH]": 554,
+ "[14CH2]": 555,
+ "[SiH4+]": 556,
+ "[18OH2]": 557,
+ "[Ac-]": 558,
+ "[Re+4]": 559,
+ "[118Sn]": 560,
+ "[153Sm]": 561,
+ "[P+2]": 562,
+ "[9CH]": 563,
+ "[9CH3]": 564,
+ "[Y-]": 565,
+ "[NiH2]": 566,
+ "[Si+2]": 567,
+ "[Mn+6]": 568,
+ "[ZrH2]": 569,
+ "[C-2]": 570,
+ "[Bi+5]": 571,
+ "[24NaH]": 572,
+ "[Fr]": 573,
+ "[15CH]": 574,
+ "[Se+]": 575,
+ "[At]": 576,
+ "[P-3]": 577,
+ "[124I-]": 578,
+ "[CuH2-]": 579,
+ "[Nb+4]": 580,
+ "[Nb+3]": 581,
+ "[MgH]": 582,
+ "[Ir+4]": 583,
+ "[67Ga+3]": 584,
+ "[67Ga]": 585,
+ "[13N]": 586,
+ "[15OH2]": 587,
+ "[2NH]": 588,
+ "[Ho]": 589,
+ "[Cn]": 590
+ },
+ "merges": []
+ }
+ }
checkpoint-500/tokenizer_config.json ADDED
@@ -0,0 +1,76 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "11": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "12": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "13": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "14": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "591": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "592": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "full_tokenizer_file": null,
+ "mask_token": "[MASK]",
+ "max_len": 512,
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "[UNK]"
+ }
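The tokenizer config above registers a `RobertaTokenizer` with `[CLS]`/`[SEP]`-style special tokens and a 512-token limit. A minimal sketch (not part of the commit) of loading it and tokenizing one of the widget SMILES; the local `checkpoint-500/` path is an assumption:

```python
# Minimal sketch, assuming checkpoint-500/ has been downloaded locally.
# AutoTokenizer resolves tokenizer_class = RobertaTokenizer from this config.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("checkpoint-500")
enc = tokenizer("CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1")
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))  # [CLS] ... [SEP]
```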
checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,66 @@
+ {
+ "best_global_step": 500,
+ "best_metric": 5.223876953125,
+ "best_model_checkpoint": "c:\\Users\\mokht\\Desktop\\molexp\\notebooks\\sBERT_finetuned_on_clintox_with_batch_all_triplet\\checkpoint-500",
+ "epoch": 7.246376811594203,
+ "eval_steps": 500,
+ "global_step": 500,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 7.246376811594203,
+ "grad_norm": 64.02399444580078,
+ "learning_rate": 4.706617647058824e-05,
+ "loss": 4.0383,
+ "step": 500
+ },
+ {
+ "epoch": 7.246376811594203,
+ "eval_loss": 5.223876953125,
+ "eval_runtime": 0.3816,
+ "eval_samples_per_second": 738.998,
+ "eval_steps_per_second": 23.585,
+ "eval_val-sim_cosine_accuracy": 0.611,
+ "eval_val-sim_cosine_accuracy_threshold": 0.8879227638244629,
+ "eval_val-sim_cosine_ap": 0.6971622829878537,
+ "eval_val-sim_cosine_f1": 0.6980609418282548,
+ "eval_val-sim_cosine_f1_threshold": -0.5465683937072754,
+ "eval_val-sim_cosine_mcc": 0.19032555952847827,
+ "eval_val-sim_cosine_precision": 0.5436893203883495,
+ "eval_val-sim_cosine_recall": 0.9748549323017408,
+ "step": 500
+ }
+ ],
+ "logging_steps": 500,
+ "max_steps": 6900,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 100,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "EarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 5,
+ "early_stopping_threshold": 0.0
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 32,
+ "trial_name": null,
+ "trial_params": null
+ }
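`trainer_state.json` records one training log entry and one evaluation entry at step 500 (epoch ≈ 7.25), plus the early-stopping state (patience 5; `best_metric` matches the step-500 `eval_loss`). A minimal stdlib sketch for reading it back; the path is an assumption:

```python
# Minimal sketch: inspect the saved Trainer state (pure stdlib).
import json

with open("checkpoint-500/trainer_state.json") as f:
    state = json.load(f)

print(state["best_model_checkpoint"], state["best_metric"])
for entry in state["log_history"]:
    if "eval_val-sim_cosine_ap" in entry:  # evaluation entries only
        print(entry["step"], entry["eval_val-sim_cosine_ap"])
```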
checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9436833a74a03f2a62eb3e3103f11da698f6f4f094b8c2ba8ef4a9f27a4809a5
+ size 5752
checkpoint-500/vocab.json ADDED
@@ -0,0 +1 @@
+ {"[PAD]":0,"[unused1]":1,"[unused2]":2,"[unused3]":3,"[unused4]":4,"[unused5]":5,"[unused6]":6,"[unused7]":7,"[unused8]":8,"[unused9]":9,"[unused10]":10,"[UNK]":11,"[CLS]":12,"[SEP]":13,"[MASK]":14,"c":15,"C":16,"(":17,")":18,"O":19,"1":20,"2":21,"=":22,"N":23,".":24,"n":25,"3":26,"F":27,"Cl":28,">>":29,"~":30,"-":31,"4":32,"[C@H]":33,"S":34,"[C@@H]":35,"[O-]":36,"Br":37,"#":38,"/":39,"[nH]":40,"[N+]":41,"s":42,"5":43,"o":44,"P":45,"[Na+]":46,"[Si]":47,"I":48,"[Na]":49,"[Pd]":50,"[K+]":51,"[K]":52,"[P]":53,"B":54,"[C@]":55,"[C@@]":56,"[Cl-]":57,"6":58,"[OH-]":59,"\\":60,"[N-]":61,"[Li]":62,"[H]":63,"[2H]":64,"[NH4+]":65,"[c-]":66,"[P-]":67,"[Cs+]":68,"[Li+]":69,"[Cs]":70,"[NaH]":71,"[H-]":72,"[O+]":73,"[BH4-]":74,"[Cu]":75,"7":76,"[Mg]":77,"[Fe+2]":78,"[n+]":79,"[Sn]":80,"[BH-]":81,"[Pd+2]":82,"[CH]":83,"[I-]":84,"[Br-]":85,"[C-]":86,"[Zn]":87,"[B-]":88,"[F-]":89,"[Al]":90,"[P+]":91,"[BH3-]":92,"[Fe]":93,"[C]":94,"[AlH4]":95,"[Ni]":96,"[SiH]":97,"8":98,"[Cu+2]":99,"[Mn]":100,"[AlH]":101,"[nH+]":102,"[AlH4-]":103,"[O-2]":104,"[Cr]":105,"[Mg+2]":106,"[NH3+]":107,"[S@]":108,"[Pt]":109,"[Al+3]":110,"[S@@]":111,"[S-]":112,"[Ti]":113,"[Zn+2]":114,"[PH]":115,"[NH2+]":116,"[Ru]":117,"[Ag+]":118,"[S+]":119,"[I+3]":120,"[NH+]":121,"[Ca+2]":122,"[Ag]":123,"9":124,"[Os]":125,"[Se]":126,"[SiH2]":127,"[Ca]":128,"[Ti+4]":129,"[Ac]":130,"[Cu+]":131,"[S]":132,"[Rh]":133,"[Cl+3]":134,"[cH-]":135,"[Zn+]":136,"[O]":137,"[Cl+]":138,"[SH]":139,"[H+]":140,"[Pd+]":141,"[se]":142,"[PH+]":143,"[I]":144,"[Pt+2]":145,"[C+]":146,"[Mg+]":147,"[Hg]":148,"[W]":149,"[SnH]":150,"[SiH3]":151,"[Fe+3]":152,"[NH]":153,"[Mo]":154,"[CH2+]":155,"%10":156,"[CH2-]":157,"[CH2]":158,"[n-]":159,"[Ce+4]":160,"[NH-]":161,"[Co]":162,"[I+]":163,"[PH2]":164,"[Pt+4]":165,"[Ce]":166,"[B]":167,"[Sn+2]":168,"[Ba+2]":169,"%11":170,"[Fe-3]":171,"[18F]":172,"[SH-]":173,"[Pb+2]":174,"[Os-2]":175,"[Zr+4]":176,"[N]":177,"[Ir]":178,"[Bi]":179,"[Ni+2]":180,"[P@]":181,"[Co+2]":182,"[s+]":183,"[As]":184,"[P+3]":185,"[Hg+2]":186,"[Yb+3]":187,"[CH-]":188,"[Zr+2]":189,"[Mn+2]":190,"[CH+]":191,"[In]":192,"[KH]":193,"[Ce+3]":194,"[Zr]":195,"[AlH2-]":196,"[OH2+]":197,"[Ti+3]":198,"[Rh+2]":199,"[Sb]":200,"[S-2]":201,"%12":202,"[P@@]":203,"[Si@H]":204,"[Mn+4]":205,"p":206,"[Ba]":207,"[NH2-]":208,"[Ge]":209,"[Pb+4]":210,"[Cr+3]":211,"[Au]":212,"[LiH]":213,"[Sc+3]":214,"[o+]":215,"[Rh-3]":216,"%13":217,"[Br]":218,"[Sb-]":219,"[S@+]":220,"[I+2]":221,"[Ar]":222,"[V]":223,"[Cu-]":224,"[Al-]":225,"[Te]":226,"[13c]":227,"[13C]":228,"[Cl]":229,"[PH4+]":230,"[SiH4]":231,"[te]":232,"[CH3-]":233,"[S@@+]":234,"[Rh+3]":235,"[SH+]":236,"[Bi+3]":237,"[Br+2]":238,"[La]":239,"[La+3]":240,"[Pt-2]":241,"[N@@]":242,"[PH3+]":243,"[N@]":244,"[Si+4]":245,"[Sr+2]":246,"[Al+]":247,"[Pb]":248,"[SeH]":249,"[Si-]":250,"[V+5]":251,"[Y+3]":252,"[Re]":253,"[Ru+]":254,"[Sm]":255,"*":256,"[3H]":257,"[NH2]":258,"[Ag-]":259,"[13CH3]":260,"[OH+]":261,"[Ru+3]":262,"[OH]":263,"[Gd+3]":264,"[13CH2]":265,"[In+3]":266,"[Si@@]":267,"[Si@]":268,"[Ti+2]":269,"[Sn+]":270,"[Cl+2]":271,"[AlH-]":272,"[Pd-2]":273,"[SnH3]":274,"[B+3]":275,"[Cu-2]":276,"[Nd+3]":277,"[Pb+3]":278,"[13cH]":279,"[Fe-4]":280,"[Ga]":281,"[Sn+4]":282,"[Hg+]":283,"[11CH3]":284,"[Hf]":285,"[Pr]":286,"[Y]":287,"[S+2]":288,"[Cd]":289,"[Cr+6]":290,"[Zr+3]":291,"[Rh+]":292,"[CH3]":293,"[N-3]":294,"[Hf+2]":295,"[Th]":296,"[Sb+3]":297,"%14":298,"[Cr+2]":299,"[Ru+2]":300,"[Hf+4]":301,"[14C]":302,"[Ta]":303,"[Tl+]":304,"[B+]":305,"[Os+4]":306,"[PdH2]":307,"[Pd-]":308,"[Cd+2]":309,"[Co+3]":310,"[S+4]":311,"[Nb+5]":312,"[123I]":313,"[c+]":314,"[Rb+]":315,"[V+2]":316,"[CH3+]":317,"[Ag+2]":318,"[cH+]":319,"[Mn+3]":320,"[Se-]":321,"[As-]":322,"[Eu+3]":323,"[SH2]":324,"[Sm+3]":325,"[IH+]":326,"%15":327,"[OH3+]":328,"[PH3]":329,"[IH2+]":330,"[SH2+]":331,"[Ir+3]":332,"[AlH3]":333,"[Sc]":334,"[Yb]":335,"[15NH2]":336,"[Lu]":337,"[sH+]":338,"[Gd]":339,"[18F-]":340,"[SH3+]":341,"[SnH4]":342,"[TeH]":343,"[Si@@H]":344,"[Ga+3]":345,"[CaH2]":346,"[Tl]":347,"[Ta+5]":348,"[GeH]":349,"[Br+]":350,"[Sr]":351,"[Tl+3]":352,"[Sm+2]":353,"[PH5]":354,"%16":355,"[N@@+]":356,"[Au+3]":357,"[C-4]":358,"[Nd]":359,"[Ti+]":360,"[IH]":361,"[N@+]":362,"[125I]":363,"[Eu]":364,"[Sn+3]":365,"[Nb]":366,"[Er+3]":367,"[123I-]":368,"[14c]":369,"%17":370,"[SnH2]":371,"[YH]":372,"[Sb+5]":373,"[Pr+3]":374,"[Ir+]":375,"[N+3]":376,"[AlH2]":377,"[19F]":378,"%18":379,"[Tb]":380,"[14CH]":381,"[Mo+4]":382,"[Si+]":383,"[BH]":384,"[Be]":385,"[Rb]":386,"[pH]":387,"%19":388,"%20":389,"[Xe]":390,"[Ir-]":391,"[Be+2]":392,"[C+4]":393,"[RuH2]":394,"[15NH]":395,"[U+2]":396,"[Au-]":397,"%21":398,"%22":399,"[Au+]":400,"[15n]":401,"[Al+2]":402,"[Tb+3]":403,"[15N]":404,"[V+3]":405,"[W+6]":406,"[14CH3]":407,"[Cr+4]":408,"[ClH+]":409,"b":410,"[Ti+6]":411,"[Nd+]":412,"[Zr+]":413,"[PH2+]":414,"[Fm]":415,"[N@H+]":416,"[RuH]":417,"[Dy+3]":418,"%23":419,"[Hf+3]":420,"[W+4]":421,"[11C]":422,"[13CH]":423,"[Er]":424,"[124I]":425,"[LaH]":426,"[F]":427,"[siH]":428,"[Ga+]":429,"[Cm]":430,"[GeH3]":431,"[IH-]":432,"[U+6]":433,"[SeH+]":434,"[32P]":435,"[SeH-]":436,"[Pt-]":437,"[Ir+2]":438,"[se+]":439,"[U]":440,"[F+]":441,"[BH2]":442,"[As+]":443,"[Cf]":444,"[ClH2+]":445,"[Ni+]":446,"[TeH3]":447,"[SbH2]":448,"[Ag+3]":449,"%24":450,"[18O]":451,"[PH4]":452,"[Os+2]":453,"[Na-]":454,"[Sb+2]":455,"[V+4]":456,"[Ho+3]":457,"[68Ga]":458,"[PH-]":459,"[Bi+2]":460,"[Ce+2]":461,"[Pd+3]":462,"[99Tc]":463,"[13C@@H]":464,"[Fe+6]":465,"[c]":466,"[GeH2]":467,"[10B]":468,"[Cu+3]":469,"[Mo+2]":470,"[Cr+]":471,"[Pd+4]":472,"[Dy]":473,"[AsH]":474,"[Ba+]":475,"[SeH2]":476,"[In+]":477,"[TeH2]":478,"[BrH+]":479,"[14cH]":480,"[W+]":481,"[13C@H]":482,"[AsH2]":483,"[In+2]":484,"[N+2]":485,"[N@@H+]":486,"[SbH]":487,"[60Co]":488,"[AsH4+]":489,"[AsH3]":490,"[18OH]":491,"[Ru-2]":492,"[Na-2]":493,"[CuH2]":494,"[31P]":495,"[Ti+5]":496,"[35S]":497,"[P@@H]":498,"[ArH]":499,"[Co+]":500,"[Zr-2]":501,"[BH2-]":502,"[131I]":503,"[SH5]":504,"[VH]":505,"[B+2]":506,"[Yb+2]":507,"[14C@H]":508,"[211At]":509,"[NH3+2]":510,"[IrH]":511,"[IrH2]":512,"[Rh-]":513,"[Cr-]":514,"[Sb+]":515,"[Ni+3]":516,"[TaH3]":517,"[Tl+2]":518,"[64Cu]":519,"[Tc]":520,"[Cd+]":521,"[1H]":522,"[15nH]":523,"[AlH2+]":524,"[FH+2]":525,"[BiH3]":526,"[Ru-]":527,"[Mo+6]":528,"[AsH+]":529,"[BaH2]":530,"[BaH]":531,"[Fe+4]":532,"[229Th]":533,"[Th+4]":534,"[As+3]":535,"[NH+3]":536,"[P@H]":537,"[Li-]":538,"[7NaH]":539,"[Bi+]":540,"[PtH+2]":541,"[p-]":542,"[Re+5]":543,"[NiH]":544,"[Ni-]":545,"[Xe+]":546,"[Ca+]":547,"[11c]":548,"[Rh+4]":549,"[AcH]":550,"[HeH]":551,"[Sc+2]":552,"[Mn+]":553,"[UH]":554,"[14CH2]":555,"[SiH4+]":556,"[18OH2]":557,"[Ac-]":558,"[Re+4]":559,"[118Sn]":560,"[153Sm]":561,"[P+2]":562,"[9CH]":563,"[9CH3]":564,"[Y-]":565,"[NiH2]":566,"[Si+2]":567,"[Mn+6]":568,"[ZrH2]":569,"[C-2]":570,"[Bi+5]":571,"[24NaH]":572,"[Fr]":573,"[15CH]":574,"[Se+]":575,"[At]":576,"[P-3]":577,"[124I-]":578,"[CuH2-]":579,"[Nb+4]":580,"[Nb+3]":581,"[MgH]":582,"[Ir+4]":583,"[67Ga+3]":584,"[67Ga]":585,"[13N]":586,"[15OH2]":587,"[2NH]":588,"[Ho]":589,"[Cn]":590}
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "architectures": [
+ "RobertaModel"
+ ],
+ "attention_probs_dropout_prob": 0.109,
+ "bos_token_id": 0,
+ "classifier_dropout": null,
+ "eos_token_id": 2,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.144,
+ "hidden_size": 384,
+ "initializer_range": 0.02,
+ "intermediate_size": 464,
+ "is_gpu": true,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 515,
+ "model_type": "roberta",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 3,
+ "pad_token_id": 1,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.53.3",
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 600
+ }
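`config.json` describes the backbone: a 3-layer RoBERTa encoder with hidden size 384, 12 attention heads, and a 600-entry vocabulary. A minimal sketch of instantiating it with `transformers`, assuming the repo is cloned to the working directory:

```python
# Minimal sketch: load the bare RoBERTa backbone described by config.json.
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained(".")  # "." = local clone of this repo
print(config.num_hidden_layers, config.hidden_size, config.vocab_size)  # 3 384 600
model = AutoModel.from_pretrained(".")    # RobertaModel with these dimensions
```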
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "model_type": "SentenceTransformer",
+ "__version__": {
+ "sentence_transformers": "5.0.0",
+ "transformers": "4.53.3",
+ "pytorch": "2.5.0+cu118"
+ },
+ "prompts": {
+ "query": "",
+ "document": ""
+ },
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
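`config_sentence_transformers.json` pins the library versions, leaves the query/document prompts empty, and sets cosine as the similarity function. A minimal end-to-end sketch, assuming a local clone (the Hub repo id would work the same way):

```python
# Minimal sketch: embed two SMILES and score them with the configured
# similarity function (cosine, per similarity_fn_name above).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(".")  # local clone; modules.json wires the pipeline
emb = model.encode([
    "CC(C)CNCc1ccc(-c2ccccc2S(=O)(=O)N2CCCC2)cc1",
    "C[NH+](C)CCC=C1c2ccccc2CCc2ccccc21",
])
print(model.similarity(emb[0:1], emb[1:2]))  # 1x1 tensor of cosine similarity
```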
eval/binary_classification_evaluation_val-sim_results.csv ADDED
@@ -0,0 +1,7 @@
+ epoch,steps,cosine_accuracy,cosine_accuracy_threshold,cosine_f1,cosine_precision,cosine_recall,cosine_f1_threshold,cosine_ap,cosine_mcc
+ 7.246376811594203,500,0.611,0.6980609418282548,0.5436893203883495,0.9748549323017408,0.6971622829878537,0.19032555952847827
+ 14.492753623188406,1000,0.62,0.6842105263157895,0.5253886010362694,0.9806576402321083,0.6917769112297409,0.08814553890796624
+ 21.73913043478261,1500,0.643,0.7009966777408638,0.6142649199417758,0.816247582205029,0.7147479159621501,0.28836569671243895
+ 28.985507246376812,2000,0.638,0.7198211624441131,0.5854545454545454,0.9342359767891683,0.723977839766718,0.2974345426612809
+ 36.231884057971016,2500,0.658,0.7080838323353292,0.5775335775335775,0.9148936170212766,0.7398787108823214,0.25767967733543545
+ 43.47826086956522,3000,0.671,0.7042889390519187,0.5763546798029556,0.9052224371373307,0.7370675686338501,0.24685118679448836
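The evaluator appends one row to this CSV per evaluation (every 500 steps). Note that the header names ten columns while the data rows carry eight values, so the sketch below reads only the leading columns, which line up positionally; pure stdlib, path as in the repo:

```python
# Minimal sketch: scan the evaluator log, taking only the leading columns
# (the rows have fewer fields than the header declares).
import csv

with open("eval/binary_classification_evaluation_val-sim_results.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        epoch, steps, acc = float(row[0]), int(row[1]), float(row[2])
        print(f"step {steps}: epoch {epoch:.1f}, cosine accuracy {acc:.3f}")
```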
merges.txt ADDED
@@ -0,0 +1 @@
+ #version: 0.2
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe61144fb1d4268afd35215ccc03497692fa64748cdd08ddb67b0b2c96d7b9c9
+ size 13715688
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ }
+ ]
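`modules.json` wires the two-stage encoder: module 0 is the Transformer at the repo root, module 1 the mean-pooling head configured under `1_Pooling`. A sketch of the equivalent stack built by hand (loading via `SentenceTransformer(path)` reads this file for you):

```python
# Minimal sketch: rebuild the Transformer -> mean-pooling pipeline explicitly.
from sentence_transformers import SentenceTransformer, models

word = models.Transformer(".", max_seq_length=512)  # assumes a local clone
pool = models.Pooling(word.get_word_embedding_dimension(),  # 384
                      pooling_mode="mean")
model = SentenceTransformer(modules=[word, pool])
```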
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
@@ -0,0 +1,712 @@
+ {
+ "version": "1.0",
+ "truncation": {
+ "direction": "Right",
+ "max_length": 512,
+ "strategy": "LongestFirst",
+ "stride": 0
+ },
+ "padding": {
+ "strategy": "BatchLongest",
+ "direction": "Right",
+ "pad_to_multiple_of": null,
+ "pad_id": 0,
+ "pad_type_id": 0,
+ "pad_token": "[PAD]"
+ },
+ "added_tokens": [
+ {
+ "id": 0,
+ "content": "[PAD]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 11,
+ "content": "[UNK]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 12,
+ "content": "[CLS]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 13,
+ "content": "[SEP]",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 14,
+ "content": "[MASK]",
+ "single_word": false,
+ "lstrip": true,
+ "rstrip": false,
+ "normalized": false,
+ "special": true
+ },
+ {
+ "id": 591,
+ "content": "<s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": true,
+ "special": true
+ },
+ {
+ "id": 592,
+ "content": "</s>",
+ "single_word": false,
+ "lstrip": false,
+ "rstrip": false,
+ "normalized": true,
+ "special": true
+ }
+ ],
+ "normalizer": null,
+ "pre_tokenizer": {
+ "type": "ByteLevel",
+ "add_prefix_space": false,
+ "trim_offsets": true,
+ "use_regex": true
+ },
+ "post_processor": {
+ "type": "RobertaProcessing",
+ "sep": [
+ "[SEP]",
+ 13
+ ],
+ "cls": [
+ "[CLS]",
+ 12
+ ],
+ "trim_offsets": true,
+ "add_prefix_space": false
+ },
+ "decoder": {
+ "type": "ByteLevel",
+ "add_prefix_space": true,
+ "trim_offsets": true,
+ "use_regex": true
+ },
+ "model": {
+ "type": "BPE",
+ "dropout": null,
+ "unk_token": null,
+ "continuing_subword_prefix": "",
+ "end_of_word_suffix": "",
+ "fuse_unk": false,
+ "byte_fallback": false,
+ "ignore_merges": false,
+ "vocab": {
+ "[PAD]": 0,
+ "[unused1]": 1,
+ "[unused2]": 2,
+ "[unused3]": 3,
+ "[unused4]": 4,
+ "[unused5]": 5,
+ "[unused6]": 6,
+ "[unused7]": 7,
+ "[unused8]": 8,
+ "[unused9]": 9,
+ "[unused10]": 10,
+ "[UNK]": 11,
+ "[CLS]": 12,
+ "[SEP]": 13,
+ "[MASK]": 14,
+ "c": 15,
+ "C": 16,
+ "(": 17,
+ ")": 18,
+ "O": 19,
+ "1": 20,
+ "2": 21,
+ "=": 22,
+ "N": 23,
+ ".": 24,
+ "n": 25,
+ "3": 26,
+ "F": 27,
+ "Cl": 28,
+ ">>": 29,
+ "~": 30,
+ "-": 31,
+ "4": 32,
+ "[C@H]": 33,
+ "S": 34,
+ "[C@@H]": 35,
+ "[O-]": 36,
+ "Br": 37,
+ "#": 38,
+ "/": 39,
+ "[nH]": 40,
+ "[N+]": 41,
+ "s": 42,
+ "5": 43,
+ "o": 44,
+ "P": 45,
+ "[Na+]": 46,
+ "[Si]": 47,
+ "I": 48,
+ "[Na]": 49,
+ "[Pd]": 50,
+ "[K+]": 51,
+ "[K]": 52,
+ "[P]": 53,
+ "B": 54,
+ "[C@]": 55,
+ "[C@@]": 56,
+ "[Cl-]": 57,
+ "6": 58,
+ "[OH-]": 59,
+ "\\": 60,
+ "[N-]": 61,
+ "[Li]": 62,
+ "[H]": 63,
+ "[2H]": 64,
+ "[NH4+]": 65,
+ "[c-]": 66,
+ "[P-]": 67,
+ "[Cs+]": 68,
+ "[Li+]": 69,
+ "[Cs]": 70,
+ "[NaH]": 71,
+ "[H-]": 72,
+ "[O+]": 73,
+ "[BH4-]": 74,
+ "[Cu]": 75,
+ "7": 76,
+ "[Mg]": 77,
+ "[Fe+2]": 78,
+ "[n+]": 79,
+ "[Sn]": 80,
+ "[BH-]": 81,
+ "[Pd+2]": 82,
+ "[CH]": 83,
+ "[I-]": 84,
+ "[Br-]": 85,
+ "[C-]": 86,
+ "[Zn]": 87,
+ "[B-]": 88,
+ "[F-]": 89,
+ "[Al]": 90,
+ "[P+]": 91,
+ "[BH3-]": 92,
+ "[Fe]": 93,
+ "[C]": 94,
+ "[AlH4]": 95,
+ "[Ni]": 96,
+ "[SiH]": 97,
+ "8": 98,
+ "[Cu+2]": 99,
+ "[Mn]": 100,
+ "[AlH]": 101,
+ "[nH+]": 102,
+ "[AlH4-]": 103,
+ "[O-2]": 104,
+ "[Cr]": 105,
+ "[Mg+2]": 106,
+ "[NH3+]": 107,
+ "[S@]": 108,
+ "[Pt]": 109,
+ "[Al+3]": 110,
+ "[S@@]": 111,
+ "[S-]": 112,
+ "[Ti]": 113,
+ "[Zn+2]": 114,
+ "[PH]": 115,
+ "[NH2+]": 116,
+ "[Ru]": 117,
+ "[Ag+]": 118,
+ "[S+]": 119,
+ "[I+3]": 120,
+ "[NH+]": 121,
+ "[Ca+2]": 122,
+ "[Ag]": 123,
+ "9": 124,
+ "[Os]": 125,
+ "[Se]": 126,
+ "[SiH2]": 127,
+ "[Ca]": 128,
+ "[Ti+4]": 129,
+ "[Ac]": 130,
+ "[Cu+]": 131,
+ "[S]": 132,
+ "[Rh]": 133,
+ "[Cl+3]": 134,
+ "[cH-]": 135,
+ "[Zn+]": 136,
+ "[O]": 137,
+ "[Cl+]": 138,
+ "[SH]": 139,
+ "[H+]": 140,
+ "[Pd+]": 141,
+ "[se]": 142,
+ "[PH+]": 143,
+ "[I]": 144,
+ "[Pt+2]": 145,
+ "[C+]": 146,
+ "[Mg+]": 147,
+ "[Hg]": 148,
+ "[W]": 149,
+ "[SnH]": 150,
+ "[SiH3]": 151,
+ "[Fe+3]": 152,
+ "[NH]": 153,
+ "[Mo]": 154,
+ "[CH2+]": 155,
+ "%10": 156,
+ "[CH2-]": 157,
+ "[CH2]": 158,
+ "[n-]": 159,
+ "[Ce+4]": 160,
+ "[NH-]": 161,
+ "[Co]": 162,
+ "[I+]": 163,
+ "[PH2]": 164,
+ "[Pt+4]": 165,
+ "[Ce]": 166,
+ "[B]": 167,
+ "[Sn+2]": 168,
+ "[Ba+2]": 169,
+ "%11": 170,
+ "[Fe-3]": 171,
+ "[18F]": 172,
+ "[SH-]": 173,
+ "[Pb+2]": 174,
+ "[Os-2]": 175,
+ "[Zr+4]": 176,
+ "[N]": 177,
+ "[Ir]": 178,
+ "[Bi]": 179,
+ "[Ni+2]": 180,
+ "[P@]": 181,
+ "[Co+2]": 182,
+ "[s+]": 183,
+ "[As]": 184,
+ "[P+3]": 185,
+ "[Hg+2]": 186,
+ "[Yb+3]": 187,
+ "[CH-]": 188,
+ "[Zr+2]": 189,
+ "[Mn+2]": 190,
+ "[CH+]": 191,
+ "[In]": 192,
+ "[KH]": 193,
+ "[Ce+3]": 194,
+ "[Zr]": 195,
+ "[AlH2-]": 196,
+ "[OH2+]": 197,
+ "[Ti+3]": 198,
+ "[Rh+2]": 199,
+ "[Sb]": 200,
+ "[S-2]": 201,
+ "%12": 202,
+ "[P@@]": 203,
+ "[Si@H]": 204,
+ "[Mn+4]": 205,
+ "p": 206,
+ "[Ba]": 207,
+ "[NH2-]": 208,
+ "[Ge]": 209,
+ "[Pb+4]": 210,
+ "[Cr+3]": 211,
+ "[Au]": 212,
+ "[LiH]": 213,
+ "[Sc+3]": 214,
+ "[o+]": 215,
+ "[Rh-3]": 216,
+ "%13": 217,
+ "[Br]": 218,
+ "[Sb-]": 219,
+ "[S@+]": 220,
+ "[I+2]": 221,
+ "[Ar]": 222,
+ "[V]": 223,
+ "[Cu-]": 224,
+ "[Al-]": 225,
+ "[Te]": 226,
+ "[13c]": 227,
+ "[13C]": 228,
+ "[Cl]": 229,
+ "[PH4+]": 230,
+ "[SiH4]": 231,
+ "[te]": 232,
+ "[CH3-]": 233,
+ "[S@@+]": 234,
+ "[Rh+3]": 235,
+ "[SH+]": 236,
+ "[Bi+3]": 237,
+ "[Br+2]": 238,
+ "[La]": 239,
+ "[La+3]": 240,
+ "[Pt-2]": 241,
+ "[N@@]": 242,
+ "[PH3+]": 243,
+ "[N@]": 244,
+ "[Si+4]": 245,
+ "[Sr+2]": 246,
+ "[Al+]": 247,
+ "[Pb]": 248,
+ "[SeH]": 249,
+ "[Si-]": 250,
+ "[V+5]": 251,
+ "[Y+3]": 252,
+ "[Re]": 253,
+ "[Ru+]": 254,
+ "[Sm]": 255,
+ "*": 256,
+ "[3H]": 257,
+ "[NH2]": 258,
+ "[Ag-]": 259,
+ "[13CH3]": 260,
+ "[OH+]": 261,
+ "[Ru+3]": 262,
+ "[OH]": 263,
+ "[Gd+3]": 264,
+ "[13CH2]": 265,
+ "[In+3]": 266,
+ "[Si@@]": 267,
+ "[Si@]": 268,
+ "[Ti+2]": 269,
+ "[Sn+]": 270,
+ "[Cl+2]": 271,
+ "[AlH-]": 272,
+ "[Pd-2]": 273,
+ "[SnH3]": 274,
+ "[B+3]": 275,
+ "[Cu-2]": 276,
+ "[Nd+3]": 277,
+ "[Pb+3]": 278,
+ "[13cH]": 279,
+ "[Fe-4]": 280,
+ "[Ga]": 281,
+ "[Sn+4]": 282,
+ "[Hg+]": 283,
+ "[11CH3]": 284,
+ "[Hf]": 285,
+ "[Pr]": 286,
+ "[Y]": 287,
+ "[S+2]": 288,
+ "[Cd]": 289,
+ "[Cr+6]": 290,
+ "[Zr+3]": 291,
+ "[Rh+]": 292,
+ "[CH3]": 293,
+ "[N-3]": 294,
+ "[Hf+2]": 295,
+ "[Th]": 296,
+ "[Sb+3]": 297,
+ "%14": 298,
+ "[Cr+2]": 299,
+ "[Ru+2]": 300,
+ "[Hf+4]": 301,
+ "[14C]": 302,
+ "[Ta]": 303,
+ "[Tl+]": 304,
+ "[B+]": 305,
+ "[Os+4]": 306,
+ "[PdH2]": 307,
+ "[Pd-]": 308,
+ "[Cd+2]": 309,
+ "[Co+3]": 310,
+ "[S+4]": 311,
+ "[Nb+5]": 312,
+ "[123I]": 313,
+ "[c+]": 314,
+ "[Rb+]": 315,
+ "[V+2]": 316,
+ "[CH3+]": 317,
+ "[Ag+2]": 318,
+ "[cH+]": 319,
+ "[Mn+3]": 320,
+ "[Se-]": 321,
+ "[As-]": 322,
+ "[Eu+3]": 323,
+ "[SH2]": 324,
+ "[Sm+3]": 325,
+ "[IH+]": 326,
+ "%15": 327,
+ "[OH3+]": 328,
+ "[PH3]": 329,
+ "[IH2+]": 330,
+ "[SH2+]": 331,
+ "[Ir+3]": 332,
+ "[AlH3]": 333,
+ "[Sc]": 334,
+ "[Yb]": 335,
+ "[15NH2]": 336,
+ "[Lu]": 337,
+ "[sH+]": 338,
+ "[Gd]": 339,
+ "[18F-]": 340,
+ "[SH3+]": 341,
+ "[SnH4]": 342,
+ "[TeH]": 343,
+ "[Si@@H]": 344,
+ "[Ga+3]": 345,
+ "[CaH2]": 346,
+ "[Tl]": 347,
+ "[Ta+5]": 348,
+ "[GeH]": 349,
+ "[Br+]": 350,
+ "[Sr]": 351,
+ "[Tl+3]": 352,
+ "[Sm+2]": 353,
+ "[PH5]": 354,
+ "%16": 355,
+ "[N@@+]": 356,
+ "[Au+3]": 357,
+ "[C-4]": 358,
+ "[Nd]": 359,
+ "[Ti+]": 360,
+ "[IH]": 361,
+ "[N@+]": 362,
+ "[125I]": 363,
+ "[Eu]": 364,
+ "[Sn+3]": 365,
+ "[Nb]": 366,
+ "[Er+3]": 367,
+ "[123I-]": 368,
+ "[14c]": 369,
+ "%17": 370,
+ "[SnH2]": 371,
+ "[YH]": 372,
+ "[Sb+5]": 373,
+ "[Pr+3]": 374,
+ "[Ir+]": 375,
+ "[N+3]": 376,
+ "[AlH2]": 377,
+ "[19F]": 378,
+ "%18": 379,
+ "[Tb]": 380,
+ "[14CH]": 381,
+ "[Mo+4]": 382,
+ "[Si+]": 383,
+ "[BH]": 384,
+ "[Be]": 385,
+ "[Rb]": 386,
+ "[pH]": 387,
+ "%19": 388,
+ "%20": 389,
+ "[Xe]": 390,
+ "[Ir-]": 391,
+ "[Be+2]": 392,
+ "[C+4]": 393,
+ "[RuH2]": 394,
+ "[15NH]": 395,
+ "[U+2]": 396,
+ "[Au-]": 397,
+ "%21": 398,
+ "%22": 399,
+ "[Au+]": 400,
+ "[15n]": 401,
+ "[Al+2]": 402,
+ "[Tb+3]": 403,
+ "[15N]": 404,
+ "[V+3]": 405,
+ "[W+6]": 406,
+ "[14CH3]": 407,
+ "[Cr+4]": 408,
+ "[ClH+]": 409,
+ "b": 410,
+ "[Ti+6]": 411,
+ "[Nd+]": 412,
+ "[Zr+]": 413,
+ "[PH2+]": 414,
+ "[Fm]": 415,
+ "[N@H+]": 416,
+ "[RuH]": 417,
+ "[Dy+3]": 418,
+ "%23": 419,
+ "[Hf+3]": 420,
+ "[W+4]": 421,
+ "[11C]": 422,
+ "[13CH]": 423,
+ "[Er]": 424,
+ "[124I]": 425,
+ "[LaH]": 426,
+ "[F]": 427,
+ "[siH]": 428,
+ "[Ga+]": 429,
+ "[Cm]": 430,
+ "[GeH3]": 431,
+ "[IH-]": 432,
+ "[U+6]": 433,
+ "[SeH+]": 434,
+ "[32P]": 435,
+ "[SeH-]": 436,
+ "[Pt-]": 437,
+ "[Ir+2]": 438,
+ "[se+]": 439,
+ "[U]": 440,
+ "[F+]": 441,
+ "[BH2]": 442,
+ "[As+]": 443,
+ "[Cf]": 444,
+ "[ClH2+]": 445,
+ "[Ni+]": 446,
+ "[TeH3]": 447,
+ "[SbH2]": 448,
+ "[Ag+3]": 449,
+ "%24": 450,
+ "[18O]": 451,
+ "[PH4]": 452,
+ "[Os+2]": 453,
+ "[Na-]": 454,
+ "[Sb+2]": 455,
+ "[V+4]": 456,
+ "[Ho+3]": 457,
+ "[68Ga]": 458,
+ "[PH-]": 459,
+ "[Bi+2]": 460,
+ "[Ce+2]": 461,
+ "[Pd+3]": 462,
+ "[99Tc]": 463,
+ "[13C@@H]": 464,
+ "[Fe+6]": 465,
+ "[c]": 466,
+ "[GeH2]": 467,
+ "[10B]": 468,
+ "[Cu+3]": 469,
+ "[Mo+2]": 470,
+ "[Cr+]": 471,
+ "[Pd+4]": 472,
+ "[Dy]": 473,
+ "[AsH]": 474,
+ "[Ba+]": 475,
+ "[SeH2]": 476,
+ "[In+]": 477,
+ "[TeH2]": 478,
+ "[BrH+]": 479,
+ "[14cH]": 480,
+ "[W+]": 481,
+ "[13C@H]": 482,
+ "[AsH2]": 483,
+ "[In+2]": 484,
+ "[N+2]": 485,
+ "[N@@H+]": 486,
+ "[SbH]": 487,
+ "[60Co]": 488,
+ "[AsH4+]": 489,
+ "[AsH3]": 490,
+ "[18OH]": 491,
+ "[Ru-2]": 492,
+ "[Na-2]": 493,
+ "[CuH2]": 494,
+ "[31P]": 495,
+ "[Ti+5]": 496,
+ "[35S]": 497,
+ "[P@@H]": 498,
+ "[ArH]": 499,
+ "[Co+]": 500,
+ "[Zr-2]": 501,
+ "[BH2-]": 502,
+ "[131I]": 503,
+ "[SH5]": 504,
+ "[VH]": 505,
+ "[B+2]": 506,
+ "[Yb+2]": 507,
+ "[14C@H]": 508,
+ "[211At]": 509,
+ "[NH3+2]": 510,
+ "[IrH]": 511,
+ "[IrH2]": 512,
+ "[Rh-]": 513,
+ "[Cr-]": 514,
+ "[Sb+]": 515,
+ "[Ni+3]": 516,
+ "[TaH3]": 517,
+ "[Tl+2]": 518,
+ "[64Cu]": 519,
+ "[Tc]": 520,
+ "[Cd+]": 521,
+ "[1H]": 522,
+ "[15nH]": 523,
+ "[AlH2+]": 524,
+ "[FH+2]": 525,
+ "[BiH3]": 526,
+ "[Ru-]": 527,
+ "[Mo+6]": 528,
+ "[AsH+]": 529,
+ "[BaH2]": 530,
+ "[BaH]": 531,
+ "[Fe+4]": 532,
+ "[229Th]": 533,
+ "[Th+4]": 534,
+ "[As+3]": 535,
+ "[NH+3]": 536,
+ "[P@H]": 537,
+ "[Li-]": 538,
+ "[7NaH]": 539,
+ "[Bi+]": 540,
+ "[PtH+2]": 541,
+ "[p-]": 542,
+ "[Re+5]": 543,
+ "[NiH]": 544,
+ "[Ni-]": 545,
+ "[Xe+]": 546,
+ "[Ca+]": 547,
+ "[11c]": 548,
+ "[Rh+4]": 549,
+ "[AcH]": 550,
+ "[HeH]": 551,
+ "[Sc+2]": 552,
+ "[Mn+]": 553,
+ "[UH]": 554,
+ "[14CH2]": 555,
+ "[SiH4+]": 556,
+ "[18OH2]": 557,
+ "[Ac-]": 558,
+ "[Re+4]": 559,
+ "[118Sn]": 560,
+ "[153Sm]": 561,
+ "[P+2]": 562,
+ "[9CH]": 563,
+ "[9CH3]": 564,
+ "[Y-]": 565,
+ "[NiH2]": 566,
+ "[Si+2]": 567,
+ "[Mn+6]": 568,
+ "[ZrH2]": 569,
+ "[C-2]": 570,
+ "[Bi+5]": 571,
+ "[24NaH]": 572,
+ "[Fr]": 573,
+ "[15CH]": 574,
+ "[Se+]": 575,
+ "[At]": 576,
+ "[P-3]": 577,
+ "[124I-]": 578,
+ "[CuH2-]": 579,
+ "[Nb+4]": 580,
+ "[Nb+3]": 581,
+ "[MgH]": 582,
+ "[Ir+4]": 583,
+ "[67Ga+3]": 584,
+ "[67Ga]": 585,
+ "[13N]": 586,
+ "[15OH2]": 587,
+ "[2NH]": 588,
+ "[Ho]": 589,
+ "[Cn]": 590
+ },
+ "merges": []
+ }
+ }
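`tokenizer.json` serializes the fast tokenizer: a ByteLevel-BPE model whose vocabulary is atom-level SMILES tokens and whose `merges` list is empty, so a plain chain splits into single-character vocabulary entries. A minimal sketch with the `tokenizers` library; the local path is an assumption:

```python
# Minimal sketch: load the serialized fast tokenizer directly.
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
out = tok.encode("CCO")  # ethanol: every character is its own vocab entry
print(out.tokens)        # ['[CLS]', 'C', 'C', 'O', '[SEP]'] -> ids 12,16,16,19,13
```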
tokenizer_config.json ADDED
@@ -0,0 +1,76 @@
+ {
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "11": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "12": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "13": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "14": {
+ "content": "[MASK]",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "591": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "592": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "[CLS]",
+ "eos_token": "</s>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "full_tokenizer_file": null,
+ "mask_token": "[MASK]",
+ "max_len": 512,
+ "model_max_length": 512,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "tokenizer_class": "RobertaTokenizer",
+ "trim_offsets": true,
+ "unk_token": "[UNK]"
+ }
vocab.json ADDED
@@ -0,0 +1 @@
+ {"[PAD]":0,"[unused1]":1,"[unused2]":2,"[unused3]":3,"[unused4]":4,"[unused5]":5,"[unused6]":6,"[unused7]":7,"[unused8]":8,"[unused9]":9,"[unused10]":10,"[UNK]":11,"[CLS]":12,"[SEP]":13,"[MASK]":14,"c":15,"C":16,"(":17,")":18,"O":19,"1":20,"2":21,"=":22,"N":23,".":24,"n":25,"3":26,"F":27,"Cl":28,">>":29,"~":30,"-":31,"4":32,"[C@H]":33,"S":34,"[C@@H]":35,"[O-]":36,"Br":37,"#":38,"/":39,"[nH]":40,"[N+]":41,"s":42,"5":43,"o":44,"P":45,"[Na+]":46,"[Si]":47,"I":48,"[Na]":49,"[Pd]":50,"[K+]":51,"[K]":52,"[P]":53,"B":54,"[C@]":55,"[C@@]":56,"[Cl-]":57,"6":58,"[OH-]":59,"\\":60,"[N-]":61,"[Li]":62,"[H]":63,"[2H]":64,"[NH4+]":65,"[c-]":66,"[P-]":67,"[Cs+]":68,"[Li+]":69,"[Cs]":70,"[NaH]":71,"[H-]":72,"[O+]":73,"[BH4-]":74,"[Cu]":75,"7":76,"[Mg]":77,"[Fe+2]":78,"[n+]":79,"[Sn]":80,"[BH-]":81,"[Pd+2]":82,"[CH]":83,"[I-]":84,"[Br-]":85,"[C-]":86,"[Zn]":87,"[B-]":88,"[F-]":89,"[Al]":90,"[P+]":91,"[BH3-]":92,"[Fe]":93,"[C]":94,"[AlH4]":95,"[Ni]":96,"[SiH]":97,"8":98,"[Cu+2]":99,"[Mn]":100,"[AlH]":101,"[nH+]":102,"[AlH4-]":103,"[O-2]":104,"[Cr]":105,"[Mg+2]":106,"[NH3+]":107,"[S@]":108,"[Pt]":109,"[Al+3]":110,"[S@@]":111,"[S-]":112,"[Ti]":113,"[Zn+2]":114,"[PH]":115,"[NH2+]":116,"[Ru]":117,"[Ag+]":118,"[S+]":119,"[I+3]":120,"[NH+]":121,"[Ca+2]":122,"[Ag]":123,"9":124,"[Os]":125,"[Se]":126,"[SiH2]":127,"[Ca]":128,"[Ti+4]":129,"[Ac]":130,"[Cu+]":131,"[S]":132,"[Rh]":133,"[Cl+3]":134,"[cH-]":135,"[Zn+]":136,"[O]":137,"[Cl+]":138,"[SH]":139,"[H+]":140,"[Pd+]":141,"[se]":142,"[PH+]":143,"[I]":144,"[Pt+2]":145,"[C+]":146,"[Mg+]":147,"[Hg]":148,"[W]":149,"[SnH]":150,"[SiH3]":151,"[Fe+3]":152,"[NH]":153,"[Mo]":154,"[CH2+]":155,"%10":156,"[CH2-]":157,"[CH2]":158,"[n-]":159,"[Ce+4]":160,"[NH-]":161,"[Co]":162,"[I+]":163,"[PH2]":164,"[Pt+4]":165,"[Ce]":166,"[B]":167,"[Sn+2]":168,"[Ba+2]":169,"%11":170,"[Fe-3]":171,"[18F]":172,"[SH-]":173,"[Pb+2]":174,"[Os-2]":175,"[Zr+4]":176,"[N]":177,"[Ir]":178,"[Bi]":179,"[Ni+2]":180,"[P@]":181,"[Co+2]":182,"[s+]":183,"[As]":184,"[P+3]":185,"[Hg+2]":186,"[Yb+3]":187,"[CH-]":188,"[Zr+2]":189,"[Mn+2]":190,"[CH+]":191,"[In]":192,"[KH]":193,"[Ce+3]":194,"[Zr]":195,"[AlH2-]":196,"[OH2+]":197,"[Ti+3]":198,"[Rh+2]":199,"[Sb]":200,"[S-2]":201,"%12":202,"[P@@]":203,"[Si@H]":204,"[Mn+4]":205,"p":206,"[Ba]":207,"[NH2-]":208,"[Ge]":209,"[Pb+4]":210,"[Cr+3]":211,"[Au]":212,"[LiH]":213,"[Sc+3]":214,"[o+]":215,"[Rh-3]":216,"%13":217,"[Br]":218,"[Sb-]":219,"[S@+]":220,"[I+2]":221,"[Ar]":222,"[V]":223,"[Cu-]":224,"[Al-]":225,"[Te]":226,"[13c]":227,"[13C]":228,"[Cl]":229,"[PH4+]":230,"[SiH4]":231,"[te]":232,"[CH3-]":233,"[S@@+]":234,"[Rh+3]":235,"[SH+]":236,"[Bi+3]":237,"[Br+2]":238,"[La]":239,"[La+3]":240,"[Pt-2]":241,"[N@@]":242,"[PH3+]":243,"[N@]":244,"[Si+4]":245,"[Sr+2]":246,"[Al+]":247,"[Pb]":248,"[SeH]":249,"[Si-]":250,"[V+5]":251,"[Y+3]":252,"[Re]":253,"[Ru+]":254,"[Sm]":255,"*":256,"[3H]":257,"[NH2]":258,"[Ag-]":259,"[13CH3]":260,"[OH+]":261,"[Ru+3]":262,"[OH]":263,"[Gd+3]":264,"[13CH2]":265,"[In+3]":266,"[Si@@]":267,"[Si@]":268,"[Ti+2]":269,"[Sn+]":270,"[Cl+2]":271,"[AlH-]":272,"[Pd-2]":273,"[SnH3]":274,"[B+3]":275,"[Cu-2]":276,"[Nd+3]":277,"[Pb+3]":278,"[13cH]":279,"[Fe-4]":280,"[Ga]":281,"[Sn+4]":282,"[Hg+]":283,"[11CH3]":284,"[Hf]":285,"[Pr]":286,"[Y]":287,"[S+2]":288,"[Cd]":289,"[Cr+6]":290,"[Zr+3]":291,"[Rh+]":292,"[CH3]":293,"[N-3]":294,"[Hf+2]":295,"[Th]":296,"[Sb+3]":297,"%14":298,"[Cr+2]":299,"[Ru+2]":300,"[Hf+4]":301,"[14C]":302,"[Ta]":303,"[Tl+]":304,"[B+]":305,"[Os+4]":306,"[PdH2]":307,"[Pd-]":308,"[Cd+2]":309,"[Co+3]":310,"[S+4]":311,"[Nb+5]":312,"[123I]":313,"[c+]":314,"[Rb+]":315,"[V+2]":316,"[CH3+]":317,"[Ag+2]":318,"[cH+]":319,"[Mn+3]":320,"[Se-]":321,"[As-]":322,"[Eu+3]":323,"[SH2]":324,"[Sm+3]":325,"[IH+]":326,"%15":327,"[OH3+]":328,"[PH3]":329,"[IH2+]":330,"[SH2+]":331,"[Ir+3]":332,"[AlH3]":333,"[Sc]":334,"[Yb]":335,"[15NH2]":336,"[Lu]":337,"[sH+]":338,"[Gd]":339,"[18F-]":340,"[SH3+]":341,"[SnH4]":342,"[TeH]":343,"[Si@@H]":344,"[Ga+3]":345,"[CaH2]":346,"[Tl]":347,"[Ta+5]":348,"[GeH]":349,"[Br+]":350,"[Sr]":351,"[Tl+3]":352,"[Sm+2]":353,"[PH5]":354,"%16":355,"[N@@+]":356,"[Au+3]":357,"[C-4]":358,"[Nd]":359,"[Ti+]":360,"[IH]":361,"[N@+]":362,"[125I]":363,"[Eu]":364,"[Sn+3]":365,"[Nb]":366,"[Er+3]":367,"[123I-]":368,"[14c]":369,"%17":370,"[SnH2]":371,"[YH]":372,"[Sb+5]":373,"[Pr+3]":374,"[Ir+]":375,"[N+3]":376,"[AlH2]":377,"[19F]":378,"%18":379,"[Tb]":380,"[14CH]":381,"[Mo+4]":382,"[Si+]":383,"[BH]":384,"[Be]":385,"[Rb]":386,"[pH]":387,"%19":388,"%20":389,"[Xe]":390,"[Ir-]":391,"[Be+2]":392,"[C+4]":393,"[RuH2]":394,"[15NH]":395,"[U+2]":396,"[Au-]":397,"%21":398,"%22":399,"[Au+]":400,"[15n]":401,"[Al+2]":402,"[Tb+3]":403,"[15N]":404,"[V+3]":405,"[W+6]":406,"[14CH3]":407,"[Cr+4]":408,"[ClH+]":409,"b":410,"[Ti+6]":411,"[Nd+]":412,"[Zr+]":413,"[PH2+]":414,"[Fm]":415,"[N@H+]":416,"[RuH]":417,"[Dy+3]":418,"%23":419,"[Hf+3]":420,"[W+4]":421,"[11C]":422,"[13CH]":423,"[Er]":424,"[124I]":425,"[LaH]":426,"[F]":427,"[siH]":428,"[Ga+]":429,"[Cm]":430,"[GeH3]":431,"[IH-]":432,"[U+6]":433,"[SeH+]":434,"[32P]":435,"[SeH-]":436,"[Pt-]":437,"[Ir+2]":438,"[se+]":439,"[U]":440,"[F+]":441,"[BH2]":442,"[As+]":443,"[Cf]":444,"[ClH2+]":445,"[Ni+]":446,"[TeH3]":447,"[SbH2]":448,"[Ag+3]":449,"%24":450,"[18O]":451,"[PH4]":452,"[Os+2]":453,"[Na-]":454,"[Sb+2]":455,"[V+4]":456,"[Ho+3]":457,"[68Ga]":458,"[PH-]":459,"[Bi+2]":460,"[Ce+2]":461,"[Pd+3]":462,"[99Tc]":463,"[13C@@H]":464,"[Fe+6]":465,"[c]":466,"[GeH2]":467,"[10B]":468,"[Cu+3]":469,"[Mo+2]":470,"[Cr+]":471,"[Pd+4]":472,"[Dy]":473,"[AsH]":474,"[Ba+]":475,"[SeH2]":476,"[In+]":477,"[TeH2]":478,"[BrH+]":479,"[14cH]":480,"[W+]":481,"[13C@H]":482,"[AsH2]":483,"[In+2]":484,"[N+2]":485,"[N@@H+]":486,"[SbH]":487,"[60Co]":488,"[AsH4+]":489,"[AsH3]":490,"[18OH]":491,"[Ru-2]":492,"[Na-2]":493,"[CuH2]":494,"[31P]":495,"[Ti+5]":496,"[35S]":497,"[P@@H]":498,"[ArH]":499,"[Co+]":500,"[Zr-2]":501,"[BH2-]":502,"[131I]":503,"[SH5]":504,"[VH]":505,"[B+2]":506,"[Yb+2]":507,"[14C@H]":508,"[211At]":509,"[NH3+2]":510,"[IrH]":511,"[IrH2]":512,"[Rh-]":513,"[Cr-]":514,"[Sb+]":515,"[Ni+3]":516,"[TaH3]":517,"[Tl+2]":518,"[64Cu]":519,"[Tc]":520,"[Cd+]":521,"[1H]":522,"[15nH]":523,"[AlH2+]":524,"[FH+2]":525,"[BiH3]":526,"[Ru-]":527,"[Mo+6]":528,"[AsH+]":529,"[BaH2]":530,"[BaH]":531,"[Fe+4]":532,"[229Th]":533,"[Th+4]":534,"[As+3]":535,"[NH+3]":536,"[P@H]":537,"[Li-]":538,"[7NaH]":539,"[Bi+]":540,"[PtH+2]":541,"[p-]":542,"[Re+5]":543,"[NiH]":544,"[Ni-]":545,"[Xe+]":546,"[Ca+]":547,"[11c]":548,"[Rh+4]":549,"[AcH]":550,"[HeH]":551,"[Sc+2]":552,"[Mn+]":553,"[UH]":554,"[14CH2]":555,"[SiH4+]":556,"[18OH2]":557,"[Ac-]":558,"[Re+4]":559,"[118Sn]":560,"[153Sm]":561,"[P+2]":562,"[9CH]":563,"[9CH3]":564,"[Y-]":565,"[NiH2]":566,"[Si+2]":567,"[Mn+6]":568,"[ZrH2]":569,"[C-2]":570,"[Bi+5]":571,"[24NaH]":572,"[Fr]":573,"[15CH]":574,"[Se+]":575,"[At]":576,"[P-3]":577,"[124I-]":578,"[CuH2-]":579,"[Nb+4]":580,"[Nb+3]":581,"[MgH]":582,"[Ir+4]":583,"[67Ga+3]":584,"[67Ga]":585,"[13N]":586,"[15OH2]":587,"[2NH]":588,"[Ho]":589,"[Cn]":590}