entropy committed on
Commit
f42a5a1
1 Parent(s): 988df42

Update README.md

  1. README.md +11 -0
README.md CHANGED
@@ -52,6 +52,17 @@ mask = inputs['attention_mask']
  embeddings = ((full_embeddings * mask.unsqueeze(-1)).sum(1) / mask.sum(-1).unsqueeze(-1))
  ```
 
+ ### WARNING
+
+ This model was trained with `bos` and `eos` tokens around SMILES inputs. The `GPT2TokenizerFast` tokenizer DOES NOT ADD special tokens,
+ even when `add_special_tokens=True`. Huggingface says this is [intended behavior](https://github.com/huggingface/transformers/issues/3311#issuecomment-693719190).
+
+ It may be necessary to manually add these tokens
+
+ ```python
+ inputs = collator(tokenizer([tokenizer.bos_token+i+tokenizer.eos_token for i in smiles]))
+ ```
+
  ## Model Performance
 
  To test generation performance, 1m compounds were generated at various temperature values. Generated compounds were checked for uniqueness and structural validity.
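
The warning added in this commit says the `bos` and `eos` tokens may need to be added by hand before tokenizing. The snippet below is a minimal sketch of that wrapping step as a plain string operation. The helper name `wrap_with_special_tokens` and the placeholder token strings `<bos>` and `<eos>` are assumptions made for illustration and are not taken from the model's tokenizer. In practice the values would presumably come from `tokenizer.bos_token` and `tokenizer.eos_token`.

```python
from typing import Iterable, List


def wrap_with_special_tokens(
    smiles: Iterable[str], bos_token: str, eos_token: str
) -> List[str]:
    """Prepend bos_token and append eos_token to each SMILES string.

    A string that already starts with bos_token or ends with eos_token
    is not given that token a second time.
    """
    wrapped = []
    for s in smiles:
        if not s.startswith(bos_token):
            s = bos_token + s
        if not s.endswith(eos_token):
            s = s + eos_token
        wrapped.append(s)
    return wrapped


# Placeholder token strings (hypothetical); with the real tokenizer, pass
# tokenizer.bos_token and tokenizer.eos_token instead.
BOS, EOS = "<bos>", "<eos>"
examples = wrap_with_special_tokens(["CCO", "c1ccccc1"], BOS, EOS)
print(examples)
```

The wrapped strings can then be passed to the tokenizer and collator in the same way as the one-line list comprehension in the diff. Checking for an existing token first keeps the helper safe to call twice on the same input.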