
This repository contains the weights of a BERT model trained with masked language modelling on 60% of the GuacaMol dataset. Further details can be found in our publication.

```python
from transformers import AutoModel, AutoTokenizer

# Molecules are given as SMILES strings.
mols = [
    "CCOc1cc2nn(CCC(C)(C)O)cc2cc1NC(=O)c1cccc(C(F)F)n1",
    "CN(c1ncc(F)cn1)[C@H]1CCCNC1",
    "CC(C)(Oc1ccc(-c2cnc(N)c(-c3ccc(Cl)cc3)c2)cc1)C(=O)O",
    "CC(C)(O)CCn1cc2cc(NC(=O)c3cccc(C(F)(F)F)n3)c(C(C)(C)O)cc2n1",
    # ...
]

tokenizer = AutoTokenizer.from_pretrained("UdS-LSV/da4mt-mlm-60")
model = AutoModel.from_pretrained("UdS-LSV/da4mt-mlm-60")

inputs = tokenizer(
    mols,
    add_special_tokens=True,
    truncation=True,
    max_length=128,
    padding="max_length",
    return_tensors="pt",
)
# Use the [CLS] token representation as the molecule embedding.
embeddings = model(**inputs).last_hidden_state[:, 0, :]
```
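The resulting [CLS] embeddings can be compared directly, for example with cosine similarity, to score how close two molecules are in the learned representation space. A minimal sketch, using random tensors in place of the actual model output for illustration:

```python
import torch
import torch.nn.functional as F

# In practice, `embeddings` comes from
# model(**inputs).last_hidden_state[:, 0, :];
# random tensors stand in here so the snippet is self-contained.
embeddings = torch.randn(4, 768)

# L2-normalise, then take pairwise dot products to get
# the full cosine-similarity matrix between molecules.
normed = F.normalize(embeddings, dim=1)
similarity = normed @ normed.T  # shape (4, 4), values in [-1, 1]
```

Each entry `similarity[i, j]` is the cosine similarity between molecules `i` and `j`; the diagonal is 1 by construction.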
