Model results

#1
by dzyla - opened

Hi,

Thanks for sharing the model! I tried it on a protein of interest, and unfortunately the results are only ~50% accurate. The model also performs much worse than methods such as FoldX or Pythia. It would be great to have an ESM3-based model for ddG prediction, but I think this one is not there yet. Could you include instructions for fine-tuning it on target proteins? Also, do you know if ESM3 is structure-sensitive, as the embeddings also contain structural information about the mutated residue and its neighbors?

Thanks!

Hi,
It's expected to perform poorly, because the dataset it was trained on contains few examples. I collected more datasets with more examples, but unfortunately I did not have enough time to train the model on them.

Here are the datasets:
https://huggingface.co/datasets/hazemessam/abyssal_db
https://huggingface.co/datasets/hazemessam/ddg_megadataset (largest dataset)
https://huggingface.co/datasets/hazemessam/prostata
https://huggingface.co/datasets/hazemessam/ddg (the one I used to train the model; it was trained only on ssym.csv)
https://huggingface.co/datasets/hazemessam/fireprot_db

If you would like to check the training script: https://github.com/hazemessamm/silica/blob/main/silica/stability_training.py

Note: If you are going to train the model on ddg_megadataset (or any dataset that already includes the swapped sequences), make sure to turn off the swap parameter in the SingleMutationDatasetV2 class, because the examples in this dataset are already swapped.
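
To illustrate why the swap should be turned off in that case, here is a minimal sketch. It is not the code from the silica repository, and the real SingleMutationDatasetV2 may work differently. It assumes that "swap" means adding the reverse mutation (mutant -> wild type) with the sign of ddG flipped; MutationExample and add_swapped are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical container for one single-mutation example (not from silica).
@dataclass(frozen=True)
class MutationExample:
    wild_type: str  # sequence before the mutation
    mutant: str     # sequence after the mutation
    ddg: float      # stability change for wild_type -> mutant


def add_swapped(examples: List[MutationExample], swap: bool = True) -> List[MutationExample]:
    """Return the examples, optionally followed by their reverse mutations.

    Assumption: the reverse mutation (mutant -> wild_type) gets -ddg.
    """
    if not swap:
        return list(examples)
    augmented = []
    for ex in examples:
        augmented.append(ex)
        augmented.append(MutationExample(ex.mutant, ex.wild_type, -ex.ddg))
    return augmented


# Toy data with made-up sequences and a made-up ddG value.
forward_only = [MutationExample("MKTA", "MKTV", 1.5)]

# A dataset without reverse mutations: swapping once adds the reverse example.
swapped_once = add_swapped(forward_only, swap=True)

# A dataset that already contains reverse mutations: swapping again only
# duplicates examples, so the swap should be turned off for it.
swapped_twice = add_swapped(swapped_once, swap=True)
kept_as_is = add_swapped(swapped_once, swap=False)

print(len(swapped_once), len(swapped_twice), len(set(swapped_twice)), len(kept_as_is))
```

Under these assumptions, applying the swap to a dataset that is already swapped adds no new information; it only makes every example appear twice.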

Sorry for the late reply. I hope I explained everything clearly; if not, let me know.
