about num_classes of emb_extractor
Hi, thanks for the great tool!
I fine-tuned Geneformer using this setup:
model = CustomSequenceClassification.from_pretrained(pretrain_model_path, num_labels=1, problem_type="regression",...).to("cuda")
i.e., I set num_labels=1 because this is a regression scenario.
When I extract embeddings with EmbExtractor from the fine-tuned model, how should I set the num_classes parameter? Is setting num_classes=1 OK?
Thanks for your question! You would need to check whether the model loads properly as one of the model types we provide, such as CellClassification (sequence classification). If not, you should load it as you did above and extract embeddings by running a forward pass through the model. Alternatively, you could save only the trunk of the model and load it as Pretrained.
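To illustrate the forward-pass route: below is a minimal sketch using a tiny randomly initialized BERT-style sequence-classification model as a stand-in for the fine-tuned checkpoint (the config values and pooling choice are assumptions for demonstration, not Geneformer's exact settings). The key point is that hidden_states[-1] comes from the trunk, before the 1-unit regression head, so the head never enters the embedding.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Hypothetical tiny model standing in for the fine-tuned checkpoint.
# In practice you would instead load your own weights, e.g.:
#   model = CustomSequenceClassification.from_pretrained(
#       finetuned_model_path, num_labels=1, problem_type="regression")
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    num_labels=1, problem_type="regression")
model = BertForSequenceClassification(config).eval()

input_ids = torch.tensor([[1, 5, 7, 2]])  # placeholder token IDs
with torch.no_grad():
    outputs = model(input_ids, output_hidden_states=True)

# hidden_states[-1] is the last trunk layer, upstream of the regression
# head, so the head's weights cannot contaminate the embedding.
token_embs = outputs.hidden_states[-1]   # (batch, seq_len, hidden)
cell_emb = token_embs.mean(dim=1)        # mean-pool over tokens
print(tuple(cell_emb.shape))             # (1, 32)
```

For real data you would also mask out padding tokens before pooling.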
Thanks for your reply!
So if the model loads successfully through EmbExtractor with num_classes=1, is this way of extracting embeddings feasible?
I would suggest loading it separately yourself to confirm that it loads properly, since it is not one of the model types explicitly supported by this repository's model-loading function. That way you can ensure it doesn't load a randomly initialized head on top of your model, or, if it does, you will know to adjust your layer selection so that layer is not used for extraction.
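For the trunk-saving alternative mentioned above, one way to sketch it (again with a hypothetical tiny model in place of the real checkpoint; the attribute name .bert assumes a BERT-based architecture) is to save only the encoder and reload it as a plain headless model:

```python
import tempfile
import torch
from transformers import BertConfig, BertForSequenceClassification, BertModel

# Hypothetical tiny model standing in for the fine-tuned checkpoint.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=1)
finetuned = BertForSequenceClassification(config)

with tempfile.TemporaryDirectory() as trunk_dir:
    # Save only the trunk (finetuned.bert), dropping the regression head;
    # the saved directory can then be loaded as a Pretrained model.
    finetuned.bert.save_pretrained(trunk_dir)
    trunk = BertModel.from_pretrained(trunk_dir)

# The reloaded trunk weights match the fine-tuned encoder exactly.
same = torch.equal(finetuned.bert.embeddings.word_embeddings.weight,
                   trunk.embeddings.word_embeddings.weight)
print(same)  # True
```

Because the head is never saved, extracting from any layer of the reloaded trunk is safe.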