ctheodoris/Geneformer · Cell_inds_to_perturb to control the range of cells

Apr 27

Hi,

I am currently using cell_inds_to_perturb to control the range of cells being perturbed in my InSilicoPerturber experiments. I would like to confirm a few points regarding reproducibility and usage:

① If I specify the same start and end indices across different runs, will the actual data corresponding to those indices always remain exactly the same?
In other words, is the dataset fully deterministic once created, without any internal reshuffling or randomness that could cause the data content at a given index to change between runs?

② My current strategy is to divide the dataset across multiple GPUs manually. For example:

For logical GPU 0: start=0, end=1000

For logical GPU 1: start=1000, end=2000

and so on.

I then submit multiple jobs, each of which sets cell_inds_to_perturb={"start": start, "end": end} accordingly.

I would like to confirm if this approach is correct for splitting perturbation tasks across GPUs.
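To make the split concrete, here is a minimal sketch of how the per-job ranges could be generated. The helper name make_index_ranges and the cell count of 2500 are hypothetical, chosen only for illustration. It also assumes that "end" is exclusive, which is what the 0-1000 / 1000-2000 example above implies if the jobs are not meant to overlap.

```python
def make_index_ranges(n_cells, chunk_size):
    """Split [0, n_cells) into consecutive, non-overlapping ranges.

    Hypothetical helper, not part of Geneformer. Each range is a dict
    with "start" and "end" keys, where "end" is assumed to be exclusive.
    """
    if n_cells < 0 or chunk_size <= 0:
        raise ValueError("n_cells must be >= 0 and chunk_size must be > 0")
    return [
        {"start": start, "end": min(start + chunk_size, n_cells)}
        for start in range(0, n_cells, chunk_size)
    ]


# Example: 2500 cells (made-up number) split into chunks of 1000.
ranges = make_index_ranges(2500, 1000)
for job_id, cell_inds in enumerate(ranges):
    # Each dict would be passed as cell_inds_to_perturb in a separate job.
    print(job_id, cell_inds)
```

The last range is clipped to the dataset size, so the final job simply gets fewer cells.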

Thank you very much for your help and clarification!

ctheodoris

Owner Apr 28

Thanks for your question. Yes, that is the intended use case. If you change something, such as the filtering argument, the set of cells present will change. Otherwise, with everything else held the same, it should be deterministic.
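For anyone who wants to check this on their own data, one possible approach is to fingerprint the records in an index range and compare the result across runs. The sketch below is only an illustration: fingerprint_range is a hypothetical helper, and the list of dicts is toy data standing in for the filtered dataset, not Geneformer's actual data structures or API.

```python
import hashlib
import json


def fingerprint_range(records, start, end):
    """Return a SHA-256 hex digest of records[start:end].

    Hypothetical helper. `records` is assumed to be a sequence of
    JSON-serializable dicts; "end" is treated as exclusive.
    """
    digest = hashlib.sha256()
    for record in records[start:end]:
        # sort_keys makes the serialization independent of dict key order.
        digest.update(json.dumps(record, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()


# Toy stand-in for a tokenized dataset (made-up values).
cells = [
    {"input_ids": [i, i + 1], "cell_type": "A" if i % 2 == 0 else "B"}
    for i in range(10)
]

# Same data, same indices: the fingerprints match.
same = fingerprint_range(cells, 0, 3) == fingerprint_range(list(cells), 0, 3)

# A different filter changes which cells sit at indices 0-2.
filtered = [c for c in cells if c["cell_type"] == "A"]
changed = fingerprint_range(cells, 0, 3) != fingerprint_range(filtered, 0, 3)

print(same, changed)
```

Recording the fingerprint for each range alongside the job output gives a cheap way to detect that a filter or ordering changed between runs.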

ctheodoris changed discussion status to closed Apr 28