Eachan Johnson


Accounting for sequencing subsampling per sample

Each sequencing sample $s$ may over- or under-sample the population, relative to the first timepoint, by some factor $\phi_s$.

logโกci(t)ci(0)=logโกฯ•sni(t)ni(0)=logโกฯ•s+wiwwtlogโกnwt(t)nwt(0)\log \frac{c_i(t)}{c_i(0)} = \log \phi_s\frac{n_i(t)}{n_i(0)} = \log \phi_s + \frac{w_i}{w_{wt}} \log \frac{n_{wt}(t)}{n_{wt}(0)}

Variables:

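  • $c_i(t)$: Read (or UMI) count of strain $i$ at time $t$
  • $n_i(t)$: True number of cells of strain $i$ at time $t$
  • $w_i$, $w_{wt}$: Fitness of strain $i$ and of the reference (wild-type) strain
  • $\phi_s$: The ratio of sampling depth at time $t$ to that at time $0$ for sample $s$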
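The model can be made concrete with a short, noise-free Python sketch. The function names are hypothetical, fitness enters only through the ratio $w_i/w_{wt}$, and the sampling noise that real counts carry is ignored here.

```python
import math


def cell_fold_change(w_i, w_wt, wt_fold_expansion):
    """True fold-expansion n_i(t)/n_i(0) of strain i (hypothetical helper).

    Follows from log(n_i(t)/n_i(0)) = (w_i / w_wt) * log(n_wt(t)/n_wt(0)).
    """
    return wt_fold_expansion ** (w_i / w_wt)


def read_fold_change(phi_s, w_i, w_wt, wt_fold_expansion):
    """Noise-free read-count fold change c_i(t)/c_i(0) in sample s.

    Sequencing scales the true fold change by the sampling factor phi_s.
    """
    return phi_s * cell_fold_change(w_i, w_wt, wt_fold_expansion)


# Example: a strain 1.5x as fit as the reference, the reference expands
# 4-fold, and sample s is sequenced at half the relative depth (phi_s = 0.5).
log_lhs = math.log(read_fold_change(0.5, 1.5, 1.0, 4.0))
log_rhs = math.log(0.5) + (1.5 / 1.0) * math.log(4.0)
print(log_lhs, log_rhs)  # equal up to floating-point rounding
```

Both printed values are $\log 4 \approx 1.386$: the true 8-fold expansion of the strain is halved by $\phi_s$.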

For any strain $i$, assuming every strain is sampled without bias, $\phi_s$ is the ratio of the strain's read-count fold change to its true cell-count fold change between the two timepoints:

logโกฯ•s=logโกci(t)ci(0)โˆ’logโกni(0)ni(0)\log \phi_s = \log \frac{c_i(t)}{c_i(0)} - \log \frac{n_i(0)}{n_i(0)}

We can eliminate the nuisance parameter $\phi_s$, which is difficult to measure because we don't know the true number of cells of each strain in each sample, using the following trick.

We have the equation for read counts for mutant $i$ (same as above):

logโกci(t)ci(0)=logโกฯ•s+wiwwtlogโกnwt(t)nwt(0) \log \frac{c_i(t)}{c_i(0)} = \log \phi_s + \frac{w_i}{w_{wt}} \log \frac{n_{wt}(t)}{n_{wt}(0)}

And for the reference strain (relative fitness is 1):

logโกcwt(t)cwt(0)=logโกฯ•s+logโกnwt(t)nwt(0) \log \frac{c_{wt}(t)}{c_{wt}(0)} = \log \phi_s + \log \frac{n_{wt}(t)}{n_{wt}(0)}

We can make $\phi_s$ disappear by taking the difference:

logโกci(t)ci(0)โˆ’logโกcwt(t)cwt(0)=wiwwtlogโกnwt(t)nwt(0)โˆ’logโกnwt(t)nwt(0) \log \frac{c_i(t)}{c_i(0)} - \log \frac{c_{wt}(t)}{c_{wt}(0)} = \frac{w_i}{w_{wt}} \log \frac{n_{wt}(t)}{n_{wt}(0)} - \log \frac{n_{wt}(t)}{n_{wt}(0)}

This is equivalent to:

logโก(ci(t)cwt(t)cwt(0)ci(0))=(wiwwtโˆ’1)logโกnwt(t)nwt(0) \log \left( \frac{c_i(t)}{c_{wt}(t)}\frac{c_{wt}(0)}{c_i(0)} \right) = \left(\frac{w_i}{w_{wt}} - 1 \right) \log \frac{n_{wt}(t)}{n_{wt}(0)}

So the ratio of a strain's count ratio to the reference at time $t$ to its count ratio to the reference at time $0$ depends only on the relative fitness $w_i/w_{wt}$ and the true fold-expansion of the reference strain, $n_{wt}(t)/n_{wt}(0)$; the sampling factor $\phi_s$ no longer appears.

Plotting this ratio against the reference strain's fold-expansion $n_{wt}(t)/n_{wt}(0)$ on log-log axes should therefore give a straight line with intercept 0 and gradient equal to the relative fitness minus 1, $w_i/w_{wt} - 1$.
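Reading the relative fitness off that line amounts to a least-squares fit constrained through the origin. The Python sketch below is one way to do it, not necessarily the tutorial's own method. It assumes the reference strain's true fold-expansion at each timepoint comes from an independent measurement (for example cell counts or optical density); how that value is obtained is not specified here. The function name and data are hypothetical.

```python
import math


def fit_relative_fitness(wt_fold_expansions, log_double_ratios):
    """Estimate w_i / w_wt from a least-squares line forced through the origin.

    x = log(n_wt(t)/n_wt(0)), y = log double ratio; the slope is w_i/w_wt - 1.
    """
    xs = [math.log(f) for f in wt_fold_expansions]
    ys = list(log_double_ratios)
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty inputs")
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("reference strain must expand at some timepoint")
    # Slope of y = b * x minimising squared error: sum(x*y) / sum(x*x).
    slope = sum(x * y for x, y in zip(xs, ys)) / sxx
    return slope + 1.0


# Made-up, noise-free timecourse for a strain with w_i / w_wt = 1.2.
folds = [2.0, 4.0, 8.0, 16.0]
log_ratios = [(1.2 - 1.0) * math.log(f) for f in folds]
print(fit_relative_fitness(folds, log_ratios))  # approximately 1.2
```

Forcing the intercept to 0 uses the prediction from the derivation. Fitting a free intercept instead (for example with `numpy.polyfit(x, y, 1)`) would let you check that prediction against real data.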