Qichen Huang committed on
Commit a214b85 · 1 Parent(s): 75e1f3d

Add model checkpoint and README.md

README.md CHANGED
@@ -4,4 +4,49 @@ datasets:
  - tahoebio/Tahoe-100M
  tags:
  - tahoe-deepdive
- ---
+ ---
+
+
+
+ ## Team Name
+ Tahoeformer
+
+ ## Members
+ - Ryan Keivanfar [GitHub: rylosqualo](https://github.com/rylosqualo)
+ - Min Dai [GitHub: genecell](https://github.com/genecell)
+ - Xinyu Yuan [GitHub: KatarinaYuan](https://github.com/KatarinaYuan)
+ - Qichen Huang [GitHub: qhuang20](https://github.com/qhuang20)
+
+
+ ## Project
+
+ ### Title
+ Tahoeformer: Interpreting Cellular Context and DNA Sequence Determinants Underlying Drug Response
+
+ ### Overview
+ Tahoeformer is a deep learning model that integrates cellular context and DNA sequence information to predict drug responses. Built on the Enformer architecture, the model aims to capture how genomic variation influences drug effects in different cellular environments.
+
+ ### Motivation
+ Precision medicine requires understanding how genetic variation affects drug response across cellular contexts. Tahoeformer addresses this challenge by modeling:
+ - Cellular context (differences in transcription factor expression patterns)
+ - DNA sequence variation (mutations in transcription factor binding sites)
+
+ ### Methods
+ We fine-tuned an Enformer-based model on the Tahoe-100M dataset, incorporating:
+ - Morgan fingerprints for drug representation
+ - Pseudobulked gene expression data across 8 cell lines treated with 27 drugs at a single dosage
+ - DNA sequence centered on the transcription start site (TSS) of each gene in a curated subset of 500 genes
+
+ ### Results
+ Our model demonstrates strong performance on the top 20 curated genes when predicting gene expression changes in response to drug treatment across different cellular contexts, enabling a better understanding of drug-genome interactions.
+
+ ## Datasets
+ - [Tahoe-100M dataset](https://huggingface.co/datasets/tahoebio/Tahoe-100M)
+
+ ## Acknowledgements
+ - [Enformer](https://www.nature.com/articles/s41586-021-03412-8)
+ - [GradientShap](https://captum.ai/api/gradient_shap.html#)
+ - [Weights & Biases](https://wandb.ai/)
+
+
+
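
The Methods section in the README diff above mentions pseudobulked expression and TSS-centered DNA sequence as model inputs, but the commit does not include the preprocessing code. The following is a minimal sketch of what those two steps could look like in plain Python. Everything in it is an assumption made for illustration: the function names, the `(cell_line, drug, counts)` record layout, the choice to average rather than sum counts, the `N` padding at chromosome ends, and the toy window length. It is not the team's actual pipeline.

```python
from collections import defaultdict

BASES = "ACGT"


def pseudobulk(cells):
    """Average per-cell expression within each (cell_line, drug) group.

    `cells` is an iterable of (cell_line, drug, {gene: count}) tuples.
    This record layout is hypothetical, not the Tahoe-100M schema.
    A gene missing from a cell's dict is treated as a zero count.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for cell_line, drug, counts in cells:
        key = (cell_line, drug)
        n_cells[key] += 1
        for gene, value in counts.items():
            sums[key][gene] += value
    return {
        key: {gene: total / n_cells[key] for gene, total in genes.items()}
        for key, genes in sums.items()
    }


def tss_window(chrom_seq, tss, length):
    """Return `length` bases around 0-based position `tss`, N-padded at the ends."""
    start = tss - length // 2
    end = start + length
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(chrom_seq))
    core = chrom_seq[max(0, start):min(len(chrom_seq), end)]
    return left_pad + core + right_pad


def one_hot(seq):
    """Encode DNA as rows of [A, C, G, T]; any other base becomes all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


# Toy inputs with made-up labels and counts.
toy_cells = [
    ("line_A", "drug_X", {"G1": 2, "G2": 0}),
    ("line_A", "drug_X", {"G1": 4}),
    ("line_B", "drug_X", {"G1": 1}),
]
profiles = pseudobulk(toy_cells)
window = tss_window("ACGTACGT", tss=1, length=4)
encoded = one_hot(window)
```

In a real setting the window length would match the sequence model's expected input (Enformer takes 196,608 bp) and the drug would be encoded separately, for example with the Morgan fingerprints the README mentions.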
epoch=94-validation_pearson_epoch=0.2972.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:12b95b21d2681643edeeb13bcf755956bd970b944d6828d091d599e77311ec98
+ size 2850655814
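
The checkpoint itself is not stored in the repository; the three added lines are a Git LFS pointer with `version`, `oid`, and `size` fields. As a rough illustration, the sketch below parses those fields and shows how a downloaded file could be checked against them. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the check assumes the `oid` is a SHA-256 digest of the full file contents, as the `sha256:` prefix indicates.

```python
import hashlib
import os

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:12b95b21d2681643edeeb13bcf755956bd970b944d6828d091d599e77311ec98
size 2850655814
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of an LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path, info, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and SHA-256 digest."""
    if os.path.getsize(path) != info["size_bytes"]:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte checkpoint is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == info["digest"]


info = parse_lfs_pointer(POINTER)
size_gib = info["size_bytes"] / 2**30
```

The size check runs first because it is cheap and catches truncated downloads before the file is hashed.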