stefan-insilico committed on
Commit
431933a
·
verified ·
1 Parent(s): c6d52af

Update README.md

Files changed (1)
  1. README.md +106 -77
README.md CHANGED
@@ -2,21 +2,22 @@
2
  license: cc-by-nc-4.0
3
  ---
4
 
5
- # Precious3-GPT-Multi-Modal inference
6
 
7
- Model inference is running at HuggingFace Inference endpoint
8
 
 
 
 
 
 
9
 
10
- ## Definitions
11
 
12
- - **Signature**: up- and down-gene lists
13
 
14
- ---
15
-
16
- ## Run generation step by step
17
 
18
 
19
- ### Step 1 - connect to endpoint
20
  ```python
21
 
22
  import requests
@@ -34,15 +35,15 @@ def query(payload):
34
 
35
  ```
36
 
37
- ### Step 2 - create input for endpoint
38
 
39
  ```python
40
  import json
41
  with open('./generation-configs/meta2diff.json', 'r') as f:
42
  config_data = json.load(f)
43
 
44
- # prepare sample
45
- config_sample = {"inputs": config_data, "mode": "meta2diff", "parameters": {
46
  "temperature": 0.8,
47
  "top_p": 0.2,
48
  "top_k": 3550,
@@ -52,31 +53,18 @@ config_sample = {"inputs": config_data, "mode": "meta2diff", "parameters": {
52
 
53
  ```
54
 
55
- ### Expected input at Step 2.
56
- ```json
57
- {
58
- "inputs": {
59
- "instruction": "disease2diff2disease",
60
- "tissue": ["whole blood"],
61
- "age": 60,
62
- "cell": "",
63
- "efo": "Orphanet_139399",
64
- "datatype": "", "drug": "", "dose": "", "time": "", "case": "", "control": "", "dataset_type": "expression ", "gender": "m", "species": "human", "up": [], "down": []
65
- },
66
- "mode": "meta2diff",
67
- "parameters": {
68
- "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
69
- }
70
- }
71
  ```
72
 
73
- ### Step 3. Send request to endpoint
74
  ```python
75
- output = query(config_sample)
76
  ```
77
 
78
 
79
- OUTPUT STRUCTURE
80
  ```json
81
  {
82
  "output": {
@@ -93,11 +81,49 @@ NOTE: If the ```mode``` was supposed to generate compounds, the output would con
93
 
94
  ---
95
 
96
- ## Generation Modes (`mode` in config)
97
 
98
  Choose the appropriate mode based on your requirements:
99
 
100
- 1. **meta2diff**: Generate signature given meta-data such as tissue, compound, gender, etc.
101
  2. **diff2compound**: Predict compounds based on signature.
102
  3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on generated signatures.
103
 
@@ -106,10 +132,8 @@ Choose the appropriate mode based on your requirements:
106
 
107
  ### Instruction (`inputs.instruction` in config)
108
 
109
- You can use the following instructions (one or several at a time):
110
-
111
- 1. disease2diff2disease - generate signature for disease
112
- 2. compound2diff2compound - generate signature for compound
113
  3. age_group2diff2age_group - generate signature for age group / predict age group based on signature
114
 
115
 
@@ -123,49 +147,51 @@ Full list of available values for each meta-data item you can find in ```p3_enti
123
 
124
  ## Examples
125
 
126
- In the following example all possible configuration fields are specified. You can leave some meta-data fields in ```inputs``` section empty string(```""```) or empty list(```[]```).
127
 
128
- _Example 1_
129
 
130
- If you want to generate signature given specific meta-data you can use the following configuration. Note, ```up``` and ```down``` fields are empty lists as you want to generate them.
 
131
 
132
  ```json
133
  {
134
  "inputs": {
135
- "instruction": "disease2diff2disease",
136
- "tissue": ["whole blood"],
137
  "age": "",
138
- "cell": "",
139
- "efo": "Orphanet_139399",
140
- "datatype": "",
141
- "drug": "",
142
- "dose": "",
143
- "time": "",
144
- "case": "",
145
- "control": "",
146
- "dataset_type": "expression",
147
- "gender": "m",
148
- "species": "human",
149
- "up": [],
150
- "down": []
151
- },
152
- "mode": "meta2diff",
153
  "parameters": {
154
- "temperature": 0.8,
155
- "top_p": 0.2,
156
- "top_k": 3550,
157
- "n_next_tokens": 50,
158
- "random_seed": 137
159
  }
160
  }
 
161
 
162
  ```
163
- Here we asked model to generate signature for Human, male, in tissue - whole blood with disease Orphanet_139399.
164
 
165
 
166
- _Example 2_
 
 
 
 
167
 
168
- You want to generate signature for healthy Human, male, 40-50 years, in tissue - whole blood.
169
  ```json
170
  {
171
  "inputs": {
@@ -174,16 +200,7 @@ You want to generate signature for healthy Human, male, 40-50 years, in tissue -
174
  "age": "",
175
  "cell": "",
176
  "efo": "",
177
- "datatype": "",
178
- "drug": "",
179
- "dose": "",
180
- "time": "",
181
- "case": "40.0-50.0",
182
- "control": "",
183
- "dataset_type": "expression",
184
- "gender": "m",
185
- "species": "human",
186
- "up": [],
187
  "down": []
188
  },
189
  "mode": "meta2diff",
@@ -197,10 +214,22 @@ You want to generate signature for healthy Human, male, 40-50 years, in tissue -
197
  }
198
 
199
  ```
200
- Note, here we used ```disease2diff2disease``` instruction, but we expected to generate signatures for healthy human, that's why we'd set ```efo``` to empty string "".
201
- Alternatively, we can add one more instruction to example 2 - ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```
202
 
203
  ---
204
 
205
  ## Multi-Modality
206
- Applies by default in tasks where you pass signature. For each gene in up- and down- lists model gets embeddings from Knowledge Graph and Text NNs. Then embeddings are averaged in order to obtain one embedding for each modality for each gene list (4 averaged embeddings in total).
 
2
  license: cc-by-nc-4.0
3
  ---
4
 
5
+ ## Precious3-GPT-Multi-Modal
6
 
7
+ A multi-modal multi-omics multi-species language model.
8
 
9
+ - **Developer**: [Insilico Medicine](https://insilico.com/precious)
10
+ - **License**: cc-by-nc-4.0
11
+ - **Model size**: 89.4 million parameters
12
+ - **Domain**: Biomedical
13
+ - **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)
14
 
 
15
 
 
16
 
17
+ ### Run model using endpoint step by step
 
 
18
 
19
 
20
+ **Step 1 - connect to endpoint**
21
  ```python
22
 
23
  import requests
 
35
 
36
  ```
37
 
38
+ **Step 2 - create input for endpoint**
39
 
40
  ```python
41
  import json
42
  with open('./generation-configs/meta2diff.json', 'r') as f:
43
  config_data = json.load(f)
44
 
45
+ # prepare request configuration
46
+ request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
47
  "temperature": 0.8,
48
  "top_p": 0.2,
49
  "top_k": 3550,
 
53
 
54
  ```
55
 
56
+ **How Precious3-GPT will see the given request**
57
+ ```text
58
+ [BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
59
  ```
60
 
61
+ **Step 3 - send request to endpoint**
62
  ```python
63
+ output = query(request_config)
64
  ```
65
 
66
 
67
+ **Endpoint output structure**
68
  ```json
69
  {
70
  "output": {
 
81
 
82
  ---
83
 
84
+ ### Run model locally
85
+ <details>
86
+ <summary style="font-weight:600">Details</summary>
87
+
88
+
89
+
90
+ 1. Download the repository https://huggingface.co/insilicomedicine/precious3-gpt-multi-modal
91
+
92
+ 2. Inside the repository execute:
93
+ ```python
94
+
95
+ # init handler
96
+ from handler import EndpointHandler
97
+ precious3gpt_handler = EndpointHandler(path='./')
98
+
99
+ import json
100
+ with open('./generation-configs/meta2diff.json', 'r') as f:
101
+ config_data = json.load(f)
102
+
103
+ # prepare request configuration
104
+ request_config = {"inputs": config_data,
105
+ "mode": "meta2diff",
106
+ "parameters": {
107
+ "temperature": 0.8,
108
+ "top_p": 0.2,
109
+ "top_k": 3550,
110
+ "n_next_tokens": 50,
111
+ "random_seed": 137
112
+ }}
113
+
114
+ output = precious3gpt_handler(request_config)
115
+
116
+ ```
117
+ </details>
118
+
119
+ ---
120
+ ## Precious3-GPT request configuration
121
+
122
+ ### Generation Modes (`mode` in config)
123
 
124
  Choose the appropriate mode based on your requirements:
125
 
126
+ 1. **meta2diff**: Generate a signature (up- and down-gene lists) given meta-data such as tissue, compound, gender, etc.
127
  2. **diff2compound**: Predict compounds based on signature.
128
  3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on generated signatures.
129
 
 
132
 
133
  ### Instruction (`inputs.instruction` in config)
134
 
135
+ 1. disease2diff2disease - generate signature for disease / predict disease based on given signature
136
+ 2. compound2diff2compound - generate signature for compound / predict compound based on given signature
 
 
137
  3. age_group2diff2age_group - generate signature for age group / predict age group based on signature
138
 
139
 
 
147
 
148
  ## Examples
149
 
150
+ In the following examples all possible configuration fields are specified. You can leave some meta-data fields in the ```inputs``` section as an empty string (```""```) or an empty list (```[]```).
151
 
152
+ _**Example 1**_
153
 
154
+ To generate a signature from specific meta-data, use the following configuration. Note that the ```up``` and ```down``` fields are empty lists because the model generates them.
155
+ Here we ask the model to generate a signature for a male human in the 70-90 years age group, in lung tissue, with disease EFO_0000768.
156
 
157
  ```json
158
  {
159
  "inputs": {
160
+ "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
161
+ "tissue": ["lung"],
162
  "age": "",
163
+ "cell": "",
164
+ "efo": "EFO_0000768",
165
+ "datatype": "", "drug": "", "dose": "", "time": "", "case": ["70.0-80.0", "80.0-90.0"], "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [], "down": []
166
+ },
167
+ "mode": "meta2diff",
 
 
 
 
 
 
 
 
 
 
168
  "parameters": {
169
+ "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
 
 
 
 
170
  }
171
  }
172
+ ```
173
 
174
+ Here is the output:
175
+ ```json
176
+ {
177
+ "output": {
178
+ "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
179
+ "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
180
+ },
181
+ "mode": "meta2diff", // generation mode we specified
182
+ "message": "Done!",
183
+ "input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
184
+ "random_seed": 137
185
+ }
186
  ```
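
The response above nests each gene list one level deep (`"up": [[...]]`). As a minimal sketch, assuming the response has already been parsed into a Python dict with exactly the keys shown above, the lists can be flattened as follows. The helper name `extract_signature` is hypothetical and is not part of the repository.

```python
from typing import Dict, List


def extract_signature(response: Dict) -> Dict[str, List[str]]:
    """Flatten the nested "up"/"down" gene lists of a meta2diff response.

    Hypothetical helper: it only assumes the keys visible in the example
    output ("output" -> "up" / "down", each a list of lists of gene names).
    """
    output = response.get("output", {})
    signature: Dict[str, List[str]] = {}
    for direction in ("up", "down"):
        nested = output.get(direction, [])
        # Concatenate the inner lists into one flat list of gene symbols.
        signature[direction] = [gene for genes in nested for gene in genes]
    return signature


# Truncated copy of the response shown above; the real lists are longer.
example_response = {
    "output": {
        "up": [["PTGDR2", "CABYR", "MGAM"]],
        "down": [["MB", "OR10V1", "OR51H1"]],
    },
    "mode": "meta2diff",
    "message": "Done!",
    "random_seed": 137,
}

signature = extract_signature(example_response)
print(signature["up"])
print(signature["down"])
```

A missing `"output"` key yields two empty lists rather than an error, which may or may not be the behavior you want for failed requests.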
 
187
 
188
 
189
+ _**Example 2**_
190
+
191
+ Now let's generate a signature for a healthy male human in the 40-50 years age group, in whole blood tissue.
192
+ Note that we use the ```disease2diff2disease``` instruction here but expect a signature for a healthy human, so we set ```efo``` to an empty string (```""```).
193
+ Alternatively, we can add one more instruction to this example: ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```
194
 
 
195
  ```json
196
  {
197
  "inputs": {
 
200
  "age": "",
201
  "cell": "",
202
  "efo": "",
203
+ "datatype": "", "drug": "", "dose": "", "time": "", "case": "40.0-50.0", "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [],
 
 
 
 
 
 
 
 
 
204
  "down": []
205
  },
206
  "mode": "meta2diff",
 
214
  }
215
 
216
  ```
217
+
218
+ Here is the output:
219
+ ```json
220
+ {
221
+ "output": {
222
+ "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
223
+ "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
224
+ },
225
+ "mode": "meta2diff",
226
+ "message": "Done!",
227
+ "input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
228
+ "random_seed": 137
229
+ }
230
+ ```
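
The `"input"` field in the outputs above shows how a request is flattened into a tagged prompt. The sketch below reproduces that string format for illustration only. The function `render_prompt` and the constant `FIELD_ORDER` are hypothetical names, the field order is copied from the example output, and the actual logic in the repository's `handler.py` may differ. It ignores `age`, `up`, and `down`, because the example output does not show how they are rendered.

```python
from typing import Dict, List, Union

# Field order copied from the "input" string in the example output above.
FIELD_ORDER = [
    "tissue", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
]


def render_prompt(inputs: Dict[str, Union[str, List[str]]]) -> str:
    """Render a request's `inputs` section as a tagged prompt string.

    Hypothetical sketch, not the repository's implementation.
    """
    instructions = inputs.get("instruction", [])
    if isinstance(instructions, str):
        # A single instruction may be given as a plain string.
        instructions = [instructions]
    parts = ["[BOS]"] + [f"<{name}>" for name in instructions]
    for field in FIELD_ORDER:
        value = inputs.get(field, "")
        values = [value] if isinstance(value, str) else list(value)
        # Each non-empty value is followed by one space; empty fields
        # produce an empty tag pair such as <cell></cell>.
        body = "".join(f"{v} " for v in values if v)
        parts.append(f"<{field}>{body}</{field}>")
    return "".join(parts)


# Inputs matching the prompt shown in the Example 2 output.
example2_inputs = {
    "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
    "tissue": ["whole blood"],
    "age": "", "cell": "", "efo": "", "datatype": "", "drug": "",
    "dose": "", "time": "", "case": "40.0-50.0", "control": "",
    "dataset_type": "expression", "gender": "m", "species": "human",
    "up": [], "down": [],
}

prompt = render_prompt(example2_inputs)
print(prompt)
```

Comparing the rendered string with the `"input"` field returned by the endpoint is a quick way to check that a configuration was interpreted as intended.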
231
 
232
  ---
233
 
234
  ## Multi-Modality
235
+ Applies by default in tasks where you pass a signature. For each gene in the up- and down-lists, the model gets embeddings from the Knowledge Graph and Text NNs. The embeddings are then averaged to obtain one embedding per modality per gene list (4 averaged embeddings in total).
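
The averaging step can be sketched in plain Python as shown below. This is an illustration under stated assumptions, not the model's code: the embedding tables are toy 2-dimensional lookups invented for the example, and the names `mean_embedding` and `average_per_modality` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_embedding(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot average an empty gene list")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def average_per_modality(
    up: List[str],
    down: List[str],
    modalities: Dict[str, Dict[str, Vector]],
) -> Dict[Tuple[str, str], Vector]:
    """Return one averaged embedding per (modality, gene list) pair."""
    averaged: Dict[Tuple[str, str], Vector] = {}
    for modality, table in modalities.items():
        for list_name, genes in (("up", up), ("down", down)):
            averaged[(modality, list_name)] = mean_embedding(
                [table[gene] for gene in genes]
            )
    return averaged


# Toy embeddings (assumed values); the real ones come from the
# Knowledge Graph and Text NNs and have many more dimensions.
knowledge_graph = {
    "IER3": [1.0, 0.0], "APOC2": [3.0, 2.0],
    "TBL1Y": [0.0, 4.0], "TDP1": [2.0, 0.0],
}
text = {
    "IER3": [0.0, 2.0], "APOC2": [4.0, 6.0],
    "TBL1Y": [1.0, 1.0], "TDP1": [3.0, 5.0],
}

averaged = average_per_modality(
    up=["IER3", "APOC2"],
    down=["TBL1Y", "TDP1"],
    modalities={"knowledge_graph": knowledge_graph, "text": text},
)
print(len(averaged))  # 2 modalities x 2 gene lists = 4 averaged embeddings
```

How genes without an embedding are handled is not described in the text, so this sketch simply raises a `KeyError` for them.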