insilicomedicine/precious3-gpt-multi-modal

@@ -2,21 +2,22 @@
 license: cc-by-nc-4.0
 ---
-# Precious3-GPT-Multi-Modal inference
-Model inference is running at HuggingFace Inference endpoint
-## Definitions
-- **Signature**: up- and down-gene lists
----
-## Run generation step by step
-### Step 1 - connect to endpoint
 ```python
 import requests
@@ -34,15 +35,15 @@ def query(payload):
 ```
-### Step 2 - create input for endpoint
 ```python
 import json
 with open('./generation-configs/meta2diff.json', 'r') as f:
     config_data = json.load(f)
-# prepare sample
-config_sample = {"inputs": config_data, "mode": "meta2diff", "parameters": {
     "temperature": 0.8,
     "top_p": 0.2,
     "top_k": 3550,
@@ -52,31 +53,18 @@ config_sample = {"inputs": config_data, "mode": "meta2diff", "parameters": {
 ```
-### Expected input at Step 2.
-```json
-{
-    "inputs": {
-        "instruction": "disease2diff2disease",
-        "tissue": ["whole blood"],
-        "age": 60,
-        "cell": "",
-        "efo": "Orphanet_139399",
-        "datatype": "", "drug": "", "dose": "", "time": "", "case": "", "control": "", "dataset_type": "expression ", "gender": "m", "species": "human", "up": [], "down": []
-    },
-    "mode": "meta2diff",
-    "parameters": {
-        "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
-    }
-}
 ```
-### Step 3. Send request to endpoint
 ```python
-output = query(config_sample)
 ```
-OUTPUT STRUCTURE
 ```json
 {
     "output": {
@@ -93,11 +81,49 @@ NOTE: If the ```mode``` was supposed to generate compounds, the output would con
 ---
-## Generation Modes (`mode` in config)
 Choose the appropriate mode based on your requirements:
-1. **meta2diff**: Generate signature given meta-data such as tissue, compound, gender, etc.
 2. **diff2compound**: Predict compounds based on signature.
 3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on generated signatures.
@@ -106,10 +132,8 @@ Choose the appropriate mode based on your requirements:
 ### Instruction (`inputs.instruction` in config)
-You can use the following instructions (one or several at a time):
-1. disease2diff2disease - generate signature for disease
-2. compound2diff2compound - generate signature for compound
 3. age_group2diff2age_group - generate signature for age group / predict age group based on signature
@@ -123,49 +147,51 @@ Full list of available values for each meta-data item you can find in ```p3_enti
 ## Examples
-In the following example all possible configuration fields are specified. You can leave some meta-data fields in ```inputs``` section empty string(```""```) or empty list(```[]```).
-_Example 1_
-If you want to generate signature given specific meta-data you can use the following configuration. Note, ```up``` and ```down``` fields are empty lists as you want to generate them.
 ```json
 {
     "inputs": {
-        "instruction": "disease2diff2disease",
-        "tissue": ["whole blood"],
         "age": "",
-        "cell": "",
-        "efo": "Orphanet_139399",
-        "datatype": "",
-        "drug": "",
-        "dose": "",
-        "time": "",
-        "case": "",
-        "control": "",
-        "dataset_type": "expression",
-        "gender": "m",
-        "species": "human",
-        "up": [],
-        "down": []
-    },
-    "mode": "meta2diff",
     "parameters": {
-        "temperature": 0.8,
-        "top_p": 0.2,
-        "top_k": 3550,
-        "n_next_tokens": 50,
-        "random_seed": 137
     }
 }
 ```
-Here we asked model to generate signature for Human, male, in tissue - whole blood with disease Orphanet_139399.
-_Example 2_
-You want to generate signature for healthy Human, male, 40-50 years, in tissue - whole blood.
 ```json
 {
     "inputs": {
@@ -174,16 +200,7 @@ You want to generate signature for healthy Human, male, 40-50 years, in tissue -
         "age": "",
         "cell": "",
         "efo": "",
-        "datatype": "",
-        "drug": "",
-        "dose": "",
-        "time": "",
-        "case": "40.0-50.0",
-        "control": "",
-        "dataset_type": "expression",
-        "gender": "m",
-        "species": "human",
-        "up": [],
         "down": []
     },
     "mode": "meta2diff",
@@ -197,10 +214,22 @@ You want to generate signature for healthy Human, male, 40-50 years, in tissue -
 }
 ```
-Note, here we used ```disease2diff2disease``` instruction, but we expected to generate signatures for healthy human, that's why we'd set ```efo``` to empty string "".
-Alternatively, we can add one more instruction to example 2 - ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```
 ---
 ## Multi-Modality
-Applies by default in tasks where you pass signature. For each gene in up- and down- lists model gets embeddings from Knowledge Graph and Text NNs. Then embeddings are averaged in order to obtain one embedding for each modality for each gene list (4 averaged embeddings in total).

 license: cc-by-nc-4.0
 ---
+## Precious3-GPT-Multi-Modal
+A multi-modal multi-omics multi-species language model.
+- **Developer**: [Insilico Medicine](https://insilico.com/precious)
+- **License**: cc-by-nc-4.0
+- **Model size**: 89.4 million parameters
+- **Domain**: Biomedical
+- **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)
+### Run model using endpoint step by step
+**Step 1 - connect to endpoint**
 ```python
 import requests
 ```
+**Step 2 - create input for endpoint**
 ```python
 import json
 with open('./generation-configs/meta2diff.json', 'r') as f:
     config_data = json.load(f)
+# prepare request configuration
+request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
     "temperature": 0.8,
     "top_p": 0.2,
     "top_k": 3550,
 ```
+**How Precious3-GPT will see the given request**
+```text
+[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
 ```
+**Step 3 - send request to endpoint**
 ```python
+output = query(request_config)
 ```
+**Endpoint output structure**
 ```json
 {
     "output": {
 ---
+### Run model locally
+<details>
+  <summary style="font-weight:600">Details</summary>
+ 1. Download the repository https://huggingface.co/insilicomedicine/precious3-gpt-multi-modal
+ 2. Inside the repository execute:
+```python
+# init handler
+from handler import EndpointHandler
+precious3gpt_handler = EndpointHandler(path='./')
+import json
+with open('./generation-configs/meta2diff.json', 'r') as f:
+    config_data = json.load(f)
+# prepare request configuration
+request_config = {"inputs": config_data,
+                  "mode": "meta2diff",
+                  "parameters": {
+    "temperature": 0.8,
+    "top_p": 0.2,
+    "top_k": 3550,
+    "n_next_tokens": 50,
+    "random_seed": 137
+}}
+output = precious3gpt_handler(request_config)
+```
+</details>
+---
+## Precious3-GPT request configuration
+### Generation Modes (`mode` in config)
 Choose the appropriate mode based on your requirements:
+1. **meta2diff**: Generate signature (up- and down- gene lists) given meta-data such as tissue, compound, gender, etc.
 2. **diff2compound**: Predict compounds based on signature.
 3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on generated signatures.
 ### Instruction (`inputs.instruction` in config)
+1. disease2diff2disease - generate signature for disease / predict disease based on given signature
+2. compound2diff2compound - generate signature for compound / predict compound based on given signature
 3. age_group2diff2age_group - generate signature for age group / predict age group based on signature
 ## Examples
+In the following examples all possible configuration fields are specified. You can leave any meta-data field in the ```inputs``` section empty, either as an empty string (```""```) or as an empty list (```[]```).
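As a minimal sketch of that convention, the hypothetical helper below (`make_inputs` is not part of this repository) starts from a template in which every field is empty and overrides only the meta-data you pass. The field names are taken from the example configurations in this README; using `[]` for `instruction`, `tissue`, `up` and `down` and `""` for everything else is an assumption.

```python
import copy

# Field names come from the example configurations in this README.
# Which empty value each field gets ("" vs []) is an assumption.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}


def make_inputs(**meta):
    """Return an `inputs` dict with unspecified fields left empty (hypothetical helper)."""
    unknown = set(meta) - set(EMPTY_INPUTS)
    if unknown:
        raise KeyError(f"unknown meta-data fields: {sorted(unknown)}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # deep copy so the template lists are never shared
    inputs.update(meta)
    return inputs


inputs = make_inputs(
    instruction=["disease2diff2disease"],
    tissue=["whole blood"],
    dataset_type="expression",
    gender="m",
    species="human",
)
```

The resulting `inputs` dict can be placed under the `"inputs"` key of a request, next to `"mode"` and `"parameters"`.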
+_**Example 1**_
+If you want to generate a signature given specific meta-data you can use the following configuration. Note, ```up``` and ```down``` fields are empty lists as you want to generate them.
+Here we ask the model to generate a signature for a human male in the 70-90 years age group, in lung tissue, with disease EFO_0000768.
 ```json
 {
     "inputs": {
+        "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
+        "tissue": ["lung"],
         "age": "",
+        "cell": "",
+        "efo": "EFO_0000768",
+        "datatype": "", "drug": "", "dose": "", "time": "", "case": ["70.0-80.0", "80.0-90.0"], "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [], "down": []
+    },
+    "mode": "meta2diff",
     "parameters": {
+        "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
     }
 }
+```
+Here is the output:
+```json
+{
+  "output": {
+    "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
+    "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
+  },
+  "mode": "meta2diff", // generation mode we specified
+  "message": "Done!",
+  "input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
+  "random_seed": 137
+}
 ```
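The `input` field in the output above suggests how a request is flattened into a prompt. The sketch below is a reconstruction from that one string, not the code in `handler.py`: the function name `render_prompt`, the field order, and the rule "each non-empty value followed by a space" are all assumptions inferred from the example. The `age`, `up` and `down` fields are ignored here because they do not appear in that prompt.

```python
# Field order as it appears in the "input" string of Example 1 (assumed to be fixed).
FIELD_ORDER = [
    "tissue", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
]


def render_prompt(inputs):
    """Rebuild the prompt string from an `inputs` dict (illustrative reconstruction)."""
    instructions = inputs.get("instruction", [])
    if isinstance(instructions, str):
        instructions = [instructions]
    parts = ["[BOS]"] + [f"<{name}>" for name in instructions]
    for field in FIELD_ORDER:
        value = inputs.get(field, "")
        values = value if isinstance(value, list) else [value]
        # Every non-empty value is followed by a single space; empty fields stay empty.
        text = "".join(f"{v} " for v in values if v != "")
        parts.append(f"<{field}>{text}</{field}>")
    return "".join(parts)


example_1_inputs = {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"], "age": "", "cell": "", "efo": "EFO_0000768",
    "datatype": "", "drug": "", "dose": "", "time": "",
    "case": ["70.0-80.0", "80.0-90.0"], "control": "",
    "dataset_type": "expression", "gender": "m", "species": "human",
    "up": [], "down": [],
}
prompt = render_prompt(example_1_inputs)
```

For the Example 1 configuration, `prompt` matches the `input` string returned by the model, which makes this a quick way to check a configuration before sending it.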
+_**Example 2**_
+Now let's generate a signature for a healthy human male in the 40-50 years age group, in whole blood tissue.
+Note that here we use the ```disease2diff2disease``` instruction but expect a signature for a healthy human, so we set ```efo``` to an empty string ```""```.
+Alternatively, we can add one more instruction to this example: ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```
 ```json
 {
     "inputs": {
         "age": "",
         "cell": "",
         "efo": "",
+        "datatype": "", "drug": "", "dose": "", "time": "", "case": "40.0-50.0", "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [],
         "down": []
     },
     "mode": "meta2diff",
 }
 ```
+Here is the output:
+```json
+{
+  "output": {
+    "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
+    "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
+  },
+  "mode": "meta2diff",
+  "message": "Done!",
+  "input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
+  "random_seed": 137
+}
+```
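To work with a response like the one above, the generated signature has to be pulled out of the nested `output` field. The helper below is a small sketch under the assumption that `up` and `down` each hold a list of gene lists, as in the examples; `extract_signature` is a hypothetical name, and the sample response is a shortened copy of the Example 2 output.

```python
def extract_signature(response):
    """Return (up_genes, down_genes) from a meta2diff response dict."""
    output = response["output"]
    # "up" and "down" are lists of gene lists; this sketch takes the first list of each.
    up_genes = output["up"][0] if output["up"] else []
    down_genes = output["down"][0] if output["down"] else []
    return up_genes, down_genes


# Shortened copy of the Example 2 response, used only for illustration.
response = {
    "output": {
        "up": [["IER3", "APOC2", "EDNRB"]],
        "down": [["TBL1Y", "TDP1", "PLPP4"]],
    },
    "mode": "meta2diff",
    "message": "Done!",
    "random_seed": 137,
}
up_genes, down_genes = extract_signature(response)
```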
 ---
 ## Multi-Modality
+Applies by default in tasks where you pass a signature. For each gene in up- and down- lists the model gets embeddings from Knowledge Graph and Text NNs. Then embeddings are averaged in order to obtain one embedding for each modality for each gene list (4 averaged embeddings in total).
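
The averaging step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two lookup tables stand in for the Knowledge Graph and Text networks, the 2-dimensional vectors are made-up values, and `average_signature_embeddings` is a hypothetical name rather than a function from this repository.

```python
def mean_embedding(vectors):
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot average an empty gene list")
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def average_signature_embeddings(signature, modality_tables):
    """Return one averaged embedding per (gene list, modality) pair."""
    averaged = {}
    for list_name in ("up", "down"):
        for modality, table in modality_tables.items():
            vectors = [table[gene] for gene in signature[list_name]]
            averaged[(list_name, modality)] = mean_embedding(vectors)
    return averaged


# Made-up 2-dimensional embeddings; real embeddings come from the model's own networks.
toy_tables = {
    "knowledge_graph": {"IER3": [1.0, 0.0], "APOC2": [3.0, 2.0],
                        "TBL1Y": [0.0, 4.0], "TDP1": [2.0, 0.0]},
    "text": {"IER3": [0.0, 1.0], "APOC2": [2.0, 1.0],
             "TBL1Y": [4.0, 4.0], "TDP1": [0.0, 2.0]},
}
signature = {"up": ["IER3", "APOC2"], "down": ["TBL1Y", "TDP1"]}
averaged = average_signature_embeddings(signature, toy_tables)
```

With two gene lists and two modalities, `averaged` holds four vectors, which corresponds to the "4 averaged embeddings in total" described above.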