pikhan committed on
Commit
715d4f6
·
verified ·
1 Parent(s): 8f96be1

update readme

Files changed (1): README.md +44 -0
README.md CHANGED
@@ -13,4 +13,48 @@ tags:
- synthetic
- language model
---

# Description:
This model is an example of how a fine-tuned LLM, even without the full depth, size, and complexity of larger and more expensive models, can be useful in context-sensitive situations. In our use case, we apply this LLM as part of a broader electronic lab notebook software setup for molecular and computational biologists. This GPT-2 has been fine-tuned on datasets from BioASQ and PubMedQA and is now knowledgeable enough in biochemistry to assist scientists, integrating not just as a copilot-like tool but also as a lab partner in the Design-Build-Test-Learn workflow that is ever growing in prominence in synthetic biology.

# Intel Optimization Inference Code Sample:
We made use of both the BF16 data type and INT8 quantization to improve performance. BF16 halves the memory footprint compared to FP32, allowing larger models and/or larger batches to fit into memory. Moreover, BF16 is supported by modern Intel CPUs, and operations on it are optimized. Quantizing models to INT8 further reduces the model size, making better use of cache and speeding up load times.
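The memory saving from BF16 is easy to sanity-check with back-of-the-envelope arithmetic (the ~355M parameter count for GPT-2 medium below is an approximation):

```python
# Rough weight-memory comparison for GPT-2 medium (~355M parameters, approximate)
params = 355_000_000
fp32_bytes = params * 4  # FP32 stores 4 bytes per weight
bf16_bytes = params * 2  # BF16 stores 2 bytes per weight

print(f"FP32: {fp32_bytes / 1e9:.2f} GB")  # FP32: 1.42 GB
print(f"BF16: {bf16_bytes / 1e9:.2f} GB")  # BF16: 0.71 GB
```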
Additionally, we optimized further with OpenVINO to make the model run better on Intel hardware, converting it first to an ONNX model and then to the OpenVINO Intermediate Representation (IR).
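The conversion step might look like the following sketch (the file names and output path are assumptions, and it requires the fine-tuned model in memory, so this is illustrative rather than a drop-in script):

```python
# Sketch of the ONNX -> OpenVINO IR conversion pipeline (paths are assumptions)
import torch
import openvino as ov

# 1. Export the fine-tuned PyTorch model to ONNX
dummy_input = torch.randint(0, 50256, (1, 10))
torch.onnx.export(model, (dummy_input,), "gpt2_medium.onnx", input_names=["input_ids"])

# 2. Convert the ONNX model to OpenVINO IR (writes a .xml graph plus a .bin weights file)
ov_model = ov.convert_model("gpt2_medium.onnx")
ov.save_model(ov_model, "../ovc_output/converted_model.xml")
```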

```python
from openvino.runtime import Core
import numpy as np

# Initialize the OpenVINO runtime Core
ie = Core()

# Load and compile the model for the CPU device
compiled_model = ie.compile_model(model='../ovc_output/converted_model.xml', device_name="CPU")

# Prepare input: random token IDs rather than real tokenized text, just for example's sake
input_ids = np.random.randint(0, 50256, (1, 10))

# Create a dictionary for the inputs expected by the model
inputs = {"input_ids": input_ids}

# Create an infer request and start synchronous inference
result = compiled_model.create_infer_request().infer(inputs=inputs)

# Access output tensor data directly from the result using the appropriate output key
output = result['outputs']

print("Inference results:", output)
```
In the fine-tuning file you will see our other optimizations.

We perform BF16 conversion as follows (we also implement a custom collator):
```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2-medium').to(torch.bfloat16)
```
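For illustration, here is a minimal pure-Python sketch of what a padding collator does (the actual collator in the fine-tuning file may differ; reusing GPT-2's end-of-text token ID as padding is a common convention, not a detail from our code):

```python
PAD_ID = 50256  # GPT-2's end-of-text token, often reused for padding (assumption)

def collate(batch):
    """Pad variable-length lists of token IDs to the longest sequence in the batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in batch]
    # Mask is 1 for real tokens, 0 for padding
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(collate([[1, 2, 3], [4]]))
```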

We perform INT8 quantization as follows:
```python
import torch
from torch.quantization import quantize_dynamic

# Load the full-precision model
model.eval()  # Ensure the model is in evaluation mode
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
```