pikhan committed on
Commit
715d4f6
·
verified ·
1 Parent(s): 8f96be1

update readme

Files changed (1): README.md +44 -0
README.md CHANGED
@@ -13,4 +13,48 @@ tags:
- synthetic
- language model
---

# Description:
This model is an example of how a fine-tuned LLM, even without the full depth, size, and complexity of larger and more expensive models, can be useful in context-sensitive situations. In our use case, we apply this LLM as part of a broader electronic lab notebook software setup for molecular and computational biologists. This GPT-2 has been fine-tuned on datasets from BioASQ and PubMedQA and is now knowledgeable enough in biochemistry to assist scientists, integrating not just as a copilot-like tool but also as a lab partner in the Design-Build-Test-Learn workflow that is ever growing in prominence in synthetic biology.

# Intel Optimization Inference Code Sample:
We made use of both the BF16 data type and INT8 quantization to improve performance. BF16 halves the memory footprint compared to FP32, allowing larger models and/or larger batches to fit into memory. Moreover, BF16 is supported by modern Intel CPUs, and operations on it are optimized. Quantizing models to INT8 further reduces the model size, making better use of cache and speeding up load times.
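The memory saving from BF16 is easy to sanity-check with back-of-the-envelope arithmetic (the ~355M parameter count for GPT-2 medium below is an approximation):

```python
# Rough weight-memory comparison for GPT-2 medium (~355M parameters, approximate)
params = 355_000_000
fp32_bytes = params * 4  # FP32 stores 4 bytes per weight
bf16_bytes = params * 2  # BF16 stores 2 bytes per weight

print(f"FP32: {fp32_bytes / 1e9:.2f} GB")  # FP32: 1.42 GB
print(f"BF16: {bf16_bytes / 1e9:.2f} GB")  # BF16: 0.71 GB
```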
Additionally, we optimized further with OpenVINO to make the model run better on Intel hardware, converting it first to an ONNX model and then to the OpenVINO Intermediate Representation (IR).
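The conversion step might look like the following sketch (the file names and output path are assumptions, and it requires the fine-tuned model in memory, so this is illustrative rather than a drop-in script):

```python
# Sketch of the ONNX -> OpenVINO IR conversion pipeline (paths are assumptions)
import torch
import openvino as ov

# 1. Export the fine-tuned PyTorch model to ONNX
dummy_input = torch.randint(0, 50256, (1, 10))
torch.onnx.export(model, (dummy_input,), "gpt2_medium.onnx", input_names=["input_ids"])

# 2. Convert the ONNX model to OpenVINO IR (writes a .xml graph plus a .bin weights file)
ov_model = ov.convert_model("gpt2_medium.onnx")
ov.save_model(ov_model, "../ovc_output/converted_model.xml")
```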

```python
from openvino.runtime import Core
import numpy as np

# Initialize the OpenVINO runtime Core
ie = Core()

# Load and compile the model for the CPU device
compiled_model = ie.compile_model(model='../ovc_output/converted_model.xml', device_name="CPU")

# Prepare input: random token IDs rather than real tokenized text, just for example's sake
input_ids = np.random.randint(0, 50256, (1, 10))

# Create a dictionary for the inputs expected by the model
inputs = {"input_ids": input_ids}

# Create an infer request and start synchronous inference
result = compiled_model.create_infer_request().infer(inputs=inputs)

# Access output tensor data directly from the result using the appropriate output key
output = result['outputs']

print("Inference results:", output)
```
In the fine-tuning file you will see our other optimizations.

We perform BF16 conversion as follows (we also implement a custom collator):
```python
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained('gpt2-medium').to(torch.bfloat16)
```
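For illustration, here is a minimal pure-Python sketch of what a padding collator does (the actual collator in the fine-tuning file may differ; reusing GPT-2's end-of-text token ID as padding is a common convention, not a detail from our code):

```python
PAD_ID = 50256  # GPT-2's end-of-text token, often reused for padding (assumption)

def collate(batch):
    """Pad variable-length lists of token IDs to the longest sequence in the batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [PAD_ID] * (max_len - len(ids)) for ids in batch]
    # Mask is 1 for real tokens, 0 for padding
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

print(collate([[1, 2, 3], [4]]))
```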

We perform INT8 quantization as follows:
```python
import torch
from torch.quantization import quantize_dynamic

# Load the full-precision model
model.eval()  # Ensure the model is in evaluation mode
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
```