nsbendre25 committed on
Commit 3a1890f · verified · 1 Parent(s): 6d23399

Updated README.md

Files changed (1)
  1. README.md +37 -30
README.md CHANGED
@@ -1,53 +1,60 @@
  ---
  language:
- - en
  pipeline_tag: text-generation
  tags:
- - OpenVINO
- - meta
- - llama
- - llama-3
- - PyTorch
  license: llama3
  extra_gated_prompt: |
-
- Meta Llama 3 Version Release Date: April 18, 2024
  library_name: transformers
  ---

  # Llama-3-8B-Instruct-ov-fp16-int4-sym
-
- ## Built with Meta Llama 3
-
  ## Model Description
- This model is an OpenVINO IR FP16-INT4-ASYM optimized version of the original [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), tailored for efficient inference while maintaining high performance. This version is optimized using [OpenVINO](https://github.com/openvinotoolkit/openvino), enhancing deployment on Intel hardware architectures. The model was optimized by following guidlines from [OpenVINO Notebooks](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks)
-

  ## Intended Use
  This model is designed for advanced natural language understanding and generation tasks, ideal for academic researchers and developers in commercial settings looking to integrate efficient AI capabilities into their applications. It is not to be used for creating or promoting harmful or illegal content as per the guidelines outlined in the [Meta Llama 3 Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
-
- ## Model Architecture
- Like the original model, Llama-3-8B-Instruct-ov-fp16-int4-sym is based on an auto-regressive transformer architecture, fine-tuned with a focus on instruction-based tasks. The int8 quantization ensures it runs efficiently on compatible hardware without significant loss in performance.
-
- ## Carbon Footprint and Sustainability
- Our model training processes are committed to sustainability. The original training utilized Meta’s Research SuperCluster, significantly offsetting carbon emissions to ensure environmentally responsible AI development.
-
  ## Licensing and Redistribution
- This model is released under the Meta Llama 3 Community License. Redistribution requires inclusion of this license and a citation to the original model. Modifications and derivative works must prominently display "Built with Meta Llama 3" and adhere to the redistribution policies detailed in the original [license terms](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).
-
- ## How to Use
- This model is optimized for inference using Intel's Optimum library with OpenVINO 2024.1, which enables enhanced performance on Intel hardware. Below are the steps to set up and run the model using Optimum and OpenVINO 2024.1:
-
  ```python
  from optimum.intel.openvino import OVModelForCausalLM
  from transformers import AutoTokenizer, pipeline

  model_id = "nsbendre25/Llama-3-8B-Instruct-ov_fp16-int4_sym"

  # Initialize the tokenizer and the OpenVINO model
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = OVModelForCausalLM.from_pretrained(model_id)

  # Generate text with a transformers pipeline backed by the OpenVINO model
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  pipe("Hey how are you doing today?")
  ```
 
  ---
  language:
+ - en
  pipeline_tag: text-generation
  tags:
+ - OpenVINO
+ - meta
+ - llama
+ - llama-3
+ - PyTorch
  license: llama3
  extra_gated_prompt: |
+
+ Meta Llama 3 Version Release Date: April 18, 2024
  library_name: transformers
  ---
+
  # Llama-3-8B-Instruct-ov-fp16-int4-sym
+
+ ## Built with Meta Llama 3
+
  ## Model Description
+
+ This is a version of the original [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model converted to [OpenVINO™](https://github.com/openvinotoolkit/openvino) IR (Intermediate Representation) format for optimized inference on Intel® hardware. The model was created following the examples in the [OpenVINO™ Notebooks](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks) repository.
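+
+ For context, the PyTorch-to-IR conversion can typically be reproduced through Optimum Intel's `export=True` path. This is a minimal sketch, not the exact notebook workflow used for this checkpoint; the output directory name is hypothetical, and the `load_in_8bit=False` flag assumes the optimum-intel behavior of compressing large models to 8-bit by default:
+
+ ```python
+ from optimum.intel.openvino import OVModelForCausalLM
+
+ # Convert the original PyTorch checkpoint to OpenVINO IR at load time
+ ov_model = OVModelForCausalLM.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B-Instruct",
+     export=True,         # run the PyTorch -> OpenVINO IR conversion
+     load_in_8bit=False,  # keep full-precision weights (8-bit is the default for large models)
+ )
+ ov_model.save_pretrained("llama-3-8b-instruct-ov-fp16")  # hypothetical output directory
+ ```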
+
  ## Intended Use
  This model is designed for advanced natural language understanding and generation tasks, ideal for academic researchers and developers in commercial settings looking to integrate efficient AI capabilities into their applications. It is not to be used for creating or promoting harmful or illegal content as per the guidelines outlined in the [Meta Llama 3 Acceptable Use Policy](https://llama.meta.com/llama3/use-policy/).
+
  ## Licensing and Redistribution
+ This model is released under the Meta Llama 3 Community License. Redistribution requires inclusion of this license and a citation to the original model. Modifications and derivative works must prominently display "Built with Meta Llama 3" and adhere to the redistribution policies detailed in the original model [license terms](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/LICENSE).
+
+ ## Weight Compression Parameters
+ For more information on these parameters, refer to the [OpenVINO 2024.1.0 documentation](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html); a sketch of how they can be applied follows the list below.
+
+ * mode: **INT4_SYM**
+ * group_size: **128**
+ * ratio: **0.8**
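+
+ As a point of reference, a compression with these settings could look roughly like the following, using Optimum Intel's `OVWeightQuantizationConfig`. This is a hedged sketch rather than the exact command used to produce this checkpoint, and the output directory name is hypothetical:
+
+ ```python
+ from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
+
+ # Weight-compression settings mirroring the parameters listed above
+ quant_config = OVWeightQuantizationConfig(
+     bits=4,          # 4-bit weights
+     sym=True,        # symmetric quantization -> INT4_SYM
+     group_size=128,
+     ratio=0.8,       # ~80% of weights in INT4, the rest kept in higher precision
+ )
+
+ # Export the original model to OpenVINO IR with the compression applied
+ model = OVModelForCausalLM.from_pretrained(
+     "meta-llama/Meta-Llama-3-8B-Instruct",
+     export=True,
+     quantization_config=quant_config,
+ )
+ model.save_pretrained("llama-3-8b-instruct-ov-fp16-int4-sym")  # hypothetical output directory
+ ```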
+
+ ## Running Model Inference
+
+ Install the packages required to use the [Optimum Intel](https://huggingface.co/docs/optimum/intel/index) integration with the OpenVINO™ backend:
+
+ ```sh
+ pip install --upgrade --upgrade-strategy eager "optimum[openvino]"
+ ```
+
+ Run model inference:
  ```python
  from optimum.intel.openvino import OVModelForCausalLM
  from transformers import AutoTokenizer, pipeline

  model_id = "nsbendre25/Llama-3-8B-Instruct-ov_fp16-int4_sym"

  # Initialize the tokenizer and the OpenVINO model
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = OVModelForCausalLM.from_pretrained(model_id)

  # Generate text with a transformers pipeline backed by the OpenVINO model
  pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
  pipe("Hey how are you doing today?")
  ```
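+
+ Because this is an instruction-tuned model, generations are usually better when the input is formatted with the Llama 3 chat template. A minimal sketch, reusing the `model` and `tokenizer` loaded above:
+
+ ```python
+ # Build a chat-formatted prompt with the tokenizer's built-in Llama 3 template
+ messages = [{"role": "user", "content": "Hey, how are you doing today?"}]
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Generate and decode only the newly produced tokens
+ enc = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**enc, max_new_tokens=128)
+ print(tokenizer.decode(outputs[0][enc["input_ids"].shape[-1]:], skip_special_tokens=True))
+ ```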